* [RFC 01/10] kmod: add test driver to stress test the module loader
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
@ 2016-12-08 18:47 ` Luis R. Rodriguez
2016-12-08 20:24 ` Kees Cook
2016-12-08 19:48 ` [RFC 02/10] module: fix memory leak on early load_module() failures Luis R. Rodriguez
` (9 subsequent siblings)
10 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 18:47 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, rgoldwyn, subashab, xypron.glpk, keescook,
atomlin, mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo,
akpm, torvalds, linux-kselftest, linux-doc, linux-kernel,
Luis R. Rodriguez
This adds a new stress test driver for kmod: the kernel module loader.
The new stress test driver, test_kmod, is only enabled as a module right
now. It should be possible to load this as built-in and load tests early
(refer to the force_init_test module parameter), however since a lot of
test can get a system out of memory fast we leave this disabled for now.
Using a system with 1024 MiB of RAM can *easily* get your kernel
OOM fast with this test driver.
The test_kmod driver exposes API knobs for us to fine tune simple
request_module() and get_fs_type() calls. Since these API calls
only allow each one parameter a test driver for these is rather
simple. Other factors that can help out test driver though are
the number of calls we issue and knowing current limitations of
each. This exposes configuration as much as possible through
userspace to be able to build tests directly from userspace.
Since it allows multiple misc devices its will eventually (once we
add a knob to let us create new devices at will) also be possible to
perform more tests in parallel, provided you have enough memory.
We only enable tests we know work as of right now.
Demo screenshots:
# tools/testing/selftests/kmod/kmod.sh
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0002_driver: OK! - loading kmod test
kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0002_fs: OK! - loading kmod test
kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
kmod_test_0003: OK! - loading kmod test
kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0004: OK! - loading kmod test
kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0005: OK! - loading kmod test
kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
kmod_test_0006: OK! - loading kmod test
kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
Test completed
You can also request for specific tests:
# tools/testing/selftests/kmod/kmod.sh -t 0001
kmod_test_0001_driver: OK! - loading kmod test
kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
kmod_test_0001_fs: OK! - loading kmod test
kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
Test completed
Lastly, the current available number of tests:
# tools/testing/selftests/kmod/kmod.sh --help
Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
Valid tests: 0001-0009
0001 - Simple test - 1 thread for empty string
0002 - Simple test - 1 thread for modules/filesystems that do not exist
0003 - Simple test - 1 thread for get_fs_type() only
0004 - Simple test - 2 threads for get_fs_type() only
0005 - multithreaded tests with default setup - request_module() only
0006 - multithreaded tests with default setup - get_fs_type() only
0007 - multithreaded tests with default setup test request_module() and get_fs_type()
0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
The following test cases currently fail, as such they are not currently
enabled by default:
# tools/testing/selftests/kmod/kmod.sh -t 0007
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
# tools/testing/selftests/kmod/kmod.sh -t 0010
# tools/testing/selftests/kmod/kmod.sh -t 0011
To be sure to run them as intended please unload both of the modules:
o test_module
o xfs
And ensure they are not loaded on your system prior to testing them.
If you use these paritions for your rootfs you can change the default
test driver used for get_fs_type() by exporting it into your
environment. For example of other test defaults you can override
refer to kmod.sh allow_user_defaults().
Behind the scenes this is how we fine tune at a test case prior to
hitting a trigger to run it:
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
cat /sys/devices/virtual/misc/test_kmod0/config
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
Finally to trigger:
echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
The kmod.sh script uses the above constructs to build differnt test cases.
A bit of interpretation of the current failures follows, first two
premises:
a) When request_module() is used userspace figures out an optimized version of
module order for us. Once it finds the modules it needs, as per depmod
symbol dep map, it will finit_module() the respective modules which
are needed for the original request_module() request.
b) We have an optimization in place whereby if a kernel uses
request_module() on a module already loaded we never bother
userspace as the module already is loaded. This is all handled by
kernel/kmod.c.
A few things to consider to help identify root causes of issues:
0) kmod 19 has a broken heuristic for modules being assumed to be
built-in to your kernel and will return 0 even though request_module()
failed. Upgrade to a newer version of kmod.
1) A get_fs_type() call for "xfs" will request_module() for
"fs-xfs", not for "xfs". The optimization in kernel described in b)
fails to catch if we have a lot of consecutive get_fs_type() calls.
The reason is the optimization in place does not look for aliases. This
means two consecutive get_fs_type() calls will bump kmod_concurrent, whereas
request_module() will not.
This one explanation why test case 0009 fails at least once for
get_fs_type().
2) If a module fails to load --- for whatever reason (kmod_concurrent
limit reached, file not yet present due to rootfs switch, out of memory)
we have a period of time during which module request for the same name
either with request_module() or get_fs_type() will *also* fail to load
even if the file for the module is ready.
This explains why *multiple* NULLs are possible on test 0009.
3) finit_module() consumes quite a bit of memory.
4) Filesystems typically also have more dependent modules than other
modules, its important to note though that even though a get_fs_type() call
does not incur additional kmod_concurrent bumps, since userspace
loads dependencies it finds it needs via finit_module_fd(), it *will*
take much more memory to load a module with a lot of dependencies.
Because of 3) and 4) we will easily run into out of memory failures
with certain tests. For instance test 0006 fails on qemu with 1024 MiB
of RAM. It panics a box after reaping all userspace processes and still
not having enough memory to reap.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
lib/Kconfig.debug | 25 +
lib/Makefile | 1 +
lib/test_kmod.c | 1248 +++++++++++++++++++++++++++++++++
tools/testing/selftests/kmod/Makefile | 11 +
tools/testing/selftests/kmod/config | 7 +
tools/testing/selftests/kmod/kmod.sh | 449 ++++++++++++
6 files changed, 1741 insertions(+)
create mode 100644 lib/test_kmod.c
create mode 100644 tools/testing/selftests/kmod/Makefile
create mode 100644 tools/testing/selftests/kmod/config
create mode 100755 tools/testing/selftests/kmod/kmod.sh
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 7446097f72bd..6cad548e0682 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
If unsure, say N.
+config TEST_KMOD
+ tristate "kmod stress tester"
+ default n
+ depends on m
+ select TEST_LKM
+ select XFS_FS
+ select TUN
+ select BTRFS_FS
+ help
+ Test the kernel's module loading mechanism: kmod. kmod implements
+ support to load modules using the Linux kernel's usermode helper.
+ This test provides a series of tests against kmod.
+
+ Although technically you can either build test_kmod as a module or
+ into the kernel we disallow building it into the kernel since
+ it stress tests request_module() and this will very likely cause
+ some issues by taking over precious threads available from other
+ module load requests, ultimately this could be fatal.
+
+ To run tests run:
+
+ tools/testing/selftests/kmod/kmod.sh --help
+
+ If unsure, say N.
+
source "samples/Kconfig"
source "lib/Kconfig.kgdb"
diff --git a/lib/Makefile b/lib/Makefile
index d15e235f72ea..3c5a14821e16 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -55,6 +55,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
obj-$(CONFIG_TEST_PRINTF) += test_printf.o
obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
obj-$(CONFIG_TEST_UUID) += test_uuid.o
+obj-$(CONFIG_TEST_KMOD) += test_kmod.o
ifeq ($(CONFIG_DEBUG_KOBJECT),y)
CFLAGS_kobject.o += -DDEBUG
diff --git a/lib/test_kmod.c b/lib/test_kmod.c
new file mode 100644
index 000000000000..63fded83b9b6
--- /dev/null
+++ b/lib/test_kmod.c
@@ -0,0 +1,1248 @@
+/*
+ * kmod stress test driver
+ *
+ * Copyright (C) 2016 Luis R. Rodriguez <mcgrof@kernel.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of copyleft-next (version 0.3.1 or later) as published
+ * at http://copyleft-next.org/.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+/*
+ * This driver provides an interface to trigger and test the kernel's
+ * module loader through a series of configurations and a few triggers.
+ * To test this driver use the following script as root:
+ *
+ * tools/testing/selftests/kmod/kmod.sh --help
+ */
+
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/kmod.h>
+#include <linux/printk.h>
+#include <linux/kthread.h>
+#include <linux/sched.h>
+#include <linux/fs.h>
+#include <linux/miscdevice.h>
+#include <linux/vmalloc.h>
+#include <linux/slab.h>
+#include <linux/device.h>
+
+#define TEST_START_NUM_THREADS 50
+#define TEST_START_DRIVER "test_module"
+#define TEST_START_TEST_FS "xfs"
+#define TEST_START_TEST_CASE TEST_KMOD_DRIVER
+
+
+static bool force_init_test = false;
+module_param(force_init_test, bool_enable_only, 0644);
+MODULE_PARM_DESC(force_init_test,
+ "Force kicking a test immediatley after driver loads");
+
+/*
+ * For device allocation / registration
+ */
+static DEFINE_MUTEX(reg_dev_mutex);
+static LIST_HEAD(reg_test_devs);
+
+/*
+ * num_test_devs actually represents the *next* ID of the next
+ * device we will allow to create.
+ */
+static int num_test_devs;
+
+/**
+ * enum kmod_test_case - linker table test case
+ *
+ * If you add a test case, please be sure to review if you need to se
+ * @need_mod_put for your tests case.
+ *
+ * @TEST_KMOD_DRIVER: stress tests request_module()
+ * @TEST_KMOD_FS_TYPE: stress tests get_fs_type()
+ */
+enum kmod_test_case {
+ __TEST_KMOD_INVALID = 0,
+
+ TEST_KMOD_DRIVER,
+ TEST_KMOD_FS_TYPE,
+
+ __TEST_KMOD_MAX,
+};
+
+struct test_config {
+ char *test_driver;
+ char *test_fs;
+ unsigned int num_threads;
+ enum kmod_test_case test_case;
+ int test_result;
+};
+
+struct kmod_test_device;
+
+/**
+ * kmod_test_device_info - thread info
+ *
+ * @ret_sync: return value if request_module() is used, sync request for
+ * @TEST_KMOD_DRIVER
+ * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
+ * @thread_idx: thread ID
+ * @test_dev: test device test is being performed under
+ * @need_mod_put: Some tests (get_fs_type() is one) requires putting the module
+ * (module_put(fs_sync->owner)) when done, otherwise you will not be able
+ * to unload the respective modules and re-test. We use this to keep
+ * accounting of when we need this and to help out in case we need to
+ * error out and deal with module_put() on error.
+ */
+struct kmod_test_device_info {
+ int ret_sync;
+ struct file_system_type *fs_sync;
+ struct task_struct *task_sync;
+ unsigned int thread_idx;
+ struct kmod_test_device *test_dev;
+ bool need_mod_put;
+};
+
+/**
+ * kmod_test_device - test device to help test kmod
+ *
+ * @dev_idx: unique ID for test device
+ * @config: configuration for the test
+ * @misc_dev: we use a misc device under the hood
+ * @dev: pointer to misc_dev's own struct device
+ * @config_mutex: protects configuration of test
+ * @trigger_mutex: the test trigger can only be fired once at a time
+ * @thread_lock: protects @done count, and the @info per each thread
+ * @done: number of threads which have completed or failed
+ * @test_is_oom: when we run out of memory, use this to halt moving forward
+ * @kthreads_done: completion used to signal when all work is done
+ * @list: needed to be part of the reg_test_devs
+ * @info: array of info for each thread
+ */
+struct kmod_test_device {
+ int dev_idx;
+ struct test_config config;
+ struct miscdevice misc_dev;
+ struct device *dev;
+ struct mutex config_mutex;
+ struct mutex trigger_mutex;
+ struct mutex thread_mutex;
+
+ unsigned int done;
+
+ bool test_is_oom;
+ struct completion kthreads_done;
+ struct list_head list;
+
+ struct kmod_test_device_info *info;
+};
+
+static const char *test_case_str(enum kmod_test_case test_case)
+{
+ switch (test_case) {
+ case TEST_KMOD_DRIVER:
+ return "TEST_KMOD_DRIVER";
+ case TEST_KMOD_FS_TYPE:
+ return "TEST_KMOD_FS_TYPE";
+ default:
+ return "invalid";
+ }
+}
+
+static struct miscdevice *dev_to_misc_dev(struct device *dev)
+{
+ return dev_get_drvdata(dev);
+}
+
+static struct kmod_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev)
+{
+ return container_of(misc_dev, struct kmod_test_device, misc_dev);
+}
+
+static struct kmod_test_device *dev_to_test_dev(struct device *dev)
+{
+ struct miscdevice *misc_dev;
+
+ misc_dev = dev_to_misc_dev(dev);
+
+ return misc_dev_to_test_dev(misc_dev);
+}
+
+/* Must run with thread_mutex held */
+static void kmod_test_done_check(struct kmod_test_device *test_dev,
+ unsigned int idx)
+{
+ struct test_config *config = &test_dev->config;
+
+ test_dev->done++;
+ dev_dbg(test_dev->dev, "Done thread count: %u\n", test_dev->done);
+
+ if (test_dev->done == config->num_threads) {
+ dev_info(test_dev->dev, "Done: %u threads have all run now\n",
+ test_dev->done);
+ dev_info(test_dev->dev, "Last thread to run: %u\n", idx);
+ complete(&test_dev->kthreads_done);
+ }
+}
+
+static void test_kmod_put_module(struct kmod_test_device_info *info)
+{
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+
+ if (!info->need_mod_put)
+ return;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ break;
+ case TEST_KMOD_FS_TYPE:
+ if (info && info->fs_sync && info->fs_sync->owner)
+ module_put(info->fs_sync->owner);
+ break;
+ default:
+ BUG();
+ }
+
+ info->need_mod_put = true;
+}
+
+static int run_request(void *data)
+{
+ struct kmod_test_device_info *info = data;
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ info->ret_sync = request_module("%s", config->test_driver);
+ break;
+ case TEST_KMOD_FS_TYPE:
+ info->fs_sync = get_fs_type(config->test_fs);
+ info->need_mod_put = true;
+ break;
+ default:
+ /* __trigger_config_run() already checked for test sanity */
+ BUG();
+ return -EINVAL;
+ }
+
+ dev_dbg(test_dev->dev, "Ran thread %u\n", info->thread_idx);
+
+ test_kmod_put_module(info);
+
+ mutex_lock(&test_dev->thread_mutex);
+ info->task_sync = NULL;
+ kmod_test_done_check(test_dev, info->thread_idx);
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return 0;
+}
+
+static int tally_work_test(struct kmod_test_device_info *info)
+{
+ struct kmod_test_device *test_dev = info->test_dev;
+ struct test_config *config = &test_dev->config;
+ int err_ret = 0;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ /*
+ * Only capture errors, if one is found that's
+ * enough, for now.
+ */
+ if (info->ret_sync != 0)
+ err_ret = info->ret_sync;
+ dev_info(test_dev->dev,
+ "Sync thread %d return status: %d\n",
+ info->thread_idx, info->ret_sync);
+ break;
+ case TEST_KMOD_FS_TYPE:
+ /* For now we make this simple */
+ if (!info->fs_sync)
+ err_ret = -EINVAL;
+ dev_info(test_dev->dev, "Sync thread %u fs: %s\n",
+ info->thread_idx, info->fs_sync ? config->test_fs :
+ "NULL");
+ break;
+ default:
+ BUG();
+ }
+
+ return err_ret;
+}
+
+/*
+ * XXX: add result option to display if all errors did not match.
+ * For now we just keep any error code if one was found.
+ *
+ * If this ran it means *all* tasks were created fine and we
+ * are now just collecting results.
+ *
+ * Only propagate errors, do not override with a subsequent sucess case.
+ */
+static void tally_up_work(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ struct kmod_test_device_info *info;
+ unsigned int idx;
+ int err_ret = 0;
+ int ret = 0;
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ dev_info(test_dev->dev, "Results:\n");
+
+ for (idx=0; idx < config->num_threads; idx++) {
+ info = &test_dev->info[idx];
+ ret = tally_work_test(info);
+ if (ret)
+ err_ret = ret;
+ }
+
+ /*
+ * Note: request_module() returns 256 for a module not found even
+ * though modprobe itself returns 1.
+ */
+ config->test_result = err_ret;
+
+ mutex_unlock(&test_dev->thread_mutex);
+}
+
+static int try_one_request(struct kmod_test_device *test_dev, unsigned int idx)
+{
+ struct kmod_test_device_info *info = &test_dev->info[idx];
+ int fail_ret = -ENOMEM;
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ info->thread_idx = idx;
+ info->test_dev = test_dev;
+ info->task_sync = kthread_run(run_request, info, "%s-%u",
+ KBUILD_MODNAME, idx);
+
+ if (!info->task_sync || IS_ERR(info->task_sync)) {
+ test_dev->test_is_oom = true;
+ dev_err(test_dev->dev, "Setting up thread %u failed\n", idx);
+ info->task_sync = NULL;
+ goto err_out;
+ } else
+ dev_dbg(test_dev->dev, "Kicked off thread %u\n", idx);
+
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return 0;
+
+err_out:
+ info->ret_sync = fail_ret;
+ mutex_unlock(&test_dev->thread_mutex);
+
+ return fail_ret;
+}
+
+static void test_dev_kmod_stop_tests(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ struct kmod_test_device_info *info;
+ unsigned int i;
+
+ dev_info(test_dev->dev, "Ending request_module() tests\n");
+
+ mutex_lock(&test_dev->thread_mutex);
+
+ for (i=0; i < config->num_threads; i++) {
+ info = &test_dev->info[i];
+ if (info->task_sync && !IS_ERR(info->task_sync)) {
+ dev_info(test_dev->dev,
+ "Stopping still-running thread %i\n", i);
+ kthread_stop(info->task_sync);
+ }
+
+ /*
+ * info->task_sync is well protected, it can only be
+ * NULL or a pointer to a struct. If its NULL we either
+ * never ran, or we did and we completed the work. Completed
+ * tasks *always* put the module for us. This is a sanity
+ * check -- just in case.
+ */
+ if (info->task_sync && info->need_mod_put)
+ test_kmod_put_module(info);
+ }
+
+ mutex_unlock(&test_dev->thread_mutex);
+}
+
+/*
+ * Only wait *iff* we did not run into any errors during all of our thread
+ * set up. If run into any issues we stop threads and just bail out with
+ * an error to the trigger. This also means we don't need any tally work
+ * for any threads which fail.
+ */
+static int try_requests(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ unsigned int idx;
+ int ret;
+ bool any_error = false;
+
+ for (idx=0; idx < config->num_threads; idx++) {
+ if (test_dev->test_is_oom) {
+ any_error = true;
+ break;
+ }
+
+ ret = try_one_request(test_dev, idx);
+ if (ret) {
+ any_error = true;
+ break;
+ }
+ }
+
+ if (!any_error) {
+ test_dev->test_is_oom = false;
+ dev_info(test_dev->dev,
+ "No errors were found while initializing threads\n");
+ wait_for_completion(&test_dev->kthreads_done);
+ tally_up_work(test_dev);
+ } else {
+ test_dev->test_is_oom = true;
+ dev_info(test_dev->dev,
+ "At least one thread failed to start, stop all work\n");
+ test_dev_kmod_stop_tests(test_dev);
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+static int run_test_driver(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ dev_info(test_dev->dev, "Test case: %s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+ dev_info(test_dev->dev, "Test driver to load: %s\n",
+ config->test_driver);
+ dev_info(test_dev->dev, "Number of threads to run: %u\n",
+ config->num_threads);
+ dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
+ config->num_threads - 1);
+
+ return try_requests(test_dev);
+}
+
+static int run_test_fs_type(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ dev_info(test_dev->dev, "Test case: %s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+ dev_info(test_dev->dev, "Test filesystem to load: %s\n",
+ config->test_fs);
+ dev_info(test_dev->dev, "Number of threads to run: %u\n",
+ config->num_threads);
+ dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
+ config->num_threads - 1);
+
+ return try_requests(test_dev);
+}
+
+static ssize_t config_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int len = 0;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ len += sprintf(buf, "Custom trigger configuration for: %s\n",
+ dev_name(dev));
+
+ len += sprintf(buf+len, "Number of threads:\t%u\n",
+ config->num_threads);
+
+ len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
+ test_case_str(config->test_case),
+ config->test_case);
+
+ if (config->test_driver)
+ len += sprintf(buf+len, "driver:\t%s\n",
+ config->test_driver);
+ else
+ len += sprintf(buf+len, "driver:\tEMTPY\n");
+
+ if (config->test_fs)
+ len += sprintf(buf+len, "fs:\t%s\n",
+ config->test_fs);
+ else
+ len += sprintf(buf+len, "fs:\tEMTPY\n");
+
+
+ mutex_unlock(&test_dev->config_mutex);
+
+ return len;
+}
+static DEVICE_ATTR_RO(config);
+
+/*
+ * This ensures we don't allow kicking threads through if our configuration
+ * is faulty.
+ */
+static int __trigger_config_run(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ test_dev->done = 0;
+
+ switch (config->test_case) {
+ case TEST_KMOD_DRIVER:
+ return run_test_driver(test_dev);
+ case TEST_KMOD_FS_TYPE:
+ return run_test_fs_type(test_dev);
+ default:
+ dev_warn(test_dev->dev,
+ "Invalid test case requested: %u\n",
+ config->test_case);
+ return -EINVAL;
+ }
+}
+
+static int trigger_config_run(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ int ret;
+
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ ret = __trigger_config_run(test_dev);
+ if (ret < 0)
+ goto out;
+ dev_info(test_dev->dev, "General test result: %d\n",
+ config->test_result);
+
+ /*
+ * We must return 0 after a trigger even unless something went
+ * wrong with the setup of the test. If the test setup went fine
+ * then userspace must just check the result of config->test_result.
+ * One issue with relying on the return from a call in the kernel
+ * is if the kernel returns a possitive value using this trigger
+ * will not return the value to userspace, it would be lost.
+ *
+ * By not relying on capturing the return value of tests we are using
+ * through the trigger it also us to run tests with set -e and only
+ * fail when something went wrong with the driver upon trigger
+ * requests.
+ */
+ ret = 0;
+
+out:
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ return ret;
+}
+
+static ssize_t
+trigger_config_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ int ret;
+
+ if (test_dev->test_is_oom)
+ return -ENOMEM;
+
+ /* For all intents and purposes we don't care what userspace
+ * sent this trigger, we care only that we were triggered.
+ * We treat the return value only for caputuring issues with
+ * the test setup. At this point all the test variables should
+ * have been allocated so typically this should never fail.
+ */
+ ret = trigger_config_run(test_dev);
+ if (unlikely(ret < 0))
+ goto out;
+
+ /*
+ * Note: any return > 0 will be treated as success
+ * and the error value will not be available to userspace.
+ * Do not rely on trying to send to userspace a test value
+ * return value as possitive return errors will be lost.
+ */
+ if (WARN_ON(ret > 0))
+ return -EINVAL;
+
+ ret = count;
+out:
+ return ret;
+}
+static DEVICE_ATTR_WO(trigger_config);
+
+/*
+ * XXX: move to kstrncpy() once merged.
+ *
+ * Users should use kfree_const() when freeing these.
+ */
+static int __kstrncpy(char **dst, const char *name, size_t count, gfp_t gfp)
+{
+ *dst = kstrndup(name, count, gfp);
+ if (!*dst)
+ return -ENOSPC;
+ return count;
+}
+
+static int config_copy_test_driver_name(struct test_config *config,
+ const char *name,
+ size_t count)
+{
+ return __kstrncpy(&config->test_driver, name, count, GFP_KERNEL);
+}
+
+
+static int config_copy_test_fs(struct test_config *config, const char *name,
+ size_t count)
+{
+ return __kstrncpy(&config->test_fs, name, count, GFP_KERNEL);
+}
+
+static void __kmod_config_free(struct test_config *config)
+{
+ if (!config)
+ return;
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ kfree_const(config->test_fs);
+ config->test_driver = NULL;
+}
+
+static void kmod_config_free(struct kmod_test_device *test_dev)
+{
+ struct test_config *config;
+
+ if (!test_dev)
+ return;
+
+ config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ __kmod_config_free(config);
+ mutex_unlock(&test_dev->config_mutex);
+}
+
+static ssize_t config_test_driver_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int copied;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_driver_name(config, buf, count);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return copied;
+}
+
+static ssize_t config_test_driver_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ strcpy(buf, config->test_driver);
+ strcat(buf, "\n");
+ mutex_unlock(&test_dev->config_mutex);
+
+ return strlen(buf) + 1;
+}
+static DEVICE_ATTR(config_test_driver, 0644, config_test_driver_show,
+ config_test_driver_store);
+
+static ssize_t config_test_fs_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+ int copied;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ kfree_const(config->test_fs);
+ config->test_fs = NULL;
+
+ copied = config_copy_test_fs(config, buf, count);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return copied;
+}
+
+static ssize_t config_test_fs_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ strcpy(buf, config->test_fs);
+ strcat(buf, "\n");
+ mutex_unlock(&test_dev->config_mutex);
+
+ return strlen(buf) + 1;
+}
+static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
+ config_test_fs_store);
+
+static int trigger_config_run_driver(struct kmod_test_device *test_dev,
+ const char *test_driver)
+{
+ int copied;
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ config->test_case = TEST_KMOD_DRIVER;
+
+ kfree_const(config->test_driver);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_driver_name(config, test_driver,
+ strlen(test_driver));
+ mutex_unlock(&test_dev->config_mutex);
+
+ if (copied != strlen(test_driver)) {
+ test_dev->test_is_oom = true;
+ return -EINVAL;
+ }
+
+ test_dev->test_is_oom = false;
+
+ return trigger_config_run(test_dev);
+}
+
+static int trigger_config_run_fs(struct kmod_test_device *test_dev,
+ const char *fs_type)
+{
+ int copied;
+ struct test_config *config = &test_dev->config;
+
+ mutex_lock(&test_dev->config_mutex);
+ config->test_case = TEST_KMOD_FS_TYPE;
+
+ kfree_const(config->test_fs);
+ config->test_driver = NULL;
+
+ copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
+ mutex_unlock(&test_dev->config_mutex);
+
+ if (copied != strlen(fs_type)) {
+ test_dev->test_is_oom = true;
+ return -EINVAL;
+ }
+
+ test_dev->test_is_oom = false;
+
+ return trigger_config_run(test_dev);
+}
+
+static void free_test_dev_info(struct kmod_test_device *test_dev)
+{
+ if (test_dev->info) {
+ vfree(test_dev->info);
+ test_dev->info = NULL;
+ }
+}
+
+static int kmod_config_sync_info(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+
+ free_test_dev_info(test_dev);
+ test_dev->info = vzalloc(config->num_threads *
+ sizeof(struct kmod_test_device_info));
+ if (!test_dev->info) {
+ dev_err(test_dev->dev, "Cannot alloc test_dev info\n");
+ return -ENOMEM;
+ }
+
+ return 0;
+}
+
+/*
+ * Old kernels may not have this, if you want to port this code to
+ * test it on older kernels.
+ */
+#ifdef get_kmod_umh_limit
+static unsigned int kmod_init_test_thread_limit(void)
+{
+ return get_kmod_umh_limit();
+}
+#else
+static unsigned int kmod_init_test_thread_limit(void)
+{
+ return TEST_START_NUM_THREADS;
+}
+#endif
+
+static int __kmod_config_init(struct kmod_test_device *test_dev)
+{
+ struct test_config *config = &test_dev->config;
+ int ret = -ENOMEM, copied;
+
+ __kmod_config_free(config);
+
+ copied = config_copy_test_driver_name(config, TEST_START_DRIVER,
+ strlen(TEST_START_DRIVER));
+ if (copied != strlen(TEST_START_DRIVER))
+ goto err_out;
+
+ copied = config_copy_test_fs(config, TEST_START_TEST_FS,
+ strlen(TEST_START_TEST_FS));
+ if (copied != strlen(TEST_START_TEST_FS))
+ goto err_out;
+
+ config->num_threads = kmod_init_test_thread_limit();
+ config->test_result = 0;
+ config->test_case = TEST_START_TEST_CASE;
+
+ ret = kmod_config_sync_info(test_dev);
+ if (ret)
+ goto err_out;
+
+ test_dev->test_is_oom = false;
+
+ return 0;
+
+err_out:
+ test_dev->test_is_oom = true;
+ WARN_ON(test_dev->test_is_oom);
+
+ __kmod_config_free(config);
+
+ return ret;
+}
+
+static ssize_t reset_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ int ret;
+
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ ret = __kmod_config_init(test_dev);
+ if (ret < 0) {
+ ret = -ENOMEM;
+ dev_err(dev, "could not alloc settings for config trigger: %d\n",
+ ret);
+ goto out;
+ }
+
+ dev_info(dev, "reset\n");
+ ret = count;
+
+out:
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ return ret;
+}
+static DEVICE_ATTR_WO(reset);
+
+static int test_dev_config_update_uint_sync(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ unsigned int *config,
+ int (*test_sync)(struct kmod_test_device *test_dev))
+{
+ int ret;
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ unsigned int old_val;
+ if (end == buf || new > UINT_MAX)
+ return -EINVAL;
+
+ mutex_lock(&test_dev->config_mutex);
+
+ old_val = *config;
+ *(unsigned int *)config = new;
+
+ ret = test_sync(test_dev);
+ if (ret) {
+ *(unsigned int *)config = old_val;
+
+ ret = test_sync(test_dev);
+ WARN_ON(ret);
+
+ mutex_unlock(&test_dev->config_mutex);
+ return -EINVAL;
+ }
+
+ mutex_unlock(&test_dev->config_mutex);
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ unsigned int *config,
+ unsigned int min,
+ unsigned int max)
+{
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ if (end == buf || new < min || new > max || new > UINT_MAX)
+ return -EINVAL;
+
+ mutex_lock(&test_dev->config_mutex);
+ *(unsigned int *)config = new;
+ mutex_unlock(&test_dev->config_mutex);
+
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static int test_dev_config_update_int(struct kmod_test_device *test_dev,
+ const char *buf, size_t size,
+ int *config)
+{
+ char *end;
+ long new = simple_strtol(buf, &end, 0);
+ if (end == buf || new > INT_MAX || new < INT_MIN)
+ return -EINVAL;
+ mutex_lock(&test_dev->config_mutex);
+ *(int *)config = new;
+ mutex_unlock(&test_dev->config_mutex);
+ /* Always return full write size even if we didn't consume all */
+ return size;
+}
+
+static ssize_t test_dev_config_show_int(struct kmod_test_device *test_dev,
+ char *buf,
+ int config)
+{
+ int val;
+
+ mutex_lock(&test_dev->config_mutex);
+ val = config;
+ mutex_unlock(&test_dev->config_mutex);
+
+ return snprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
+static ssize_t test_dev_config_show_uint(struct kmod_test_device *test_dev,
+ char *buf,
+ unsigned int config)
+{
+ unsigned int val;
+
+ mutex_lock(&test_dev->config_mutex);
+ val = config;
+ mutex_unlock(&test_dev->config_mutex);
+
+ return snprintf(buf, PAGE_SIZE, "%u\n", val);
+}
+
+static ssize_t test_result_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_int(test_dev, buf, count,
+ &config->test_result);
+}
+
+static ssize_t config_num_threads_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_uint_sync(test_dev, buf, count,
+ &config->num_threads,
+ kmod_config_sync_info);
+}
+
+static ssize_t config_num_threads_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_int(test_dev, buf, config->num_threads);
+}
+static DEVICE_ATTR(config_num_threads, 0644, config_num_threads_show,
+ config_num_threads_store);
+
+static ssize_t config_test_case_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_update_uint_range(test_dev, buf, count,
+ &config->test_case,
+ __TEST_KMOD_INVALID + 1,
+ __TEST_KMOD_MAX - 1);
+}
+
+static ssize_t config_test_case_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_uint(test_dev, buf, config->test_case);
+}
+static DEVICE_ATTR(config_test_case, 0644, config_test_case_show,
+ config_test_case_store);
+
+static ssize_t test_result_show(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+ struct kmod_test_device *test_dev = dev_to_test_dev(dev);
+ struct test_config *config = &test_dev->config;
+
+ return test_dev_config_show_int(test_dev, buf, config->test_result);
+}
+static DEVICE_ATTR(test_result, 0644, test_result_show, test_result_store);
+
+#define TEST_KMOD_DEV_ATTR(name) &dev_attr_##name.attr
+
+static struct attribute *test_dev_attrs[] = {
+ TEST_KMOD_DEV_ATTR(trigger_config),
+ TEST_KMOD_DEV_ATTR(config),
+ TEST_KMOD_DEV_ATTR(reset),
+
+ TEST_KMOD_DEV_ATTR(config_test_driver),
+ TEST_KMOD_DEV_ATTR(config_test_fs),
+ TEST_KMOD_DEV_ATTR(config_num_threads),
+ TEST_KMOD_DEV_ATTR(config_test_case),
+ TEST_KMOD_DEV_ATTR(test_result),
+
+ NULL,
+};
+
+ATTRIBUTE_GROUPS(test_dev);
+
+static int kmod_config_init(struct kmod_test_device *test_dev)
+{
+ int ret;
+
+ mutex_lock(&test_dev->config_mutex);
+ ret = __kmod_config_init(test_dev);
+ mutex_unlock(&test_dev->config_mutex);
+
+ return ret;
+}
+
+/*
+ * XXX: this could perhaps be made generic already too, but a hunt
+ * for actual users would be needed first. It could be generic
+ * if other test drivers end up using a similar mechanism.
+ */
+const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
+{
+ const char *name_const;
+ char *name;
+
+ if (!base)
+ return NULL;
+ if (strlen(base) > 30)
+ return NULL;
+ name = kzalloc(1024, gfp);
+ if (!name)
+ return NULL;
+
+ strncat(name, base, strlen(base));
+ sprintf(name+(strlen(base)), "%d", idx);
+ name_const = kstrdup_const(name, gfp);
+
+ kfree(name);
+
+ return name_const;
+}
+
+static struct kmod_test_device *alloc_test_dev_kmod(int idx)
+{
+ int ret;
+ struct kmod_test_device *test_dev;
+ struct miscdevice *misc_dev;
+
+ test_dev = vzalloc(sizeof(struct kmod_test_device));
+ if (!test_dev) {
+ pr_err("Cannot alloc test_dev\n");
+ goto err_out;
+ }
+
+ mutex_init(&test_dev->config_mutex);
+ mutex_init(&test_dev->trigger_mutex);
+ mutex_init(&test_dev->thread_mutex);
+
+ init_completion(&test_dev->kthreads_done);
+
+ ret = kmod_config_init(test_dev);
+ if (ret < 0) {
+ pr_err("Cannot alloc kmod_config_init()\n");
+ goto err_out_free;
+ }
+
+ test_dev->dev_idx = idx;
+ misc_dev = &test_dev->misc_dev;
+
+ misc_dev->minor = MISC_DYNAMIC_MINOR;
+ misc_dev->name = test_dev_get_name("test_kmod", test_dev->dev_idx,
+ GFP_KERNEL);
+ if (!misc_dev->name) {
+ pr_err("Cannot alloc misc_dev->name\n");
+ goto err_out_free_config;
+ }
+ misc_dev->groups = test_dev_groups;
+
+ return test_dev;
+
+err_out_free_config:
+ free_test_dev_info(test_dev);
+ kmod_config_free(test_dev);
+err_out_free:
+ vfree(test_dev);
+ test_dev = NULL;
+err_out:
+ return NULL;
+}
+
+static void free_test_dev_kmod(struct kmod_test_device *test_dev)
+{
+ if (test_dev) {
+ kfree_const(test_dev->misc_dev.name);
+ test_dev->misc_dev.name = NULL;
+ free_test_dev_info(test_dev);
+ kmod_config_free(test_dev);
+ vfree(test_dev);
+ test_dev = NULL;
+ }
+}
+
+static struct kmod_test_device *register_test_dev_kmod(void)
+{
+ struct kmod_test_device *test_dev = NULL;
+ int ret;
+
+ mutex_unlock(®_dev_mutex);
+
+ /* int should suffice for number of devices, test for wrap */
+ if (unlikely(num_test_devs + 1) < 0) {
+ pr_err("reached limit of number of test devices\n");
+ goto out;
+ }
+
+ test_dev = alloc_test_dev_kmod(num_test_devs);
+ if (!test_dev)
+ goto out;
+
+ ret = misc_register(&test_dev->misc_dev);
+ if (ret) {
+ pr_err("could not register misc device: %d\n", ret);
+ free_test_dev_kmod(test_dev);
+ goto out;
+ }
+
+ test_dev->dev = test_dev->misc_dev.this_device;
+ list_add_tail(&test_dev->list, ®_test_devs);
+ dev_info(test_dev->dev, "interface ready\n");
+
+ num_test_devs++;
+
+out:
+ mutex_unlock(®_dev_mutex);
+
+ return test_dev;
+
+}
+
+static int __init test_kmod_init(void)
+{
+ struct kmod_test_device *test_dev;
+ int ret;
+
+ test_dev = register_test_dev_kmod();
+ if (!test_dev) {
+ pr_err("Cannot add first test kmod device\n");
+ return -ENODEV;
+ }
+
+ /*
+ * With some work we might be able to gracefully enable
+ * testing with this driver built-in, for now this seems
+ * rather risky. For those willing to try have at it,
+ * and enable the below. Good luck! If that works, try
+ * lowering the init level for more fun.
+ */
+ if (force_init_test) {
+ ret = trigger_config_run_driver(test_dev, "tun");
+ if (WARN_ON(ret))
+ return ret;
+ ret = trigger_config_run_fs(test_dev, "btrfs");
+ if (WARN_ON(ret))
+ return ret;
+ }
+
+ return 0;
+}
+late_initcall(test_kmod_init);
+
+static
+void unregister_test_dev_kmod(struct kmod_test_device *test_dev)
+{
+ mutex_lock(&test_dev->trigger_mutex);
+ mutex_lock(&test_dev->config_mutex);
+
+ test_dev_kmod_stop_tests(test_dev);
+
+ dev_info(test_dev->dev, "removing interface\n");
+ misc_deregister(&test_dev->misc_dev);
+
+ mutex_unlock(&test_dev->config_mutex);
+ mutex_unlock(&test_dev->trigger_mutex);
+
+ free_test_dev_kmod(test_dev);
+}
+
+static void __exit test_kmod_exit(void)
+{
+ struct kmod_test_device *test_dev, *tmp;
+
+ mutex_lock(®_dev_mutex);
+ list_for_each_entry_safe(test_dev, tmp, ®_test_devs, list) {
+ list_del(&test_dev->list);
+ unregister_test_dev_kmod(test_dev);
+ }
+ mutex_unlock(®_dev_mutex);
+}
+module_exit(test_kmod_exit);
+
+MODULE_AUTHOR("Luis R. Rodriguez <mcgrof@kernel.org>");
+MODULE_LICENSE("GPL");
diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
new file mode 100644
index 000000000000..fa2ccc5fb3de
--- /dev/null
+++ b/tools/testing/selftests/kmod/Makefile
@@ -0,0 +1,11 @@
+# Makefile for kmod loading selftests
+
+# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
+all:
+
+TEST_PROGS := kmod.sh
+
+include ../lib.mk
+
+# Nothing to clean up.
+clean:
diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
new file mode 100644
index 000000000000..259f4fd6b5e2
--- /dev/null
+++ b/tools/testing/selftests/kmod/config
@@ -0,0 +1,7 @@
+CONFIG_TEST_KMOD=m
+CONFIG_TEST_LKM=m
+CONFIG_XFS_FS=m
+
+# For the module parameter force_init_test is used
+CONFIG_TUN=m
+CONFIG_BTRFS_FS=m
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
new file mode 100755
index 000000000000..9ea1864d8bae
--- /dev/null
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -0,0 +1,449 @@
+#!/bin/bash
+#
+# Copyright (C) 2016 Luis R. Rodriguez <mcgrof@kernel.org>
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms of copyleft-next (version 0.3.1 or later) as published
+# at http://copyleft-next.org/.
+
+# This is a stress test script for kmod, the kernel module loader. It uses
+# test_kmod which exposes a series of knobs for the API for us so we can
+# tweak each test in userspace rather than in kernelspace.
+#
+# The way kmod works is it uses the kernel's usermode helper API to eventually
+# call /sbin/modprobe. It has a limit of the number of concurrent calls
+# possible. The kernel interface to load modules is request_module(), however
+# mount uses get_fs_type(). Both behave slightly differently, but the
+# differences are important enough to test each call separately. For this
+# reason test_kmod starts by providing tests for both calls.
+#
+# The test driver test_kmod assumes a series of defaults which you can
+# override by exporting to your environment prior running this script.
+# For instance this script assumes you do not have xfs loaded upon boot.
+# If this is false, export DEFAULT_KMOD_FS="ext4" prior to running this
+# script if the filesyste module you don't have loaded upon bootup
+# is ext4 instead. Refer to allow_user_defaults() for a list of user
+# override variables possible.
+#
+# You'll want at least 4096 GiB of RAM to expect to run these tests
+# without running out of memory on them. For other requirements refer
+# to test_reqs()
+
+set -e
+
+TEST_DRIVER="test_kmod"
+
+function allow_user_defaults()
+{
+ if [ -z $DEFAULT_KMOD_DRIVER ]; then
+ DEFAULT_KMOD_DRIVER="test_module"
+ fi
+
+ if [ -z $DEFAULT_KMOD_FS ]; then
+ DEFAULT_KMOD_FS="xfs"
+ fi
+
+ if [ -z $PROC_DIR ]; then
+ PROC_DIR="/proc/sys/kernel/"
+ fi
+
+ if [ -z $MODPROBE_LIMIT ]; then
+ MODPROBE_LIMIT=50
+ fi
+
+ if [ -z $DIR ]; then
+ DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
+ fi
+
+ MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
+}
+
+test_reqs()
+{
+ if ! which modprobe 2> /dev/null > /dev/null; then
+ echo "$0: You need modprobe installed"
+ exit 1
+ fi
+
+ if ! which kmod 2> /dev/null > /dev/null; then
+ echo "$0: You need kmod installed"
+ exit 1
+ fi
+
+ # kmod 19 has a bad bug where it returns 0 when modprobe
+ # gets called *even* if the module was not loaded due to
+ # some bad heuristics. For details see:
+ #
+ # A work around is possible in-kernel but its rather
+ # complex.
+ KMOD_VERSION=$(kmod --version | awk '{print $3}')
+ if [[ $KMOD_VERSION -le 19 ]]; then
+ echo "$0: You need at least kmod 20"
+ echo "kmod <= 19 is buggy, for details see:"
+ echo "http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4"
+ exit 1
+ fi
+}
+
+function load_req_mod()
+{
+ if [ ! -d $DIR ]; then
+ # Alanis: "Oh isn't it ironic?"
+ modprobe $TEST_DRIVER
+ if [ ! -d $DIR ]; then
+ echo "$0: $DIR not present"
+ echo "You must have the following enabled in your kernel:"
+ cat $PWD/config
+ exit 1
+ fi
+ fi
+}
+
+test_finish()
+{
+ echo "Test completed"
+}
+
+errno_name_to_val()
+{
+ case "$1" in
+ # kmod calls modprobe and upon of a module not found
+ # modprobe returns just 1... However in the kernel we
+ # *sometimes* see 256...
+ MODULE_NOT_FOUND)
+ echo 256;;
+ SUCCESS)
+ echo 0;;
+ -EPERM)
+ echo -1;;
+ -ENOENT)
+ echo -2;;
+ -EINVAL)
+ echo -22;;
+ -ERR_ANY)
+ echo -123456;;
+ *)
+ echo invalid;;
+ esac
+}
+
+errno_val_to_name()
+ case "$1" in
+ 256)
+ echo MODULE_NOT_FOUND;;
+ 0)
+ echo SUCCESS;;
+ -1)
+ echo -EPERM;;
+ -2)
+ echo -ENOENT;;
+ -22)
+ echo -EINVAL;;
+ -123456)
+ echo -ERR_ANY;;
+ *)
+ echo invalid;;
+ esac
+
+config_set_test_case_driver()
+{
+ if ! echo -n 1 >$DIR/config_test_case; then
+ echo "$0: Unable to set to test case to driver" >&2
+ exit 1
+ fi
+}
+
+config_set_test_case_fs()
+{
+ if ! echo -n 2 >$DIR/config_test_case; then
+ echo "$0: Unable to set to test case to fs" >&2
+ exit 1
+ fi
+}
+
+config_num_threads()
+{
+ if ! echo -n $1 >$DIR/config_num_threads; then
+ echo "$0: Unable to set to number of threads" >&2
+ exit 1
+ fi
+}
+
+config_get_modprobe_limit()
+{
+ if [[ -f ${MODPROBE_LIMIT_FILE} ]] ; then
+ MODPROBE_LIMIT=$(cat $MODPROBE_LIMIT_FILE)
+ fi
+ echo $MODPROBE_LIMIT
+}
+
+config_num_thread_limit_extra()
+{
+ MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ let EXTRA_LIMIT=$MODPROBE_LIMIT+$1
+ config_num_threads $EXTRA_LIMIT
+}
+
+# For special characters use printf directly,
+# refer to kmod_test_0001
+config_set_driver()
+{
+ if ! echo -n $1 >$DIR/config_test_driver; then
+ echo "$0: Unable to set driver" >&2
+ exit 1
+ fi
+}
+
+config_set_fs()
+{
+ if ! echo -n $1 >$DIR/config_test_fs; then
+ echo "$0: Unable to set driver" >&2
+ exit 1
+ fi
+}
+
+config_get_driver()
+{
+ cat $DIR/config_test_driver
+}
+
+config_get_test_result()
+{
+ cat $DIR/test_result
+}
+
+config_reset()
+{
+ if ! echo -n "1" >"$DIR"/reset; then
+ echo "$0: reset shuld have worked" >&2
+ exit 1
+ fi
+}
+
+config_show_config()
+{
+ echo "----------------------------------------------------"
+ cat "$DIR"/config
+ echo "----------------------------------------------------"
+}
+
+config_trigger()
+{
+ if ! echo -n "1" >"$DIR"/trigger_config 2>/dev/null; then
+ echo "$1: FAIL - loading should have worked"
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - loading kmod test"
+}
+
+config_trigger_want_fail()
+{
+ if echo "1" > $DIR/trigger_config 2>/dev/null; then
+ echo "$1: FAIL - test case was expected to fail"
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - kmod test case failed as expected"
+}
+
+config_expect_result()
+{
+ RC=$(config_get_test_result)
+ RC_NAME=$(errno_val_to_name $RC)
+
+ ERRNO_NAME=$2
+ ERRNO=$(errno_name_to_val $ERRNO_NAME)
+
+ if [[ $ERRNO_NAME = "-ERR_ANY" ]]; then
+ if [[ $RC -ge 0 ]]; then
+ echo "$1: FAIL, test expects $ERRNO_NAME - got $RC_NAME ($RC)" >&2
+ config_show_config
+ exit 1
+ fi
+ elif [[ $RC != $ERRNO ]]; then
+ echo "$1: FAIL, test expects $ERRNO_NAME ($ERRNO) - got $RC_NAME ($RC)" >&2
+ config_show_config
+ exit 1
+ fi
+ echo "$1: OK! - Return value: $RC ($RC_NAME), expected $ERRNO_NAME"
+}
+
+kmod_defaults_driver()
+{
+ config_reset
+ modprobe -r $DEFAULT_KMOD_DRIVER
+ config_set_driver $DEFAULT_KMOD_DRIVER
+}
+
+kmod_defaults_fs()
+{
+ config_reset
+ modprobe -r $DEFAULT_KMOD_FS
+ config_set_fs $DEFAULT_KMOD_FS
+ config_set_test_case_fs
+}
+
+kmod_test_0001_driver()
+{
+ NAME='\000'
+
+ kmod_defaults_driver
+ config_num_threads 1
+ printf '\000' >"$DIR"/config_test_driver
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
+kmod_test_0001_fs()
+{
+ NAME='\000'
+
+ kmod_defaults_fs
+ config_num_threads 1
+ printf '\000' >"$DIR"/config_test_fs
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0001()
+{
+ kmod_test_0001_driver
+ kmod_test_0001_fs
+}
+
+kmod_test_0002_driver()
+{
+ NAME="nope-$DEFAULT_KMOD_DRIVER"
+
+ kmod_defaults_driver
+ config_set_driver $NAME
+ config_num_threads 1
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
+}
+
+kmod_test_0002_fs()
+{
+ NAME="nope-$DEFAULT_KMOD_FS"
+
+ kmod_defaults_fs
+ config_set_fs $NAME
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0002()
+{
+ kmod_test_0002_driver
+ kmod_test_0002_fs
+}
+
+kmod_test_0003()
+{
+ kmod_defaults_fs
+ config_num_threads 1
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0004()
+{
+ kmod_defaults_fs
+ config_num_threads 2
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0005()
+{
+ kmod_defaults_driver
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0006()
+{
+ kmod_defaults_fs
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} SUCCESS
+}
+
+kmod_test_0007()
+{
+ kmod_test_0005
+ kmod_test_0006
+}
+
+kmod_test_0008()
+{
+ kmod_defaults_driver
+ MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ let EXTRA=$MODPROBE_LIMIT/2
+ config_num_thread_limit_extra $EXTRA
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+kmod_test_0009()
+{
+ kmod_defaults_fs
+ #MODPROBE_LIMIT=$(config_get_modprobe_limit)
+ #let EXTRA=$MODPROBE_LIMIT/3
+ config_num_thread_limit_extra 5
+ config_trigger ${FUNCNAME[0]}
+ config_expect_result ${FUNCNAME[0]} -EINVAL
+}
+
+trap "test_finish" EXIT
+test_reqs
+allow_user_defaults
+load_req_mod
+
+usage()
+{
+ echo "Usage: $0 [ -t <4-number-digit> ]"
+ echo "Valid tests: 0001-0011"
+ echo
+ echo "0001 - Simple test - 1 thread for empty string"
+ echo "0002 - Simple test - 1 thread for modules/filesystems that do not exist"
+ echo "0003 - Simple test - 1 thread for get_fs_type() only"
+ echo "0004 - Simple test - 2 threads for get_fs_type() only"
+ echo "0005 - multithreaded tests with default setup - request_module() only"
+ echo "0006 - multithreaded tests with default setup - get_fs_type() only"
+ echo "0007 - multithreaded tests with default setup test request_module() and get_fs_type()"
+ echo "0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()"
+ echo "0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
+ exit 1
+}
+
+# You can ask for a specific test:
+if [[ $# > 0 ]] ; then
+ if [[ $1 != "-t" ]]; then
+ usage
+ fi
+
+ re='^[0-9]+$'
+ if ! [[ $2 =~ $re ]]; then
+ usage
+ fi
+
+ RUN_TEST=kmod_test_$2
+ $RUN_TEST
+ exit 0
+fi
+
+# Once tese are enabled please leave them as-is. Write your own test,
+# we have tons of space.
+kmod_test_0001
+kmod_test_0002
+kmod_test_0003
+kmod_test_0004
+kmod_test_0005
+kmod_test_0006
+kmod_test_0007
+
+#kmod_test_0008
+#kmod_test_0009
+
+exit 0
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 01/10] kmod: add test driver to stress test the module loader
2016-12-08 18:47 ` [RFC 01/10] kmod: add test driver to stress test the module loader Luis R. Rodriguez
@ 2016-12-08 20:24 ` Kees Cook
2016-12-13 21:10 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Kees Cook @ 2016-12-08 20:24 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, rgoldwyn, subashab,
Heinrich Schuchardt, Aaron Tomlin, mbenes, Paul E. McKenney,
Dan Williams, Josh Poimboeuf, David S. Miller, Ingo Molnar,
Andrew Morton, Linus Torvalds, linux-kselftest, linux-doc, LKML
On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> This adds a new stress test driver for kmod: the kernel module loader.
> The new stress test driver, test_kmod, is only enabled as a module right
> now. It should be possible to load this as built-in and load tests early
> (refer to the force_init_test module parameter), however since a lot of
> test can get a system out of memory fast we leave this disabled for now.
>
> Using a system with 1024 MiB of RAM can *easily* get your kernel
> OOM fast with this test driver.
>
> The test_kmod driver exposes API knobs for us to fine tune simple
> request_module() and get_fs_type() calls. Since these API calls
> only allow each one parameter a test driver for these is rather
> simple. Other factors that can help out test driver though are
> the number of calls we issue and knowing current limitations of
> each. This exposes configuration as much as possible through
> userspace to be able to build tests directly from userspace.
>
> Since it allows multiple misc devices its will eventually (once we
> add a knob to let us create new devices at will) also be possible to
> perform more tests in parallel, provided you have enough memory.
>
> We only enable tests we know work as of right now.
>
> Demo screenshots:
>
> # tools/testing/selftests/kmod/kmod.sh
> kmod_test_0001_driver: OK! - loading kmod test
> kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0001_fs: OK! - loading kmod test
> kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> kmod_test_0002_driver: OK! - loading kmod test
> kmod_test_0002_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0002_fs: OK! - loading kmod test
> kmod_test_0002_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> kmod_test_0003: OK! - loading kmod test
> kmod_test_0003: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0004: OK! - loading kmod test
> kmod_test_0004: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0005: OK! - loading kmod test
> kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0006: OK! - loading kmod test
> kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0005: OK! - loading kmod test
> kmod_test_0005: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> kmod_test_0006: OK! - loading kmod test
> kmod_test_0006: OK! - Return value: 0 (SUCCESS), expected SUCCESS
> Test completed
>
> You can also request for specific tests:
>
> # tools/testing/selftests/kmod/kmod.sh -t 0001
> kmod_test_0001_driver: OK! - loading kmod test
> kmod_test_0001_driver: OK! - Return value: 256 (MODULE_NOT_FOUND), expected MODULE_NOT_FOUND
> kmod_test_0001_fs: OK! - loading kmod test
> kmod_test_0001_fs: OK! - Return value: -22 (-EINVAL), expected -EINVAL
> Test completed
>
> Lastly, the current available number of tests:
>
> # tools/testing/selftests/kmod/kmod.sh --help
> Usage: tools/testing/selftests/kmod/kmod.sh [ -t <4-number-digit> ]
> Valid tests: 0001-0009
>
> 0001 - Simple test - 1 thread for empty string
> 0002 - Simple test - 1 thread for modules/filesystems that do not exist
> 0003 - Simple test - 1 thread for get_fs_type() only
> 0004 - Simple test - 2 threads for get_fs_type() only
> 0005 - multithreaded tests with default setup - request_module() only
> 0006 - multithreaded tests with default setup - get_fs_type() only
> 0007 - multithreaded tests with default setup test request_module() and get_fs_type()
> 0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()
> 0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()
>
> The following test cases currently fail, as such they are not currently
> enabled by default:
>
> # tools/testing/selftests/kmod/kmod.sh -t 0007
> # tools/testing/selftests/kmod/kmod.sh -t 0008
> # tools/testing/selftests/kmod/kmod.sh -t 0009
> # tools/testing/selftests/kmod/kmod.sh -t 0010
> # tools/testing/selftests/kmod/kmod.sh -t 0011
>
> To be sure to run them as intended please unload both of the modules:
>
> o test_module
> o xfs
>
> And ensure they are not loaded on your system prior to testing them.
> If you use these paritions for your rootfs you can change the default
> test driver used for get_fs_type() by exporting it into your
> environment. For example of other test defaults you can override
> refer to kmod.sh allow_user_defaults().
>
> Behind the scenes this is how we fine tune at a test case prior to
> hitting a trigger to run it:
>
> cat /sys/devices/virtual/misc/test_kmod0/config
> echo -n "2" > /sys/devices/virtual/misc/test_kmod0/config_test_case
> echo -n "ext4" > /sys/devices/virtual/misc/test_kmod0/config_test_fs
> echo -n "80" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
> cat /sys/devices/virtual/misc/test_kmod0/config
> echo -n "1" > /sys/devices/virtual/misc/test_kmod0/config_num_threads
>
> Finally to trigger:
>
> echo -n "1" > /sys/devices/virtual/misc/test_kmod0/trigger_config
>
> The kmod.sh script uses the above constructs to build differnt test cases.
Typo: different
> A bit of interpretation of the current failures follows, first two
> premises:
>
> a) When request_module() is used userspace figures out an optimized version of
> module order for us. Once it finds the modules it needs, as per depmod
> symbol dep map, it will finit_module() the respective modules which
> are needed for the original request_module() request.
>
> b) We have an optimization in place whereby if a kernel uses
> request_module() on a module already loaded we never bother
> userspace as the module already is loaded. This is all handled by
> kernel/kmod.c.
>
> A few things to consider to help identify root causes of issues:
>
> 0) kmod 19 has a broken heuristic for modules being assumed to be
> built-in to your kernel and will return 0 even though request_module()
> failed. Upgrade to a newer version of kmod.
>
> 1) A get_fs_type() call for "xfs" will request_module() for
> "fs-xfs", not for "xfs". The optimization in kernel described in b)
> fails to catch if we have a lot of consecutive get_fs_type() calls.
> The reason is the optimization in place does not look for aliases. This
> means two consecutive get_fs_type() calls will bump kmod_concurrent, whereas
> request_module() will not.
>
> This one explanation why test case 0009 fails at least once for
> get_fs_type().
>
> 2) If a module fails to load --- for whatever reason (kmod_concurrent
> limit reached, file not yet present due to rootfs switch, out of memory)
> we have a period of time during which module request for the same name
> either with request_module() or get_fs_type() will *also* fail to load
> even if the file for the module is ready.
>
> This explains why *multiple* NULLs are possible on test 0009.
>
> 3) finit_module() consumes quite a bit of memory.
Is this due to reading the module into kernel memory or something else?
> 4) Filesystems typically also have more dependent modules than other
> modules, its important to note though that even though a get_fs_type() call
> does not incur additional kmod_concurrent bumps, since userspace
> loads dependencies it finds it needs via finit_module_fd(), it *will*
> take much more memory to load a module with a lot of dependencies.
>
> Because of 3) and 4) we will easily run into out of memory failures
> with certain tests. For instance test 0006 fails on qemu with 1024 MiB
> of RAM. It panics a box after reaping all userspace processes and still
> not having enough memory to reap.
Are the buffers not released until after all the dependent modules are
loaded? I thought it would load one by one?
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
This is a great selftest, thanks for working on it!
Notes below...
> ---
> lib/Kconfig.debug | 25 +
> lib/Makefile | 1 +
> lib/test_kmod.c | 1248 +++++++++++++++++++++++++++++++++
> tools/testing/selftests/kmod/Makefile | 11 +
> tools/testing/selftests/kmod/config | 7 +
> tools/testing/selftests/kmod/kmod.sh | 449 ++++++++++++
> 6 files changed, 1741 insertions(+)
> create mode 100644 lib/test_kmod.c
> create mode 100644 tools/testing/selftests/kmod/Makefile
> create mode 100644 tools/testing/selftests/kmod/config
> create mode 100755 tools/testing/selftests/kmod/kmod.sh
>
> diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> index 7446097f72bd..6cad548e0682 100644
> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
>
> If unsure, say N.
>
> +config TEST_KMOD
> + tristate "kmod stress tester"
> + default n
> + depends on m
> + select TEST_LKM
> + select XFS_FS
> + select TUN
> + select BTRFS_FS
Since the desired FS can be changed at runtime, maybe these selects
aren't needed?
> + help
> + Test the kernel's module loading mechanism: kmod. kmod implements
> + support to load modules using the Linux kernel's usermode helper.
> + This test provides a series of tests against kmod.
> +
> + Although technically you can either build test_kmod as a module or
> + into the kernel we disallow building it into the kernel since
> + it stress tests request_module() and this will very likely cause
> + some issues by taking over precious threads available from other
> + module load requests, ultimately this could be fatal.
> +
> + To run tests run:
> +
> + tools/testing/selftests/kmod/kmod.sh --help
> +
> + If unsure, say N.
> +
> source "samples/Kconfig"
>
> source "lib/Kconfig.kgdb"
> diff --git a/lib/Makefile b/lib/Makefile
> index d15e235f72ea..3c5a14821e16 100644
> --- a/lib/Makefile
> +++ b/lib/Makefile
> @@ -55,6 +55,7 @@ obj-$(CONFIG_TEST_STATIC_KEYS) += test_static_key_base.o
> obj-$(CONFIG_TEST_PRINTF) += test_printf.o
> obj-$(CONFIG_TEST_BITMAP) += test_bitmap.o
> obj-$(CONFIG_TEST_UUID) += test_uuid.o
> +obj-$(CONFIG_TEST_KMOD) += test_kmod.o
>
> ifeq ($(CONFIG_DEBUG_KOBJECT),y)
> CFLAGS_kobject.o += -DDEBUG
> diff --git a/lib/test_kmod.c b/lib/test_kmod.c
> new file mode 100644
> index 000000000000..63fded83b9b6
> --- /dev/null
> +++ b/lib/test_kmod.c
> @@ -0,0 +1,1248 @@
> +/*
> + * kmod stress test driver
> + *
> + * Copyright (C) 2016 Luis R. Rodriguez <mcgrof@kernel.org>
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms of copyleft-next (version 0.3.1 or later) as published
> + * at http://copyleft-next.org/.
> + */
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +/*
> + * This driver provides an interface to trigger and test the kernel's
> + * module loader through a series of configurations and a few triggers.
> + * To test this driver use the following script as root:
> + *
> + * tools/testing/selftests/kmod/kmod.sh --help
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/kmod.h>
> +#include <linux/printk.h>
> +#include <linux/kthread.h>
> +#include <linux/sched.h>
> +#include <linux/fs.h>
> +#include <linux/miscdevice.h>
> +#include <linux/vmalloc.h>
> +#include <linux/slab.h>
> +#include <linux/device.h>
> +
> +#define TEST_START_NUM_THREADS 50
> +#define TEST_START_DRIVER "test_module"
> +#define TEST_START_TEST_FS "xfs"
> +#define TEST_START_TEST_CASE TEST_KMOD_DRIVER
> +
> +
> +static bool force_init_test = false;
> +module_param(force_init_test, bool_enable_only, 0644);
> +MODULE_PARM_DESC(force_init_test,
> + "Force kicking a test immediatley after driver loads");
Typo: immediately
> +
> +/*
> + * For device allocation / registration
> + */
> +static DEFINE_MUTEX(reg_dev_mutex);
> +static LIST_HEAD(reg_test_devs);
> +
> +/*
> + * num_test_devs actually represents the *next* ID of the next
> + * device we will allow to create.
> + */
> +static int num_test_devs;
> +
> +/**
> + * enum kmod_test_case - linker table test case
> + *
> + * If you add a test case, please be sure to review if you need to se
> + * @need_mod_put for your tests case.
> + *
> + * @TEST_KMOD_DRIVER: stress tests request_module()
> + * @TEST_KMOD_FS_TYPE: stress tests get_fs_type()
> + */
> +enum kmod_test_case {
> + __TEST_KMOD_INVALID = 0,
> +
> + TEST_KMOD_DRIVER,
> + TEST_KMOD_FS_TYPE,
> +
> + __TEST_KMOD_MAX,
> +};
> +
> +struct test_config {
> + char *test_driver;
> + char *test_fs;
> + unsigned int num_threads;
> + enum kmod_test_case test_case;
> + int test_result;
> +};
> +
> +struct kmod_test_device;
> +
> +/**
> + * kmod_test_device_info - thread info
> + *
> + * @ret_sync: return value if request_module() is used, sync request for
> + * @TEST_KMOD_DRIVER
> + * @fs_sync: return value of get_fs_type() for @TEST_KMOD_FS_TYPE
> + * @thread_idx: thread ID
> + * @test_dev: test device test is being performed under
> + * @need_mod_put: Some tests (get_fs_type() is one) requires putting the module
> + * (module_put(fs_sync->owner)) when done, otherwise you will not be able
> + * to unload the respective modules and re-test. We use this to keep
> + * accounting of when we need this and to help out in case we need to
> + * error out and deal with module_put() on error.
> + */
> +struct kmod_test_device_info {
> + int ret_sync;
> + struct file_system_type *fs_sync;
> + struct task_struct *task_sync;
> + unsigned int thread_idx;
> + struct kmod_test_device *test_dev;
> + bool need_mod_put;
> +};
> +
> +/**
> + * kmod_test_device - test device to help test kmod
> + *
> + * @dev_idx: unique ID for test device
> + * @config: configuration for the test
> + * @misc_dev: we use a misc device under the hood
> + * @dev: pointer to misc_dev's own struct device
> + * @config_mutex: protects configuration of test
> + * @trigger_mutex: the test trigger can only be fired once at a time
> + * @thread_lock: protects @done count, and the @info per each thread
> + * @done: number of threads which have completed or failed
> + * @test_is_oom: when we run out of memory, use this to halt moving forward
> + * @kthreads_done: completion used to signal when all work is done
> + * @list: needed to be part of the reg_test_devs
> + * @info: array of info for each thread
> + */
> +struct kmod_test_device {
> + int dev_idx;
> + struct test_config config;
> + struct miscdevice misc_dev;
> + struct device *dev;
> + struct mutex config_mutex;
> + struct mutex trigger_mutex;
> + struct mutex thread_mutex;
> +
> + unsigned int done;
> +
> + bool test_is_oom;
> + struct completion kthreads_done;
> + struct list_head list;
> +
> + struct kmod_test_device_info *info;
> +};
> +
> +static const char *test_case_str(enum kmod_test_case test_case)
> +{
> + switch (test_case) {
> + case TEST_KMOD_DRIVER:
> + return "TEST_KMOD_DRIVER";
> + case TEST_KMOD_FS_TYPE:
> + return "TEST_KMOD_FS_TYPE";
> + default:
> + return "invalid";
> + }
> +}
> +
> +static struct miscdevice *dev_to_misc_dev(struct device *dev)
> +{
> + return dev_get_drvdata(dev);
> +}
> +
> +static struct kmod_test_device *misc_dev_to_test_dev(struct miscdevice *misc_dev)
> +{
> + return container_of(misc_dev, struct kmod_test_device, misc_dev);
> +}
> +
> +static struct kmod_test_device *dev_to_test_dev(struct device *dev)
> +{
> + struct miscdevice *misc_dev;
> +
> + misc_dev = dev_to_misc_dev(dev);
> +
> + return misc_dev_to_test_dev(misc_dev);
> +}
> +
> +/* Must run with thread_mutex held */
> +static void kmod_test_done_check(struct kmod_test_device *test_dev,
> + unsigned int idx)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + test_dev->done++;
> + dev_dbg(test_dev->dev, "Done thread count: %u\n", test_dev->done);
> +
> + if (test_dev->done == config->num_threads) {
> + dev_info(test_dev->dev, "Done: %u threads have all run now\n",
> + test_dev->done);
> + dev_info(test_dev->dev, "Last thread to run: %u\n", idx);
> + complete(&test_dev->kthreads_done);
> + }
> +}
> +
> +static void test_kmod_put_module(struct kmod_test_device_info *info)
> +{
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> +
> + if (!info->need_mod_put)
> + return;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + break;
> + case TEST_KMOD_FS_TYPE:
> + if (info && info->fs_sync && info->fs_sync->owner)
> + module_put(info->fs_sync->owner);
> + break;
> + default:
> + BUG();
> + }
> +
> + info->need_mod_put = true;
> +}
> +
> +static int run_request(void *data)
> +{
> + struct kmod_test_device_info *info = data;
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + info->ret_sync = request_module("%s", config->test_driver);
> + break;
> + case TEST_KMOD_FS_TYPE:
> + info->fs_sync = get_fs_type(config->test_fs);
> + info->need_mod_put = true;
> + break;
> + default:
> + /* __trigger_config_run() already checked for test sanity */
> + BUG();
> + return -EINVAL;
> + }
> +
> + dev_dbg(test_dev->dev, "Ran thread %u\n", info->thread_idx);
> +
> + test_kmod_put_module(info);
> +
> + mutex_lock(&test_dev->thread_mutex);
> + info->task_sync = NULL;
> + kmod_test_done_check(test_dev, info->thread_idx);
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return 0;
> +}
> +
> +static int tally_work_test(struct kmod_test_device_info *info)
> +{
> + struct kmod_test_device *test_dev = info->test_dev;
> + struct test_config *config = &test_dev->config;
> + int err_ret = 0;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + /*
> + * Only capture errors, if one is found that's
> + * enough, for now.
> + */
> + if (info->ret_sync != 0)
> + err_ret = info->ret_sync;
> + dev_info(test_dev->dev,
> + "Sync thread %d return status: %d\n",
> + info->thread_idx, info->ret_sync);
> + break;
> + case TEST_KMOD_FS_TYPE:
> + /* For now we make this simple */
> + if (!info->fs_sync)
> + err_ret = -EINVAL;
> + dev_info(test_dev->dev, "Sync thread %u fs: %s\n",
> + info->thread_idx, info->fs_sync ? config->test_fs :
> + "NULL");
> + break;
> + default:
> + BUG();
> + }
> +
> + return err_ret;
> +}
> +
> +/*
> + * XXX: add result option to display if all errors did not match.
> + * For now we just keep any error code if one was found.
> + *
> + * If this ran it means *all* tasks were created fine and we
> + * are now just collecting results.
> + *
> + * Only propagate errors, do not override with a subsequent sucess case.
> + */
> +static void tally_up_work(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + struct kmod_test_device_info *info;
> + unsigned int idx;
> + int err_ret = 0;
> + int ret = 0;
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + dev_info(test_dev->dev, "Results:\n");
> +
> + for (idx=0; idx < config->num_threads; idx++) {
> + info = &test_dev->info[idx];
> + ret = tally_work_test(info);
> + if (ret)
> + err_ret = ret;
> + }
> +
> + /*
> + * Note: request_module() returns 256 for a module not found even
> + * though modprobe itself returns 1.
> + */
> + config->test_result = err_ret;
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +}
> +
> +static int try_one_request(struct kmod_test_device *test_dev, unsigned int idx)
> +{
> + struct kmod_test_device_info *info = &test_dev->info[idx];
> + int fail_ret = -ENOMEM;
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + info->thread_idx = idx;
> + info->test_dev = test_dev;
> + info->task_sync = kthread_run(run_request, info, "%s-%u",
> + KBUILD_MODNAME, idx);
> +
> + if (!info->task_sync || IS_ERR(info->task_sync)) {
> + test_dev->test_is_oom = true;
> + dev_err(test_dev->dev, "Setting up thread %u failed\n", idx);
> + info->task_sync = NULL;
> + goto err_out;
> + } else
> + dev_dbg(test_dev->dev, "Kicked off thread %u\n", idx);
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return 0;
> +
> +err_out:
> + info->ret_sync = fail_ret;
> + mutex_unlock(&test_dev->thread_mutex);
> +
> + return fail_ret;
> +}
> +
> +static void test_dev_kmod_stop_tests(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + struct kmod_test_device_info *info;
> + unsigned int i;
> +
> + dev_info(test_dev->dev, "Ending request_module() tests\n");
> +
> + mutex_lock(&test_dev->thread_mutex);
> +
> + for (i=0; i < config->num_threads; i++) {
> + info = &test_dev->info[i];
> + if (info->task_sync && !IS_ERR(info->task_sync)) {
> + dev_info(test_dev->dev,
> + "Stopping still-running thread %i\n", i);
> + kthread_stop(info->task_sync);
> + }
> +
> + /*
> + * info->task_sync is well protected, it can only be
> + * NULL or a pointer to a struct. If its NULL we either
> + * never ran, or we did and we completed the work. Completed
> + * tasks *always* put the module for us. This is a sanity
> + * check -- just in case.
> + */
> + if (info->task_sync && info->need_mod_put)
> + test_kmod_put_module(info);
> + }
> +
> + mutex_unlock(&test_dev->thread_mutex);
> +}
> +
> +/*
> + * Only wait *iff* we did not run into any errors during all of our thread
> + * set up. If run into any issues we stop threads and just bail out with
> + * an error to the trigger. This also means we don't need any tally work
> + * for any threads which fail.
> + */
> +static int try_requests(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + unsigned int idx;
> + int ret;
> + bool any_error = false;
> +
> + for (idx=0; idx < config->num_threads; idx++) {
> + if (test_dev->test_is_oom) {
> + any_error = true;
> + break;
> + }
> +
> + ret = try_one_request(test_dev, idx);
> + if (ret) {
> + any_error = true;
> + break;
> + }
> + }
> +
> + if (!any_error) {
> + test_dev->test_is_oom = false;
> + dev_info(test_dev->dev,
> + "No errors were found while initializing threads\n");
> + wait_for_completion(&test_dev->kthreads_done);
> + tally_up_work(test_dev);
> + } else {
> + test_dev->test_is_oom = true;
> + dev_info(test_dev->dev,
> + "At least one thread failed to start, stop all work\n");
> + test_dev_kmod_stop_tests(test_dev);
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +static int run_test_driver(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + dev_info(test_dev->dev, "Test case: %s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> + dev_info(test_dev->dev, "Test driver to load: %s\n",
> + config->test_driver);
> + dev_info(test_dev->dev, "Number of threads to run: %u\n",
> + config->num_threads);
> + dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
> + config->num_threads - 1);
> +
> + return try_requests(test_dev);
> +}
> +
> +static int run_test_fs_type(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + dev_info(test_dev->dev, "Test case: %s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> + dev_info(test_dev->dev, "Test filesystem to load: %s\n",
> + config->test_fs);
> + dev_info(test_dev->dev, "Number of threads to run: %u\n",
> + config->num_threads);
> + dev_info(test_dev->dev, "Thread IDs will range from 0 - %u\n",
> + config->num_threads - 1);
> +
> + return try_requests(test_dev);
> +}
> +
> +static ssize_t config_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int len = 0;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + len += sprintf(buf, "Custom trigger configuration for: %s\n",
> + dev_name(dev));
> +
> + len += sprintf(buf+len, "Number of threads:\t%u\n",
> + config->num_threads);
> +
> + len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
> + test_case_str(config->test_case),
> + config->test_case);
> +
> + if (config->test_driver)
> + len += sprintf(buf+len, "driver:\t%s\n",
> + config->test_driver);
> + else
> + len += sprintf(buf+len, "driver:\tEMTPY\n");
> +
> + if (config->test_fs)
> + len += sprintf(buf+len, "fs:\t%s\n",
> + config->test_fs);
> + else
> + len += sprintf(buf+len, "fs:\tEMTPY\n");
These should all use snprintf...
> +
> +
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return len;
> +}
> +static DEVICE_ATTR_RO(config);
> +
> +/*
> + * This ensures we don't allow kicking threads through if our configuration
> + * is faulty.
> + */
> +static int __trigger_config_run(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + test_dev->done = 0;
> +
> + switch (config->test_case) {
> + case TEST_KMOD_DRIVER:
> + return run_test_driver(test_dev);
> + case TEST_KMOD_FS_TYPE:
> + return run_test_fs_type(test_dev);
> + default:
> + dev_warn(test_dev->dev,
> + "Invalid test case requested: %u\n",
> + config->test_case);
> + return -EINVAL;
> + }
> +}
> +
> +static int trigger_config_run(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + int ret;
> +
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + ret = __trigger_config_run(test_dev);
> + if (ret < 0)
> + goto out;
> + dev_info(test_dev->dev, "General test result: %d\n",
> + config->test_result);
> +
> + /*
> + * We must return 0 after a trigger even unless something went
> + * wrong with the setup of the test. If the test setup went fine
> + * then userspace must just check the result of config->test_result.
> + * One issue with relying on the return from a call in the kernel
> + * is if the kernel returns a possitive value using this trigger
> + * will not return the value to userspace, it would be lost.
> + *
> + * By not relying on capturing the return value of tests we are using
> + * through the trigger it also us to run tests with set -e and only
> + * fail when something went wrong with the driver upon trigger
> + * requests.
> + */
> + ret = 0;
> +
> +out:
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + return ret;
> +}
> +
> +static ssize_t
> +trigger_config_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + int ret;
> +
> + if (test_dev->test_is_oom)
> + return -ENOMEM;
> +
> + /* For all intents and purposes we don't care what userspace
> + * sent this trigger, we care only that we were triggered.
> + * We treat the return value only for caputuring issues with
> + * the test setup. At this point all the test variables should
> + * have been allocated so typically this should never fail.
> + */
> + ret = trigger_config_run(test_dev);
> + if (unlikely(ret < 0))
> + goto out;
> +
> + /*
> + * Note: any return > 0 will be treated as success
> + * and the error value will not be available to userspace.
> + * Do not rely on trying to send to userspace a test value
> + * return value as possitive return errors will be lost.
> + */
> + if (WARN_ON(ret > 0))
> + return -EINVAL;
> +
> + ret = count;
> +out:
> + return ret;
> +}
> +static DEVICE_ATTR_WO(trigger_config);
> +
> +/*
> + * XXX: move to kstrncpy() once merged.
> + *
> + * Users should use kfree_const() when freeing these.
> + */
> +static int __kstrncpy(char **dst, const char *name, size_t count, gfp_t gfp)
> +{
> + *dst = kstrndup(name, count, gfp);
> + if (!*dst)
> + return -ENOSPC;
> + return count;
> +}
> +
> +static int config_copy_test_driver_name(struct test_config *config,
> + const char *name,
> + size_t count)
> +{
> + return __kstrncpy(&config->test_driver, name, count, GFP_KERNEL);
> +}
> +
> +
> +static int config_copy_test_fs(struct test_config *config, const char *name,
> + size_t count)
> +{
> + return __kstrncpy(&config->test_fs, name, count, GFP_KERNEL);
> +}
> +
> +static void __kmod_config_free(struct test_config *config)
> +{
> + if (!config)
> + return;
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + kfree_const(config->test_fs);
> + config->test_driver = NULL;
> +}
> +
> +static void kmod_config_free(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config;
> +
> + if (!test_dev)
> + return;
> +
> + config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + __kmod_config_free(config);
> + mutex_unlock(&test_dev->config_mutex);
> +}
> +
> +static ssize_t config_test_driver_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int copied;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_driver_name(config, buf, count);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return copied;
> +}
> +
> +static ssize_t config_test_driver_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + strcpy(buf, config->test_driver);
> + strcat(buf, "\n");
IIUC, the show/store API uses a max size of PAGE_SIZE. If that's
correct, it's possible that this show routine could write past the end
of buf, due to the end newline, etc. Best to use snprintf like you do
below for the other shows.
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return strlen(buf) + 1;
> +}
> +static DEVICE_ATTR(config_test_driver, 0644, config_test_driver_show,
> + config_test_driver_store);
> +
> +static ssize_t config_test_fs_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> + int copied;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + kfree_const(config->test_fs);
> + config->test_fs = NULL;
> +
> + copied = config_copy_test_fs(config, buf, count);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return copied;
> +}
> +
> +static ssize_t config_test_fs_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + strcpy(buf, config->test_fs);
> + strcat(buf, "\n");
> + mutex_unlock(&test_dev->config_mutex);
Same here... (which, btw, could likely use to be a helper function,
the show and store functions here are identical except for test_driver
vs test_fs).
> +
> + return strlen(buf) + 1;
> +}
> +static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
> + config_test_fs_store);
> +
> +static int trigger_config_run_driver(struct kmod_test_device *test_dev,
> + const char *test_driver)
> +{
> + int copied;
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + config->test_case = TEST_KMOD_DRIVER;
> +
> + kfree_const(config->test_driver);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_driver_name(config, test_driver,
> + strlen(test_driver));
> + mutex_unlock(&test_dev->config_mutex);
> +
> + if (copied != strlen(test_driver)) {
Can't these copied tests just check < 0? (i.e. avoid the repeated
strlen which can be fragile.)
> + test_dev->test_is_oom = true;
> + return -EINVAL;
> + }
> +
> + test_dev->test_is_oom = false;
> +
> + return trigger_config_run(test_dev);
> +}
> +
> +static int trigger_config_run_fs(struct kmod_test_device *test_dev,
> + const char *fs_type)
> +{
> + int copied;
> + struct test_config *config = &test_dev->config;
> +
> + mutex_lock(&test_dev->config_mutex);
> + config->test_case = TEST_KMOD_FS_TYPE;
> +
> + kfree_const(config->test_fs);
> + config->test_driver = NULL;
> +
> + copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
> + mutex_unlock(&test_dev->config_mutex);
> +
> + if (copied != strlen(fs_type)) {
> + test_dev->test_is_oom = true;
> + return -EINVAL;
> + }
> +
> + test_dev->test_is_oom = false;
> +
> + return trigger_config_run(test_dev);
> +}
These two functions are almost identical too. Only test_case and the
copy function change...
> +
> +static void free_test_dev_info(struct kmod_test_device *test_dev)
> +{
> + if (test_dev->info) {
> + vfree(test_dev->info);
> + test_dev->info = NULL;
> + }
> +}
vfree() already checks for NULL, you can drop the if.
> +
> +static int kmod_config_sync_info(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> +
> + free_test_dev_info(test_dev);
> + test_dev->info = vzalloc(config->num_threads *
> + sizeof(struct kmod_test_device_info));
> + if (!test_dev->info) {
> + dev_err(test_dev->dev, "Cannot alloc test_dev info\n");
> + return -ENOMEM;
> + }
> +
> + return 0;
> +}
> +
> +/*
> + * Old kernels may not have this, if you want to port this code to
> + * test it on older kernels.
> + */
> +#ifdef get_kmod_umh_limit
> +static unsigned int kmod_init_test_thread_limit(void)
> +{
> + return get_kmod_umh_limit();
> +}
> +#else
> +static unsigned int kmod_init_test_thread_limit(void)
> +{
> + return TEST_START_NUM_THREADS;
> +}
> +#endif
> +
> +static int __kmod_config_init(struct kmod_test_device *test_dev)
> +{
> + struct test_config *config = &test_dev->config;
> + int ret = -ENOMEM, copied;
> +
> + __kmod_config_free(config);
> +
> + copied = config_copy_test_driver_name(config, TEST_START_DRIVER,
> + strlen(TEST_START_DRIVER));
> + if (copied != strlen(TEST_START_DRIVER))
> + goto err_out;
> +
> + copied = config_copy_test_fs(config, TEST_START_TEST_FS,
> + strlen(TEST_START_TEST_FS));
> + if (copied != strlen(TEST_START_TEST_FS))
> + goto err_out;
> +
> + config->num_threads = kmod_init_test_thread_limit();
> + config->test_result = 0;
> + config->test_case = TEST_START_TEST_CASE;
> +
> + ret = kmod_config_sync_info(test_dev);
> + if (ret)
> + goto err_out;
> +
> + test_dev->test_is_oom = false;
> +
> + return 0;
> +
> +err_out:
> + test_dev->test_is_oom = true;
> + WARN_ON(test_dev->test_is_oom);
> +
> + __kmod_config_free(config);
> +
> + return ret;
> +}
> +
> +static ssize_t reset_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + int ret;
> +
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + ret = __kmod_config_init(test_dev);
> + if (ret < 0) {
> + ret = -ENOMEM;
> + dev_err(dev, "could not alloc settings for config trigger: %d\n",
> + ret);
> + goto out;
> + }
> +
> + dev_info(dev, "reset\n");
> + ret = count;
> +
> +out:
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + return ret;
> +}
> +static DEVICE_ATTR_WO(reset);
> +
> +static int test_dev_config_update_uint_sync(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + unsigned int *config,
> + int (*test_sync)(struct kmod_test_device *test_dev))
> +{
> + int ret;
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + unsigned int old_val;
> + if (end == buf || new > UINT_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&test_dev->config_mutex);
> +
> + old_val = *config;
> + *(unsigned int *)config = new;
> +
> + ret = test_sync(test_dev);
> + if (ret) {
> + *(unsigned int *)config = old_val;
> +
> + ret = test_sync(test_dev);
> + WARN_ON(ret);
> +
> + mutex_unlock(&test_dev->config_mutex);
> + return -EINVAL;
> + }
> +
> + mutex_unlock(&test_dev->config_mutex);
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + unsigned int *config,
> + unsigned int min,
> + unsigned int max)
> +{
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + if (end == buf || new < min || new > max || new > UINT_MAX)
> + return -EINVAL;
> +
> + mutex_lock(&test_dev->config_mutex);
> + *(unsigned int *)config = new;
config is already an unsigned int *, why cast?
> + mutex_unlock(&test_dev->config_mutex);
> +
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static int test_dev_config_update_int(struct kmod_test_device *test_dev,
> + const char *buf, size_t size,
> + int *config)
> +{
> + char *end;
> + long new = simple_strtol(buf, &end, 0);
> + if (end == buf || new > INT_MAX || new < INT_MIN)
> + return -EINVAL;
> + mutex_lock(&test_dev->config_mutex);
> + *(int *)config = new;
config is already an int *, why cast?
> + mutex_unlock(&test_dev->config_mutex);
> + /* Always return full write size even if we didn't consume all */
> + return size;
> +}
> +
> +static ssize_t test_dev_config_show_int(struct kmod_test_device *test_dev,
> + char *buf,
> + int config)
> +{
> + int val;
> +
> + mutex_lock(&test_dev->config_mutex);
> + val = config;
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return snprintf(buf, PAGE_SIZE, "%d\n", val);
> +}
> +
> +static ssize_t test_dev_config_show_uint(struct kmod_test_device *test_dev,
> + char *buf,
> + unsigned int config)
> +{
> + unsigned int val;
> +
> + mutex_lock(&test_dev->config_mutex);
> + val = config;
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return snprintf(buf, PAGE_SIZE, "%u\n", val);
> +}
> +
> +static ssize_t test_result_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_int(test_dev, buf, count,
> + &config->test_result);
> +}
> +
> +static ssize_t config_num_threads_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_uint_sync(test_dev, buf, count,
> + &config->num_threads,
> + kmod_config_sync_info);
> +}
> +
> +static ssize_t config_num_threads_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_int(test_dev, buf, config->num_threads);
> +}
> +static DEVICE_ATTR(config_num_threads, 0644, config_num_threads_show,
> + config_num_threads_store);
> +
> +static ssize_t config_test_case_store(struct device *dev,
> + struct device_attribute *attr,
> + const char *buf, size_t count)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_update_uint_range(test_dev, buf, count,
> + &config->test_case,
> + __TEST_KMOD_INVALID + 1,
> + __TEST_KMOD_MAX - 1);
> +}
> +
> +static ssize_t config_test_case_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_uint(test_dev, buf, config->test_case);
> +}
> +static DEVICE_ATTR(config_test_case, 0644, config_test_case_show,
> + config_test_case_store);
> +
> +static ssize_t test_result_show(struct device *dev,
> + struct device_attribute *attr,
> + char *buf)
> +{
> + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> + struct test_config *config = &test_dev->config;
> +
> + return test_dev_config_show_int(test_dev, buf, config->test_result);
> +}
> +static DEVICE_ATTR(test_result, 0644, test_result_show, test_result_store);
> +
> +#define TEST_KMOD_DEV_ATTR(name) &dev_attr_##name.attr
> +
> +static struct attribute *test_dev_attrs[] = {
> + TEST_KMOD_DEV_ATTR(trigger_config),
> + TEST_KMOD_DEV_ATTR(config),
> + TEST_KMOD_DEV_ATTR(reset),
> +
> + TEST_KMOD_DEV_ATTR(config_test_driver),
> + TEST_KMOD_DEV_ATTR(config_test_fs),
> + TEST_KMOD_DEV_ATTR(config_num_threads),
> + TEST_KMOD_DEV_ATTR(config_test_case),
> + TEST_KMOD_DEV_ATTR(test_result),
> +
> + NULL,
> +};
> +
> +ATTRIBUTE_GROUPS(test_dev);
> +
> +static int kmod_config_init(struct kmod_test_device *test_dev)
> +{
> + int ret;
> +
> + mutex_lock(&test_dev->config_mutex);
> + ret = __kmod_config_init(test_dev);
> + mutex_unlock(&test_dev->config_mutex);
> +
> + return ret;
> +}
> +
> +/*
> + * XXX: this could perhaps be made generic already too, but a hunt
> + * for actual users would be needed first. It could be generic
> + * if other test drivers end up using a similar mechanism.
> + */
> +const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
> +{
> + const char *name_const;
> + char *name;
> +
> + if (!base)
> + return NULL;
> + if (strlen(base) > 30)
> + return NULL;
why?
> + name = kzalloc(1024, gfp);
> + if (!name)
> + return NULL;
> +
> + strncat(name, base, strlen(base));
> + sprintf(name+(strlen(base)), "%d", idx);
> + name_const = kstrdup_const(name, gfp);
> +
> + kfree(name);
> +
> + return name_const;
> +}
What is going on here? Why not just:
return kasprintf(gfp, "%s%d", base, idx);
For all of that code? And kstrdup_const is pointless here since it'll
always just do the dup (as the kmalloc source isn't in rodata).
> +
> +static struct kmod_test_device *alloc_test_dev_kmod(int idx)
> +{
> + int ret;
> + struct kmod_test_device *test_dev;
> + struct miscdevice *misc_dev;
> +
> + test_dev = vzalloc(sizeof(struct kmod_test_device));
> + if (!test_dev) {
> + pr_err("Cannot alloc test_dev\n");
> + goto err_out;
> + }
> +
> + mutex_init(&test_dev->config_mutex);
> + mutex_init(&test_dev->trigger_mutex);
> + mutex_init(&test_dev->thread_mutex);
> +
> + init_completion(&test_dev->kthreads_done);
> +
> + ret = kmod_config_init(test_dev);
> + if (ret < 0) {
> + pr_err("Cannot alloc kmod_config_init()\n");
> + goto err_out_free;
> + }
> +
> + test_dev->dev_idx = idx;
> + misc_dev = &test_dev->misc_dev;
> +
> + misc_dev->minor = MISC_DYNAMIC_MINOR;
> + misc_dev->name = test_dev_get_name("test_kmod", test_dev->dev_idx,
> + GFP_KERNEL);
> + if (!misc_dev->name) {
> + pr_err("Cannot alloc misc_dev->name\n");
> + goto err_out_free_config;
> + }
> + misc_dev->groups = test_dev_groups;
> +
> + return test_dev;
> +
> +err_out_free_config:
> + free_test_dev_info(test_dev);
> + kmod_config_free(test_dev);
> +err_out_free:
> + vfree(test_dev);
> + test_dev = NULL;
> +err_out:
> + return NULL;
> +}
> +
> +static void free_test_dev_kmod(struct kmod_test_device *test_dev)
> +{
> + if (test_dev) {
> + kfree_const(test_dev->misc_dev.name);
> + test_dev->misc_dev.name = NULL;
> + free_test_dev_info(test_dev);
> + kmod_config_free(test_dev);
> + vfree(test_dev);
> + test_dev = NULL;
> + }
> +}
> +
> +static struct kmod_test_device *register_test_dev_kmod(void)
> +{
> + struct kmod_test_device *test_dev = NULL;
> + int ret;
> +
> + mutex_unlock(®_dev_mutex);
> +
> + /* int should suffice for number of devices, test for wrap */
> + if (unlikely(num_test_devs + 1) < 0) {
> + pr_err("reached limit of number of test devices\n");
> + goto out;
> + }
> +
> + test_dev = alloc_test_dev_kmod(num_test_devs);
> + if (!test_dev)
> + goto out;
> +
> + ret = misc_register(&test_dev->misc_dev);
> + if (ret) {
> + pr_err("could not register misc device: %d\n", ret);
> + free_test_dev_kmod(test_dev);
> + goto out;
> + }
> +
> + test_dev->dev = test_dev->misc_dev.this_device;
> + list_add_tail(&test_dev->list, ®_test_devs);
> + dev_info(test_dev->dev, "interface ready\n");
> +
> + num_test_devs++;
> +
> +out:
> + mutex_unlock(®_dev_mutex);
> +
> + return test_dev;
> +
> +}
> +
> +static int __init test_kmod_init(void)
> +{
> + struct kmod_test_device *test_dev;
> + int ret;
> +
> + test_dev = register_test_dev_kmod();
> + if (!test_dev) {
> + pr_err("Cannot add first test kmod device\n");
> + return -ENODEV;
> + }
> +
> + /*
> + * With some work we might be able to gracefully enable
> + * testing with this driver built-in, for now this seems
> + * rather risky. For those willing to try have at it,
> + * and enable the below. Good luck! If that works, try
> + * lowering the init level for more fun.
> + */
> + if (force_init_test) {
> + ret = trigger_config_run_driver(test_dev, "tun");
> + if (WARN_ON(ret))
> + return ret;
> + ret = trigger_config_run_fs(test_dev, "btrfs");
> + if (WARN_ON(ret))
> + return ret;
> + }
> +
> + return 0;
> +}
> +late_initcall(test_kmod_init);
> +
> +static
> +void unregister_test_dev_kmod(struct kmod_test_device *test_dev)
> +{
> + mutex_lock(&test_dev->trigger_mutex);
> + mutex_lock(&test_dev->config_mutex);
> +
> + test_dev_kmod_stop_tests(test_dev);
> +
> + dev_info(test_dev->dev, "removing interface\n");
> + misc_deregister(&test_dev->misc_dev);
> +
> + mutex_unlock(&test_dev->config_mutex);
> + mutex_unlock(&test_dev->trigger_mutex);
> +
> + free_test_dev_kmod(test_dev);
> +}
> +
> +static void __exit test_kmod_exit(void)
> +{
> + struct kmod_test_device *test_dev, *tmp;
> +
> + mutex_lock(®_dev_mutex);
> + list_for_each_entry_safe(test_dev, tmp, ®_test_devs, list) {
> + list_del(&test_dev->list);
> + unregister_test_dev_kmod(test_dev);
> + }
> + mutex_unlock(®_dev_mutex);
> +}
> +module_exit(test_kmod_exit);
> +
> +MODULE_AUTHOR("Luis R. Rodriguez <mcgrof@kernel.org>");
> +MODULE_LICENSE("GPL");
> diff --git a/tools/testing/selftests/kmod/Makefile b/tools/testing/selftests/kmod/Makefile
> new file mode 100644
> index 000000000000..fa2ccc5fb3de
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/Makefile
> @@ -0,0 +1,11 @@
> +# Makefile for kmod loading selftests
> +
> +# No binaries, but make sure arg-less "make" doesn't trigger "run_tests"
> +all:
> +
> +TEST_PROGS := kmod.sh
> +
> +include ../lib.mk
> +
> +# Nothing to clean up.
> +clean:
> diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
> new file mode 100644
> index 000000000000..259f4fd6b5e2
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/config
> @@ -0,0 +1,7 @@
> +CONFIG_TEST_KMOD=m
> +CONFIG_TEST_LKM=m
> +CONFIG_XFS_FS=m
> +
> +# For the module parameter force_init_test is used
> +CONFIG_TUN=m
> +CONFIG_BTRFS_FS=m
> diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
> new file mode 100755
> index 000000000000..9ea1864d8bae
> --- /dev/null
> +++ b/tools/testing/selftests/kmod/kmod.sh
> @@ -0,0 +1,449 @@
> +#!/bin/bash
> +#
> +# Copyright (C) 2016 Luis R. Rodriguez <mcgrof@kernel.org>
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms of copyleft-next (version 0.3.1 or later) as published
> +# at http://copyleft-next.org/.
> +
> +# This is a stress test script for kmod, the kernel module loader. It uses
> +# test_kmod which exposes a series of knobs for the API for us so we can
> +# tweak each test in userspace rather than in kernelspace.
> +#
> +# The way kmod works is it uses the kernel's usermode helper API to eventually
> +# call /sbin/modprobe. It has a limit of the number of concurrent calls
> +# possible. The kernel interface to load modules is request_module(), however
> +# mount uses get_fs_type(). Both behave slightly differently, but the
> +# differences are important enough to test each call separately. For this
> +# reason test_kmod starts by providing tests for both calls.
> +#
> +# The test driver test_kmod assumes a series of defaults which you can
> +# override by exporting to your environment prior running this script.
> +# For instance this script assumes you do not have xfs loaded upon boot.
> +# If this is false, export DEFAULT_KMOD_FS="ext4" prior to running this
> +# script if the filesyste module you don't have loaded upon bootup
> +# is ext4 instead. Refer to allow_user_defaults() for a list of user
> +# override variables possible.
> +#
> +# You'll want at least 4096 GiB of RAM to expect to run these tests
4TiB of RAM? I assume this was meant to be 4 GiB not 4096?
> +# without running out of memory on them. For other requirements refer
> +# to test_reqs()
> +
> +set -e
> +
> +TEST_DRIVER="test_kmod"
> +
> +function allow_user_defaults()
> +{
> + if [ -z $DEFAULT_KMOD_DRIVER ]; then
> + DEFAULT_KMOD_DRIVER="test_module"
> + fi
> +
> + if [ -z $DEFAULT_KMOD_FS ]; then
> + DEFAULT_KMOD_FS="xfs"
> + fi
> +
> + if [ -z $PROC_DIR ]; then
> + PROC_DIR="/proc/sys/kernel/"
> + fi
> +
> + if [ -z $MODPROBE_LIMIT ]; then
> + MODPROBE_LIMIT=50
> + fi
> +
> + if [ -z $DIR ]; then
> + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
> + fi
> +
> + MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
> +}
> +
> +test_reqs()
> +{
> + if ! which modprobe 2> /dev/null > /dev/null; then
> + echo "$0: You need modprobe installed"
While not a huge deal, I prefer that error messages end up on stderr,
so adding >&2 to all the failure echos (or providing an err function)
would be nice. (This happens in later places...)
> + exit 1
> + fi
> +
> + if ! which kmod 2> /dev/null > /dev/null; then
> + echo "$0: You need kmod installed"
> + exit 1
> + fi
> +
> + # kmod 19 has a bad bug where it returns 0 when modprobe
> + # gets called *even* if the module was not loaded due to
> + # some bad heuristics. For details see:
> + #
> + # A work around is possible in-kernel but its rather
> + # complex.
> + KMOD_VERSION=$(kmod --version | awk '{print $3}')
> + if [[ $KMOD_VERSION -le 19 ]]; then
> + echo "$0: You need at least kmod 20"
> + echo "kmod <= 19 is buggy, for details see:"
> + echo "http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4"
> + exit 1
> + fi
> +}
> +
> +function load_req_mod()
> +{
> + if [ ! -d $DIR ]; then
> + # Alanis: "Oh isn't it ironic?"
> + modprobe $TEST_DRIVER
> + if [ ! -d $DIR ]; then
> + echo "$0: $DIR not present"
> + echo "You must have the following enabled in your kernel:"
> + cat $PWD/config
I like this (minimum config in the test directory). Are other tests
doing this too?
> + exit 1
> + fi
> + fi
> +}
> +
> +test_finish()
> +{
> + echo "Test completed"
> +}
> +
> +errno_name_to_val()
> +{
> + case "$1" in
> + # kmod calls modprobe and upon of a module not found
> + # modprobe returns just 1... However in the kernel we
> + # *sometimes* see 256...
> + MODULE_NOT_FOUND)
> + echo 256;;
> + SUCCESS)
> + echo 0;;
> + -EPERM)
> + echo -1;;
> + -ENOENT)
> + echo -2;;
> + -EINVAL)
> + echo -22;;
> + -ERR_ANY)
> + echo -123456;;
> + *)
> + echo invalid;;
> + esac
> +}
> +
> +errno_val_to_name()
> + case "$1" in
> + 256)
> + echo MODULE_NOT_FOUND;;
> + 0)
> + echo SUCCESS;;
> + -1)
> + echo -EPERM;;
> + -2)
> + echo -ENOENT;;
> + -22)
> + echo -EINVAL;;
> + -123456)
> + echo -ERR_ANY;;
> + *)
> + echo invalid;;
> + esac
> +
> +config_set_test_case_driver()
> +{
> + if ! echo -n 1 >$DIR/config_test_case; then
> + echo "$0: Unable to set to test case to driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_set_test_case_fs()
> +{
> + if ! echo -n 2 >$DIR/config_test_case; then
> + echo "$0: Unable to set to test case to fs" >&2
> + exit 1
> + fi
> +}
> +
> +config_num_threads()
> +{
> + if ! echo -n $1 >$DIR/config_num_threads; then
> + echo "$0: Unable to set to number of threads" >&2
> + exit 1
> + fi
> +}
> +
> +config_get_modprobe_limit()
> +{
> + if [[ -f ${MODPROBE_LIMIT_FILE} ]] ; then
> + MODPROBE_LIMIT=$(cat $MODPROBE_LIMIT_FILE)
> + fi
> + echo $MODPROBE_LIMIT
> +}
> +
> +config_num_thread_limit_extra()
> +{
> + MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + let EXTRA_LIMIT=$MODPROBE_LIMIT+$1
> + config_num_threads $EXTRA_LIMIT
> +}
> +
> +# For special characters use printf directly,
> +# refer to kmod_test_0001
> +config_set_driver()
> +{
> + if ! echo -n $1 >$DIR/config_test_driver; then
> + echo "$0: Unable to set driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_set_fs()
> +{
> + if ! echo -n $1 >$DIR/config_test_fs; then
> + echo "$0: Unable to set driver" >&2
> + exit 1
> + fi
> +}
> +
> +config_get_driver()
> +{
> + cat $DIR/config_test_driver
> +}
> +
> +config_get_test_result()
> +{
> + cat $DIR/test_result
> +}
> +
> +config_reset()
> +{
> + if ! echo -n "1" >"$DIR"/reset; then
> + echo "$0: reset shuld have worked" >&2
> + exit 1
> + fi
> +}
> +
> +config_show_config()
> +{
> + echo "----------------------------------------------------"
> + cat "$DIR"/config
> + echo "----------------------------------------------------"
> +}
> +
> +config_trigger()
> +{
> + if ! echo -n "1" >"$DIR"/trigger_config 2>/dev/null; then
> + echo "$1: FAIL - loading should have worked"
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - loading kmod test"
> +}
> +
> +config_trigger_want_fail()
> +{
> + if echo "1" > $DIR/trigger_config 2>/dev/null; then
> + echo "$1: FAIL - test case was expected to fail"
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - kmod test case failed as expected"
> +}
> +
> +config_expect_result()
> +{
> + RC=$(config_get_test_result)
> + RC_NAME=$(errno_val_to_name $RC)
> +
> + ERRNO_NAME=$2
> + ERRNO=$(errno_name_to_val $ERRNO_NAME)
> +
> + if [[ $ERRNO_NAME = "-ERR_ANY" ]]; then
> + if [[ $RC -ge 0 ]]; then
> + echo "$1: FAIL, test expects $ERRNO_NAME - got $RC_NAME ($RC)" >&2
> + config_show_config
> + exit 1
> + fi
> + elif [[ $RC != $ERRNO ]]; then
> + echo "$1: FAIL, test expects $ERRNO_NAME ($ERRNO) - got $RC_NAME ($RC)" >&2
> + config_show_config
> + exit 1
> + fi
> + echo "$1: OK! - Return value: $RC ($RC_NAME), expected $ERRNO_NAME"
> +}
> +
> +kmod_defaults_driver()
> +{
> + config_reset
> + modprobe -r $DEFAULT_KMOD_DRIVER
> + config_set_driver $DEFAULT_KMOD_DRIVER
> +}
> +
> +kmod_defaults_fs()
> +{
> + config_reset
> + modprobe -r $DEFAULT_KMOD_FS
> + config_set_fs $DEFAULT_KMOD_FS
> + config_set_test_case_fs
> +}
> +
> +kmod_test_0001_driver()
> +{
> + NAME='\000'
> +
> + kmod_defaults_driver
> + config_num_threads 1
> + printf '\000' >"$DIR"/config_test_driver
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
> +}
> +
> +kmod_test_0001_fs()
> +{
> + NAME='\000'
> +
> + kmod_defaults_fs
> + config_num_threads 1
> + printf '\000' >"$DIR"/config_test_fs
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0001()
> +{
> + kmod_test_0001_driver
> + kmod_test_0001_fs
> +}
> +
> +kmod_test_0002_driver()
> +{
> + NAME="nope-$DEFAULT_KMOD_DRIVER"
> +
> + kmod_defaults_driver
> + config_set_driver $NAME
> + config_num_threads 1
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} MODULE_NOT_FOUND
> +}
> +
> +kmod_test_0002_fs()
> +{
> + NAME="nope-$DEFAULT_KMOD_FS"
> +
> + kmod_defaults_fs
> + config_set_fs $NAME
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0002()
> +{
> + kmod_test_0002_driver
> + kmod_test_0002_fs
> +}
> +
> +kmod_test_0003()
> +{
> + kmod_defaults_fs
> + config_num_threads 1
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0004()
> +{
> + kmod_defaults_fs
> + config_num_threads 2
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0005()
> +{
> + kmod_defaults_driver
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0006()
> +{
> + kmod_defaults_fs
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} SUCCESS
> +}
> +
> +kmod_test_0007()
> +{
> + kmod_test_0005
> + kmod_test_0006
> +}
> +
> +kmod_test_0008()
> +{
> + kmod_defaults_driver
> + MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + let EXTRA=$MODPROBE_LIMIT/2
> + config_num_thread_limit_extra $EXTRA
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +kmod_test_0009()
> +{
> + kmod_defaults_fs
> + #MODPROBE_LIMIT=$(config_get_modprobe_limit)
> + #let EXTRA=$MODPROBE_LIMIT/3
> + config_num_thread_limit_extra 5
> + config_trigger ${FUNCNAME[0]}
> + config_expect_result ${FUNCNAME[0]} -EINVAL
> +}
> +
> +trap "test_finish" EXIT
> +test_reqs
> +allow_user_defaults
> +load_req_mod
> +
> +usage()
> +{
> + echo "Usage: $0 [ -t <4-number-digit> ]"
> + echo "Valid tests: 0001-0011"
> + echo
> + echo "0001 - Simple test - 1 thread for empty string"
> + echo "0002 - Simple test - 1 thread for modules/filesystems that do not exist"
> + echo "0003 - Simple test - 1 thread for get_fs_type() only"
> + echo "0004 - Simple test - 2 threads for get_fs_type() only"
> + echo "0005 - multithreaded tests with default setup - request_module() only"
> + echo "0006 - multithreaded tests with default setup - get_fs_type() only"
> + echo "0007 - multithreaded tests with default setup test request_module() and get_fs_type()"
> + echo "0008 - multithreaded - push kmod_concurrent over max_modprobes for request_module()"
> + echo "0009 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
> + exit 1
> +}
> +
> +# You can ask for a specific test:
> +if [[ $# > 0 ]] ; then
> + if [[ $1 != "-t" ]]; then
> + usage
> + fi
> +
> + re='^[0-9]+$'
> + if ! [[ $2 =~ $re ]]; then
> + usage
> + fi
> +
> + RUN_TEST=kmod_test_$2
> + $RUN_TEST
> + exit 0
> +fi
> +
> +# Once tese are enabled please leave them as-is. Write your own test,
> +# we have tons of space.
> +kmod_test_0001
> +kmod_test_0002
> +kmod_test_0003
> +kmod_test_0004
> +kmod_test_0005
> +kmod_test_0006
> +kmod_test_0007
> +
> +#kmod_test_0008
> +#kmod_test_0009
While it's documented in the commit log, I think a short note for each
disabled test should be added here too.
> +
> +exit 0
> --
> 2.10.1
>
-Kees
--
Kees Cook
Nexus Security
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 01/10] kmod: add test driver to stress test the module loader
2016-12-08 20:24 ` Kees Cook
@ 2016-12-13 21:10 ` Luis R. Rodriguez
2016-12-16 7:41 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-13 21:10 UTC (permalink / raw)
To: Kees Cook
Cc: Luis R. Rodriguez, shuah, Jessica Yu, Rusty Russell,
Arnd Bergmann, Eric W. Biederman, Dmitry Torokhov,
Arnaldo Carvalho de Melo, Jonathan Corbet, martin.wilck,
Michal Marek, Petr Mladek, hare, rwright, Jeff Mahoney, DSterba,
fdmanana, neilb, rgoldwyn, subashab, Heinrich Schuchardt,
Aaron Tomlin, mbenes, Paul E. McKenney, Dan Williams,
Josh Poimboeuf, David S. Miller, Ingo Molnar, Andrew Morton,
Linus Torvalds, linux-kselftest, linux-doc, LKML
On Thu, Dec 08, 2016 at 12:24:35PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > The kmod.sh script uses the above constructs to build differnt test cases.
>
> Typo: different
Fixed.
> > 3) finit_module() consumes quite a bit of memory.
>
> Is this due to reading the module into kernel memory or something else?
Very likely yes, but to be honest I have not had chance to instrument too
carefully, its TODO work :)
> > 4) Filesystems typically also have more dependent modules than other
> > modules, its important to note though that even though a get_fs_type() call
> > does not incur additional kmod_concurrent bumps, since userspace
> > loads dependencies it finds it needs via finit_module_fd(), it *will*
> > take much more memory to load a module with a lot of dependencies.
> >
> > Because of 3) and 4) we will easily run into out of memory failures
> > with certain tests. For instance test 0006 fails on qemu with 1024 MiB
> > of RAM. It panics a box after reaping all userspace processes and still
> > not having enough memory to reap.
>
> Are the buffers not released until after all the dependent modules are
> loaded? I thought it would load one by one?
kmod.c allows up to kmod_concurrent concurrent requests out to userspace,
how it handles this is up to userspace, but note that prior to the knobs
exposed in this patch set userspace neither knew what kmod_concurrent
was nor how many concurrent threads are active at any point in time.
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>
> This is a great selftest, thanks for working on it!
My pleasure.
> Notes below...
>
> > diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
> > index 7446097f72bd..6cad548e0682 100644
> > --- a/lib/Kconfig.debug
> > +++ b/lib/Kconfig.debug
> > @@ -1994,6 +1994,31 @@ config BUG_ON_DATA_CORRUPTION
> >
> > If unsure, say N.
> >
> > +config TEST_KMOD
> > + tristate "kmod stress tester"
> > + default n
> > + depends on m
> > + select TEST_LKM
> > + select XFS_FS
> > + select TUN
> > + select BTRFS_FS
>
> Since the desired FS can be changed at runtime, maybe these selects
> aren't needed?
Well yes and no, yes because its the defaults built-in. No, because as you note
we can alter the defaults in userspace. Without the alternatives being set the
driver will not really work at all though. Here is an example where Arnd's
kconfig "suggests" for kconfig could come in handy. Until we have that I think
I'd prefer to keep it this way.
> > diff --git a/lib/test_kmod.c b/lib/test_kmod.c
> > new file mode 100644
> > index 000000000000..63fded83b9b6
> > --- /dev/null
> > +++ b/lib/test_kmod.c
> > @@ -0,0 +1,1248 @@
> > +static bool force_init_test = false;
> > +module_param(force_init_test, bool_enable_only, 0644);
> > +MODULE_PARM_DESC(force_init_test,
> > + "Force kicking a test immediatley after driver loads");
>
> Typo: immediately
Fixed.
> > +static ssize_t config_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > + int len = 0;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > +
> > + len += sprintf(buf, "Custom trigger configuration for: %s\n",
> > + dev_name(dev));
> > +
> > + len += sprintf(buf+len, "Number of threads:\t%u\n",
> > + config->num_threads);
> > +
> > + len += sprintf(buf+len, "Test_case:\t%s (%u)\n",
> > + test_case_str(config->test_case),
> > + config->test_case);
> > +
> > + if (config->test_driver)
> > + len += sprintf(buf+len, "driver:\t%s\n",
> > + config->test_driver);
> > + else
> > + len += sprintf(buf+len, "driver:\tEMTPY\n");
> > +
> > + if (config->test_fs)
> > + len += sprintf(buf+len, "fs:\t%s\n",
> > + config->test_fs);
> > + else
> > + len += sprintf(buf+len, "fs:\tEMTPY\n");
>
> These should all use snprintf...
Fixed. If the caller is sysfs_kf_seq_show() then max is PAGE_SIZE, will
use that as the limit to start with.
> > +static ssize_t config_test_driver_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + strcpy(buf, config->test_driver);
> > + strcat(buf, "\n");
>
> IIUC, the show/store API uses a max size of PAGE_SIZE. If that's
> correct, it's possible that this show routine could write past the end
> of buf, due to the end newline, etc. Best to use snprintf like you do
> below for the other shows.
Sure.
> > +static ssize_t config_test_fs_show(struct device *dev,
> > + struct device_attribute *attr,
> > + char *buf)
> > +{
> > + struct kmod_test_device *test_dev = dev_to_test_dev(dev);
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + strcpy(buf, config->test_fs);
> > + strcat(buf, "\n");
> > + mutex_unlock(&test_dev->config_mutex);
>
> Same here... (which, btw, could likely use to be a helper function,
> the show and store functions here are identical except for test_driver
> vs test_fs).
Sure, I'm starting to think a lot of test boiler plate for setup and show of
config stuff could be shared. We can consider this more once we have a few more
test drivers like this. I have 3 total now in the pipeline.
> > +
> > + return strlen(buf) + 1;
> > +}
> > +static DEVICE_ATTR(config_test_fs, 0644, config_test_fs_show,
> > + config_test_fs_store);
> > +
> > +static int trigger_config_run_driver(struct kmod_test_device *test_dev,
> > + const char *test_driver)
> > +{
> > + int copied;
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > +
> > + config->test_case = TEST_KMOD_DRIVER;
> > +
> > + kfree_const(config->test_driver);
> > + config->test_driver = NULL;
> > +
> > + copied = config_copy_test_driver_name(config, test_driver,
> > + strlen(test_driver));
> > + mutex_unlock(&test_dev->config_mutex);
> > +
> > + if (copied != strlen(test_driver)) {
>
> Can't these copied tests just check < 0? (i.e. avoid the repeated
> strlen which can be fragile.)
Sure, it can be:
if (copied <= 0 || copied != strlen(test_driver)) {
That way its both a negative check and also that something
non-empty was passed.
> > + test_dev->test_is_oom = true;
> > + return -EINVAL;
And come to think of it, these should return -ENOMEM;
> > + }
> > +
> > + test_dev->test_is_oom = false;
> > +
> > + return trigger_config_run(test_dev);
> > +}
> > +
> > +static int trigger_config_run_fs(struct kmod_test_device *test_dev,
> > + const char *fs_type)
> > +{
> > + int copied;
> > + struct test_config *config = &test_dev->config;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + config->test_case = TEST_KMOD_FS_TYPE;
> > +
> > + kfree_const(config->test_fs);
> > + config->test_driver = NULL;
> > +
> > + copied = config_copy_test_fs(config, fs_type, strlen(fs_type));
> > + mutex_unlock(&test_dev->config_mutex);
> > +
> > + if (copied != strlen(fs_type)) {
> > + test_dev->test_is_oom = true;
> > + return -EINVAL;
> > + }
> > +
> > + test_dev->test_is_oom = false;
> > +
> > + return trigger_config_run(test_dev);
> > +}
>
> These two functions are almost identical too. Only test_case and the
> copy function change...
They are now shared.
> > +static void free_test_dev_info(struct kmod_test_device *test_dev)
> > +{
> > + if (test_dev->info) {
> > + vfree(test_dev->info);
> > + test_dev->info = NULL;
> > + }
> > +}
>
> vfree() already checks for NULL, you can drop the if.
Fixed.
> > +static int test_dev_config_update_uint_range(struct kmod_test_device *test_dev,
> > + const char *buf, size_t size,
> > + unsigned int *config,
> > + unsigned int min,
> > + unsigned int max)
> > +{
> > + char *end;
> > + long new = simple_strtol(buf, &end, 0);
> > + if (end == buf || new < min || new > max || new > UINT_MAX)
> > + return -EINVAL;
> > +
> > + mutex_lock(&test_dev->config_mutex);
> > + *(unsigned int *)config = new;
>
> config is already an unsigned int *, why cast?
Fixed.
> > +static int test_dev_config_update_int(struct kmod_test_device *test_dev,
> > + const char *buf, size_t size,
> > + int *config)
> > +{
> > + char *end;
> > + long new = simple_strtol(buf, &end, 0);
> > + if (end == buf || new > INT_MAX || new < INT_MIN)
> > + return -EINVAL;
> > + mutex_lock(&test_dev->config_mutex);
> > + *(int *)config = new;
>
> config is already an int *, why cast?
Fixed.
> > +/*
> > + * XXX: this could perhaps be made generic already too, but a hunt
> > + * for actual users would be needed first. It could be generic
> > + * if other test drivers end up using a similar mechanism.
> > + */
> > +const char *test_dev_get_name(const char *base, int idx, gfp_t gfp)
> > +{
> > + const char *name_const;
> > + char *name;
> > +
> > + if (!base)
> > + return NULL;
> > + if (strlen(base) > 30)
> > + return NULL;
>
> why?
It was an arbitrary limit, will use PAGE_SIZE. But I'll just remove the
entire routine (see below).
>
> > + name = kzalloc(1024, gfp);
> > + if (!name)
> > + return NULL;
> > +
> > + strncat(name, base, strlen(base));
> > + sprintf(name+(strlen(base)), "%d", idx);
> > + name_const = kstrdup_const(name, gfp);
> > +
> > + kfree(name);
> > +
> > + return name_const;
> > +}
>
> What is going on here? Why not just:
> return kasprintf(gfp, "%s%d", base, idx);
>
> For all of that code? And kstrdup_const is pointless here since it'll
> always just do the dup (as the kmalloc source isn't in rodata).
Heh, yeah, true, nuked.
> > diff --git a/tools/testing/selftests/kmod/config b/tools/testing/selftests/kmod/config
> > new file mode 100644
> > index 000000000000..259f4fd6b5e2
> > --- /dev/null
> > +++ b/tools/testing/selftests/kmod/config
> > @@ -0,0 +1,7 @@
> > +CONFIG_TEST_KMOD=m
> > +CONFIG_TEST_LKM=m
> > +CONFIG_XFS_FS=m
> > +
> > +# For the module parameter force_init_test is used
> > +CONFIG_TUN=m
> > +CONFIG_BTRFS_FS=m
> > diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
> > new file mode 100755
> > index 000000000000..9ea1864d8bae
> > --- /dev/null
> > +++ b/tools/testing/selftests/kmod/kmod.sh
> > @@ -0,0 +1,449 @@
> > +#!/bin/bash
> > +#
<-- snip -->
> > +# You'll want at least 4096 GiB of RAM to expect to run these tests
>
> 4TiB of RAM? I assume this was meant to be 4 GiB not 4096?
Whoops, yeah sorry 4 GiB only.
> > +# without running out of memory on them. For other requirements refer
> > +# to test_reqs()
> > +
> > +set -e
> > +
> > +TEST_DRIVER="test_kmod"
> > +
> > +function allow_user_defaults()
> > +{
> > + if [ -z $DEFAULT_KMOD_DRIVER ]; then
> > + DEFAULT_KMOD_DRIVER="test_module"
> > + fi
> > +
> > + if [ -z $DEFAULT_KMOD_FS ]; then
> > + DEFAULT_KMOD_FS="xfs"
> > + fi
> > +
> > + if [ -z $PROC_DIR ]; then
> > + PROC_DIR="/proc/sys/kernel/"
> > + fi
> > +
> > + if [ -z $MODPROBE_LIMIT ]; then
> > + MODPROBE_LIMIT=50
> > + fi
> > +
> > + if [ -z $DIR ]; then
> > + DIR="/sys/devices/virtual/misc/${TEST_DRIVER}0/"
> > + fi
> > +
> > + MODPROBE_LIMIT_FILE="${PROC_DIR}/kmod-limit"
> > +}
> > +
> > +test_reqs()
> > +{
> > + if ! which modprobe 2> /dev/null > /dev/null; then
> > + echo "$0: You need modprobe installed"
>
> While not a huge deal, I prefer that error messages end up on stderr,
> so adding >&2 to all the failure echos (or providing an err function)
> would be nice. (This happens in later places...)
Addressed.
> > +function load_req_mod()
> > +{
> > + if [ ! -d $DIR ]; then
> > + # Alanis: "Oh isn't it ironic?"
> > + modprobe $TEST_DRIVER
> > + if [ ! -d $DIR ]; then
> > + echo "$0: $DIR not present"
> > + echo "You must have the following enabled in your kernel:"
> > + cat $PWD/config
>
> I like this (minimum config in the test directory). Are other tests
> doing this too?
mcgrof@ergon ~/linux-next (git::(no branch, rebasing 20161213-kmod-test-driver))$ find tools/testing/selftests/ -name config
tools/testing/selftests/static_keys/config
tools/testing/selftests/cpu-hotplug/config
tools/testing/selftests/ipc/config
tools/testing/selftests/mount/config
tools/testing/selftests/zram/config
tools/testing/selftests/seccomp/config
tools/testing/selftests/memory-hotplug/config
tools/testing/selftests/vm/config
tools/testing/selftests/ftrace/config
tools/testing/selftests/pstore/config
tools/testing/selftests/firmware/config
tools/testing/selftests/net/config
tools/testing/selftests/bpf/config
tools/testing/selftests/user/config
tools/testing/selftests/kmod/config
Seems like a hipster trend.
> > +# Once tese are enabled please leave them as-is. Write your own test,
> > +# we have tons of space.
> > +kmod_test_0001
> > +kmod_test_0002
> > +kmod_test_0003
> > +kmod_test_0004
> > +kmod_test_0005
> > +kmod_test_0006
> > +kmod_test_0007
> > +
> > +#kmod_test_0008
> > +#kmod_test_0009
>
> While it's documented in the commit log, I think a short note for each
> disabled test should be added here too.
Will do, thanks so much for the review!
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 01/10] kmod: add test driver to stress test the module loader
2016-12-13 21:10 ` Luis R. Rodriguez
@ 2016-12-16 7:41 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 7:41 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Kees Cook, shuah, Jessica Yu, Rusty Russell, Arnd Bergmann,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, Petr Mladek, hare,
rwright, Jeff Mahoney, DSterba, fdmanana, neilb, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Tue, Dec 13, 2016 at 10:10:41PM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 08, 2016 at 12:24:35PM -0800, Kees Cook wrote:
> > On Thu, Dec 8, 2016 at 10:47 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > 3) finit_module() consumes quite a bit of memory.
> >
> > Is this due to reading the module into kernel memory or something else?
>
> Very likely yes, but to be honest I have not had chance to instrument too
> carefully, its TODO work :)
I've checked and the issue is since get_fs_type() does not check for
aliases we end up hammering tons of module requests, this in turn is
an analysis on load_module(). Within there layout_and_allocate()
uses first a local copy of the passed user data and mapping it into
a struct module, after a bit of sanity checks it finally allocates a
copy for us, so its struct module size * however many requests were
allowed to get in for load_module(). We could simply avoid an allocation
if the module is already present. I have this as another optimization
now but am running many other tests to compare performance.
> > > +# Once tese are enabled please leave them as-is. Write your own test,
> > > +# we have tons of space.
> > > +kmod_test_0001
> > > +kmod_test_0002
> > > +kmod_test_0003
> > > +kmod_test_0004
> > > +kmod_test_0005
> > > +kmod_test_0006
> > > +kmod_test_0007
> > > +
> > > +#kmod_test_0008
> > > +#kmod_test_0009
> >
> > While it's documented in the commit log, I think a short note for each
> > disabled test should be added here too.
>
> Will do, thanks so much for the review!
As I added test 0008's reason for why I think it fails I realized that the reason the test
can sometimes fail is very different than test 0009 which is for get_fs_type(). You see
get_fs_type() hammers kmod concurrent since we don't have an alias check and moprobe
calling fs-xfs for instance does not catch that the module is already loaded so it
delays the get_fs_type() call and so the __request_module() call, hogging up its
kmod concurrent increment.
For direct request_module() calls we don't have the alias issue, but since
we don't check if a module is loaded prior to calling userspace (I now have a fix
for this, reducing this latency does help) it means there are often times the
chances we will pour in tons of requests without them getting processed and
go over the concurrent limit.
I've added a clutch into __request_module() then so instead of just failing
we first check if we're at a threshold (say about 1/4 away from limit) and
if so we let a few threads breath, until they are done. This fixes *both*
test cases without much code changes, however as I've noted in other threads,
this is not the only issue to address.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
2016-12-08 18:47 ` [RFC 01/10] kmod: add test driver to stress test the module loader Luis R. Rodriguez
@ 2016-12-08 19:48 ` Luis R. Rodriguez
2016-12-08 20:30 ` Kees Cook
` (2 more replies)
2016-12-08 19:48 ` [RFC 03/10] kmod: add dynamic max concurrent thread count Luis R. Rodriguez
` (8 subsequent siblings)
10 siblings, 3 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:48 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
While looking for early possible module loading failures I was
able to reproduce a memory leak possible with kmemleak. There
are a few rare ways to trigger a failure:
o we've run into a failure while processing kernel parameters
(parse_args() returns an error)
o mod_sysfs_setup() fails
o we're a live patch module and copy_module_elf() fails
Chances of running into this issue is really low.
kmemleak splat:
unreferenced object 0xffff9f2c4ada1b00 (size 32):
comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
hex dump (first 32 bytes):
6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
[<ffffffff8c1bc581>] kstrdup+0x31/0x60
[<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
[<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
[<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
[<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
[<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
[<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
[<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
[<ffffffff8c0a2b79>] kthread+0xc9/0xe0
[<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
[<ffffffffffffffff>] 0xffffffffffffffff
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/module.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/kernel/module.c b/kernel/module.c
index f7482db0f843..e420ed67e533 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
mod_sysfs_teardown(mod);
coming_cleanup:
mod->state = MODULE_STATE_GOING;
+ destroy_params(mod->kp, mod->num_kp);
blocking_notifier_call_chain(&module_notify_list,
MODULE_STATE_GOING, mod);
klp_module_going(mod);
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 19:48 ` [RFC 02/10] module: fix memory leak on early load_module() failures Luis R. Rodriguez
@ 2016-12-08 20:30 ` Kees Cook
2016-12-08 21:10 ` Luis R. Rodriguez
2016-12-09 17:06 ` Miroslav Benes
2016-12-15 18:46 ` Aaron Tomlin
2 siblings, 1 reply; 65+ messages in thread
From: Kees Cook @ 2016-12-08 20:30 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Acked-by: Kees Cook <keescook@chromium.org>
Is this worth sending through -stable too?
-Kees
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f7482db0f843..e420ed67e533 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> mod_sysfs_teardown(mod);
> coming_cleanup:
> mod->state = MODULE_STATE_GOING;
> + destroy_params(mod->kp, mod->num_kp);
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 20:30 ` Kees Cook
@ 2016-12-08 21:10 ` Luis R. Rodriguez
2016-12-08 21:17 ` Kees Cook
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 21:10 UTC (permalink / raw)
To: Kees Cook
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, Filipe Manana, NeilBrown, Guenter Roeck,
rgoldwyn, subashab, Heinrich Schuchardt, Aaron Tomlin,
Miroslav Benes, Paul E. McKenney, Dan Williams, Josh Poimboeuf,
David S. Miller, Ingo Molnar, Andrew Morton, Linus Torvalds,
linux-kselftest, linux-doc, LKML
On Thu, Dec 8, 2016 at 2:30 PM, Kees Cook <keescook@chromium.org> wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> While looking for early possible module loading failures I was
>> able to reproduce a memory leak possible with kmemleak. There
>> are a few rare ways to trigger a failure:
>>
>> o we've run into a failure while processing kernel parameters
>> (parse_args() returns an error)
>> o mod_sysfs_setup() fails
>> o we're a live patch module and copy_module_elf() fails
>>
>> Chances of running into this issue is really low.
>>
>> kmemleak splat:
>>
>> unreferenced object 0xffff9f2c4ada1b00 (size 32):
>> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>> hex dump (first 32 bytes):
>> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>> backtrace:
>> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
>> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
>> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
>> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
>> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
>> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
>> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
>> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
>> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
>> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
>> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
>> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
>> [<ffffffffffffffff>] 0xffffffffffffffff
>>
>> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>
> Acked-by: Kees Cook <keescook@chromium.org>
>
> Is this worth sending through -stable too?
Yes, for some reason git-send e-mail complained to me about
stable@kernel.org not being a valid local address, so I had to remove
it, but indeed. I'll try to fix this e-mail issue later and add your
tag.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 21:10 ` Luis R. Rodriguez
@ 2016-12-08 21:17 ` Kees Cook
0 siblings, 0 replies; 65+ messages in thread
From: Kees Cook @ 2016-12-08 21:17 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, Filipe Manana, NeilBrown, Guenter Roeck,
rgoldwyn, subashab, Heinrich Schuchardt, Aaron Tomlin,
Miroslav Benes, Paul E. McKenney, Dan Williams, Josh Poimboeuf,
David S. Miller, Ingo Molnar, Andrew Morton, Linus Torvalds,
linux-kselftest, linux-doc, LKML
On Thu, Dec 8, 2016 at 1:10 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Thu, Dec 8, 2016 at 2:30 PM, Kees Cook <keescook@chromium.org> wrote:
>> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> While looking for early possible module loading failures I was
>>> able to reproduce a memory leak possible with kmemleak. There
>>> are a few rare ways to trigger a failure:
>>>
>>> o we've run into a failure while processing kernel parameters
>>> (parse_args() returns an error)
>>> o mod_sysfs_setup() fails
>>> o we're a live patch module and copy_module_elf() fails
>>>
>>> Chances of running into this issue is really low.
>>>
>>> kmemleak splat:
>>>
>>> unreferenced object 0xffff9f2c4ada1b00 (size 32):
>>> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
>>> hex dump (first 32 bytes):
>>> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
>>> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
>>> backtrace:
>>> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
>>> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
>>> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
>>> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
>>> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
>>> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
>>> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
>>> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
>>> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
>>> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
>>> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
>>> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
>>> [<ffffffffffffffff>] 0xffffffffffffffff
>>>
>>> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>>
>> Acked-by: Kees Cook <keescook@chromium.org>
>>
>> Is this worth sending through -stable too?
>
> Yes, for some reason git-send e-mail complained to me about
> stable@kernel.org not being a valid local address, so I had to remove
> it, but indeed. I'll try to fix this e-mail issue later and add your
> tag.
Yup, you want stable@vger.kernel.org. :)
-Kees
--
Kees Cook
Nexus Security
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 19:48 ` [RFC 02/10] module: fix memory leak on early load_module() failures Luis R. Rodriguez
2016-12-08 20:30 ` Kees Cook
@ 2016-12-09 17:06 ` Miroslav Benes
2016-12-16 8:51 ` Luis R. Rodriguez
2016-12-15 18:46 ` Aaron Tomlin
2 siblings, 1 reply; 65+ messages in thread
From: Miroslav Benes @ 2016-12-09 17:06 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, paulmck, dan.j.williams, jpoimboe, davem,
mingo, akpm, torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu, 8 Dec 2016, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Reviewed-by: Miroslav Benes <mbenes@suse.cz>
What about
Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs")
?
Miroslav
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
>
> diff --git a/kernel/module.c b/kernel/module.c
> index f7482db0f843..e420ed67e533 100644
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -3722,6 +3722,7 @@ static int load_module(struct load_info *info, const char __user *uargs,
> mod_sysfs_teardown(mod);
> coming_cleanup:
> mod->state = MODULE_STATE_GOING;
> + destroy_params(mod->kp, mod->num_kp);
> blocking_notifier_call_chain(&module_notify_list,
> MODULE_STATE_GOING, mod);
> klp_module_going(mod);
> --
> 2.10.1
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-09 17:06 ` Miroslav Benes
@ 2016-12-16 8:51 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 8:51 UTC (permalink / raw)
To: Miroslav Benes
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, pmladek, hare, rwright,
jeffm, DSterba, fdmanana, neilb, linux, rgoldwyn, subashab,
xypron.glpk, keescook, atomlin, paulmck, dan.j.williams,
jpoimboe, davem, mingo, akpm, torvalds, linux-kselftest,
linux-doc, linux-kernel
On Fri, Dec 09, 2016 at 06:06:44PM +0100, Miroslav Benes wrote:
>
> Reviewed-by: Miroslav Benes <mbenes@suse.cz>
>
> What about
>
> Fixes: e180a6b7759a ("param: fix charp parameters set via sysfs")
>
> ?
Sure thing, added thanks!
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 02/10] module: fix memory leak on early load_module() failures
2016-12-08 19:48 ` [RFC 02/10] module: fix memory leak on early load_module() failures Luis R. Rodriguez
2016-12-08 20:30 ` Kees Cook
2016-12-09 17:06 ` Miroslav Benes
@ 2016-12-15 18:46 ` Aaron Tomlin
2 siblings, 0 replies; 65+ messages in thread
From: Aaron Tomlin @ 2016-12-15 18:46 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, mbenes, paulmck, dan.j.williams, jpoimboe, davem,
mingo, akpm, torvalds, linux-kselftest, linux-doc, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 1735 bytes --]
On Thu 2016-12-08 11:48 -0800, Luis R. Rodriguez wrote:
> While looking for early possible module loading failures I was
> able to reproduce a memory leak possible with kmemleak. There
> are a few rare ways to trigger a failure:
>
> o we've run into a failure while processing kernel parameters
> (parse_args() returns an error)
> o mod_sysfs_setup() fails
> o we're a live patch module and copy_module_elf() fails
>
> Chances of running into this issue is really low.
>
> kmemleak splat:
>
> unreferenced object 0xffff9f2c4ada1b00 (size 32):
> comm "kworker/u16:4", pid 82, jiffies 4294897636 (age 681.816s)
> hex dump (first 32 bytes):
> 6d 65 6d 73 74 69 63 6b 30 00 00 00 00 00 00 00 memstick0.......
> 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
> backtrace:
> [<ffffffff8c6cfeba>] kmemleak_alloc+0x4a/0xa0
> [<ffffffff8c200046>] __kmalloc_track_caller+0x126/0x230
> [<ffffffff8c1bc581>] kstrdup+0x31/0x60
> [<ffffffff8c1bc5d4>] kstrdup_const+0x24/0x30
> [<ffffffff8c3c23aa>] kvasprintf_const+0x7a/0x90
> [<ffffffff8c3b5481>] kobject_set_name_vargs+0x21/0x90
> [<ffffffff8c4fbdd7>] dev_set_name+0x47/0x50
> [<ffffffffc07819e5>] memstick_check+0x95/0x33c [memstick]
> [<ffffffff8c09c893>] process_one_work+0x1f3/0x4b0
> [<ffffffff8c09cb98>] worker_thread+0x48/0x4e0
> [<ffffffff8c0a2b79>] kthread+0xc9/0xe0
> [<ffffffff8c6dab5f>] ret_from_fork+0x1f/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> kernel/module.c | 1 +
> 1 file changed, 1 insertion(+)
Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
--
Aaron Tomlin
[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 801 bytes --]
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
2016-12-08 18:47 ` [RFC 01/10] kmod: add test driver to stress test the module loader Luis R. Rodriguez
2016-12-08 19:48 ` [RFC 02/10] module: fix memory leak on early load_module() failures Luis R. Rodriguez
@ 2016-12-08 19:48 ` Luis R. Rodriguez
2016-12-08 20:28 ` Kees Cook
2016-12-14 15:38 ` Petr Mladek
2016-12-08 19:48 ` [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec Luis R. Rodriguez
` (7 subsequent siblings)
10 siblings, 2 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:48 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
We currently statically limit the number of modprobe threads which
we allow to run concurrently to 50. As per Keith Owens, this was a
completely arbitrary value, and it was set in the 2.3.38 days [0]
over 16 years ago in year 2000.
Although we haven't yet hit our lower limits, experimentation [1]
shows that when and if we hit this limit in the worst case, will be
fatal -- consider get_fs_type() failures upon mount on a system which
has many partitions, some of which might even be with the same
filesystem. Its best to be prudent and increase and set this
value to something more sensible which ensures we're far from hitting
the limit and also allows default build/user run time override.
The worst case is fatal given that once a module fails to load there
is a period of time during which subsequent request for the same module
will fail, so in the case of partitions its not just one request that
could fail, but whole series of partitions. This later issue of a
module request failure domino effect can be addressed later, but
increasing the limit to something more meaninful should at least give us
enough cushion to avoid this for a while.
Set this value up with a bit more meaninful modern limits:
Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
Also allow the default max limit to be further fine tuned at compile
time and at initialization at run time at boot up using the kernel
parameter: max_modprobes.
[0] https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44
[1] https://github.com/mcgrof/test_request_module
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
Documentation/admin-guide/kernel-parameters.txt | 7 ++++
include/linux/kmod.h | 3 +-
init/Kconfig | 23 +++++++++++++
init/main.c | 1 +
kernel/kmod.c | 43 ++++++++++++++++---------
5 files changed, 61 insertions(+), 16 deletions(-)
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index be2d6d0a03a4..92bcccc65ea4 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1700,6 +1700,13 @@
keepinitrd [HW,ARM]
+ kmod.max_modprobes [KNL]
+ This lets you set the max allowed of concurrent
+ modprobes threads possible on a system overriding the
+ default heuristic of:
+
+ min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT)
+
kernelcore= [KNL,X86,IA-64,PPC]
Format: nn[KMGTPE] | "mirror"
This parameter
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index fcfd2bf14d3f..15783cd7f056 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -38,13 +38,14 @@ int __request_module(bool wait, const char *name, ...);
#define request_module_nowait(mod...) __request_module(false, mod)
#define try_then_request_module(x, mod...) \
((x) ?: (__request_module(true, mod), (x)))
+void init_kmod_umh(void);
#else
static inline int request_module(const char *name, ...) { return -ENOSYS; }
static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
+static inline void init_kmod_umh(void) { }
#define try_then_request_module(x, mod...) (x)
#endif
-
struct cred;
struct file;
diff --git a/init/Kconfig b/init/Kconfig
index 271692a352f1..da2c25746937 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
If unsure, or if you need to build out-of-tree modules, say N.
+config MAX_KMOD_CONCURRENT
+ int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
+ range 0 14
+ default 6 if !BASE_SMALL
+ default 7 if BASE_SMALL
+ help
+ The kernel restricts the number of possible concurrent calls to
+ request_module() to help avoid a recursive loop possible with
+ modules. The default maximum number of concurrent threads allowed
+ to run request_module() will be:
+
+ max_modprobes = min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT);
+
+ The value set in CONFIG_MAX_KMOD_CONCURRENT represents then the power
+ of 2 value used at boot time for the above computation. You can
+ override the default built value using the kernel parameter:
+
+ kmod.max_modprobes=4096
+
+ We set this to default to 64 (2^6) concurrent modprobe threads for
+ small systems, for larger systems this defaults to 128 (2^7)
+ concurrent modprobe threads.
+
endif # MODULES
config MODULES_TREE_LOOKUP
diff --git a/init/main.c b/init/main.c
index 8161208d4ece..1fa441aa32c6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
thread_stack_cache_init();
cred_init();
fork_init();
+ init_kmod_umh();
proc_caches_init();
buffer_init();
key_init();
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 0277d1216f80..cb6f7ca7b8a5 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -44,6 +44,9 @@
#include <trace/events/module.h>
extern int max_threads;
+unsigned int max_modprobes;
+module_param(max_modprobes, uint, 0644);
+MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
#define CAP_BSET (void *)1
#define CAP_PI (void *)2
@@ -125,10 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)
{
va_list args;
char module_name[MODULE_NAME_LEN];
- unsigned int max_modprobes;
int ret;
static atomic_t kmod_concurrent = ATOMIC_INIT(0);
-#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
static int kmod_loop_msg;
/*
@@ -152,19 +153,6 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;
- /* If modprobe needs a service that is in a module, we get a recursive
- * loop. Limit the number of running kmod threads to max_threads/2 or
- * MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
- * would be to run the parents of this process, counting how many times
- * kmod was invoked. That would mean accessing the internals of the
- * process tables to get the command line, proc_pid_cmdline is static
- * and it is not worth changing the proc code just to handle this case.
- * KAO.
- *
- * "trace the ppid" is simple, but will fail if someone's
- * parent exits. I think this is as good as it gets. --RR
- */
- max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
atomic_inc(&kmod_concurrent);
if (atomic_read(&kmod_concurrent) > max_modprobes) {
/* We may be blaming an innocent here, but unlikely */
@@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
return ret;
}
EXPORT_SYMBOL(__request_module);
+
+/*
+ * If modprobe needs a service that is in a module, we get a recursive
+ * loop. Limit the number of running kmod threads to max_threads/2 or
+ * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
+ * would be to run the parents of this process, counting how many times
+ * kmod was invoked. That would mean accessing the internals of the
+ * process tables to get the command line, proc_pid_cmdline is static
+ * and it is not worth changing the proc code just to handle this case.
+ *
+ * "trace the ppid" is simple, but will fail if someone's
+ * parent exits. I think this is as good as it gets.
+ *
+ * You can override with with a kernel parameter, for instance to allow
+ * 4096 concurrent modprobe instances:
+ *
+ * kmod.max_modprobes=4096
+ */
+void __init init_kmod_umh(void)
+{
+ if (!max_modprobes)
+ max_modprobes = min(max_threads/2,
+ 2 << CONFIG_MAX_KMOD_CONCURRENT);
+}
+
#endif /* CONFIG_MODULES */
static void call_usermodehelper_freeinfo(struct subprocess_info *info)
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-08 19:48 ` [RFC 03/10] kmod: add dynamic max concurrent thread count Luis R. Rodriguez
@ 2016-12-08 20:28 ` Kees Cook
2016-12-08 21:00 ` Luis R. Rodriguez
2016-12-14 15:38 ` Petr Mladek
1 sibling, 1 reply; 65+ messages in thread
From: Kees Cook @ 2016-12-08 20:28 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> We currently statically limit the number of modprobe threads which
> we allow to run concurrently to 50. As per Keith Owens, this was a
> completely arbitrary value, and it was set in the 2.3.38 days [0]
> over 16 years ago in year 2000.
>
> Although we haven't yet hit our lower limits, experimentation [1]
> shows that when and if we hit this limit in the worst case, will be
> fatal -- consider get_fs_type() failures upon mount on a system which
> has many partitions, some of which might even be with the same
> filesystem. Its best to be prudent and increase and set this
> value to something more sensible which ensures we're far from hitting
> the limit and also allows default build/user run time override.
>
> The worst case is fatal given that once a module fails to load there
> is a period of time during which subsequent request for the same module
> will fail, so in the case of partitions its not just one request that
> could fail, but whole series of partitions. This later issue of a
> module request failure domino effect can be addressed later, but
> increasing the limit to something more meaninful should at least give us
> enough cushion to avoid this for a while.
>
> Set this value up with a bit more meaninful modern limits:
>
> Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
> Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
>
> Also allow the default max limit to be further fine tuned at compile
> time and at initialization at run time at boot up using the kernel
> parameter: max_modprobes.
>
> [0] https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/?id=ab1c4ec7410f6ec64e1511d1a7d850fc99c09b44
> [1] https://github.com/mcgrof/test_request_module
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> Documentation/admin-guide/kernel-parameters.txt | 7 ++++
> include/linux/kmod.h | 3 +-
> init/Kconfig | 23 +++++++++++++
> init/main.c | 1 +
> kernel/kmod.c | 43 ++++++++++++++++---------
> 5 files changed, 61 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index be2d6d0a03a4..92bcccc65ea4 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -1700,6 +1700,13 @@
>
> keepinitrd [HW,ARM]
>
> + kmod.max_modprobes [KNL]
> + This lets you set the max allowed of concurrent
> + modprobes threads possible on a system overriding the
> + default heuristic of:
> +
> + min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT)
> +
> kernelcore= [KNL,X86,IA-64,PPC]
> Format: nn[KMGTPE] | "mirror"
> This parameter
> diff --git a/include/linux/kmod.h b/include/linux/kmod.h
> index fcfd2bf14d3f..15783cd7f056 100644
> --- a/include/linux/kmod.h
> +++ b/include/linux/kmod.h
> @@ -38,13 +38,14 @@ int __request_module(bool wait, const char *name, ...);
> #define request_module_nowait(mod...) __request_module(false, mod)
> #define try_then_request_module(x, mod...) \
> ((x) ?: (__request_module(true, mod), (x)))
> +void init_kmod_umh(void);
> #else
> static inline int request_module(const char *name, ...) { return -ENOSYS; }
> static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
> +static inline void init_kmod_umh(void) { }
> #define try_then_request_module(x, mod...) (x)
> #endif
>
> -
> struct cred;
> struct file;
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 271692a352f1..da2c25746937 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
>
> If unsure, or if you need to build out-of-tree modules, say N.
>
> +config MAX_KMOD_CONCURRENT
> + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> + range 0 14
> + default 6 if !BASE_SMALL
> + default 7 if BASE_SMALL
> + help
> + The kernel restricts the number of possible concurrent calls to
> + request_module() to help avoid a recursive loop possible with
> + modules. The default maximum number of concurrent threads allowed
> + to run request_module() will be:
> +
> + max_modprobes = min(max_threads/2, 2 << CONFIG_MAX_KMOD_CONCURRENT);
> +
> + The value set in CONFIG_MAX_KMOD_CONCURRENT represents then the power
> + of 2 value used at boot time for the above computation. You can
> + override the default built value using the kernel parameter:
> +
> + kmod.max_modprobes=4096
> +
> + We set this to default to 64 (2^6) concurrent modprobe threads for
> + small systems, for larger systems this defaults to 128 (2^7)
> + concurrent modprobe threads.
> +
> endif # MODULES
>
> config MODULES_TREE_LOOKUP
> diff --git a/init/main.c b/init/main.c
> index 8161208d4ece..1fa441aa32c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
> thread_stack_cache_init();
> cred_init();
> fork_init();
> + init_kmod_umh();
> proc_caches_init();
> buffer_init();
> key_init();
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 0277d1216f80..cb6f7ca7b8a5 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
> +unsigned int max_modprobes;
> +module_param(max_modprobes, uint, 0644);
> +MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
>
> #define CAP_BSET (void *)1
> #define CAP_PI (void *)2
> @@ -125,10 +128,8 @@ int __request_module(bool wait, const char *fmt, ...)
> {
> va_list args;
> char module_name[MODULE_NAME_LEN];
> - unsigned int max_modprobes;
> int ret;
> static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> -#define MAX_KMOD_CONCURRENT 50 /* Completely arbitrary value - KAO */
> static int kmod_loop_msg;
>
> /*
> @@ -152,19 +153,6 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
> - /* If modprobe needs a service that is in a module, we get a recursive
> - * loop. Limit the number of running kmod threads to max_threads/2 or
> - * MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> - * would be to run the parents of this process, counting how many times
> - * kmod was invoked. That would mean accessing the internals of the
> - * process tables to get the command line, proc_pid_cmdline is static
> - * and it is not worth changing the proc code just to handle this case.
> - * KAO.
> - *
> - * "trace the ppid" is simple, but will fail if someone's
> - * parent exits. I think this is as good as it gets. --RR
> - */
> - max_modprobes = min(max_threads/2, MAX_KMOD_CONCURRENT);
> atomic_inc(&kmod_concurrent);
> if (atomic_read(&kmod_concurrent) > max_modprobes) {
> /* We may be blaming an innocent here, but unlikely */
> @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> +
> +/*
> + * If modprobe needs a service that is in a module, we get a recursive
> + * loop. Limit the number of running kmod threads to max_threads/2 or
> + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> + * would be to run the parents of this process, counting how many times
> + * kmod was invoked. That would mean accessing the internals of the
> + * process tables to get the command line, proc_pid_cmdline is static
> + * and it is not worth changing the proc code just to handle this case.
> + *
> + * "trace the ppid" is simple, but will fail if someone's
> + * parent exits. I think this is as good as it gets.
> + *
> + * You can override with with a kernel parameter, for instance to allow
> + * 4096 concurrent modprobe instances:
> + *
> + * kmod.max_modprobes=4096
> + */
> +void __init init_kmod_umh(void)
What does umh mean?
> +{
> + if (!max_modprobes)
> + max_modprobes = min(max_threads/2,
> + 2 << CONFIG_MAX_KMOD_CONCURRENT);
> +}
> +
> #endif /* CONFIG_MODULES */
>
> static void call_usermodehelper_freeinfo(struct subprocess_info *info)
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-08 20:28 ` Kees Cook
@ 2016-12-08 21:00 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 21:00 UTC (permalink / raw)
To: Kees Cook
Cc: Luis R. Rodriguez, shuah, Jessica Yu, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, Petr Mladek, hare,
rwright, Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck,
rgoldwyn, subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 08, 2016 at 12:28:07PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 0277d1216f80..cb6f7ca7b8a5 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -44,6 +44,9 @@
> > @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> > return ret;
> > }
> > EXPORT_SYMBOL(__request_module);
> > +
> > +/*
> > + * If modprobe needs a service that is in a module, we get a recursive
> > + * loop. Limit the number of running kmod threads to max_threads/2 or
> > + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> > + * would be to run the parents of this process, counting how many times
> > + * kmod was invoked. That would mean accessing the internals of the
> > + * process tables to get the command line, proc_pid_cmdline is static
> > + * and it is not worth changing the proc code just to handle this case.
> > + *
> > + * "trace the ppid" is simple, but will fail if someone's
> > + * parent exits. I think this is as good as it gets.
> > + *
> > + * You can override with with a kernel parameter, for instance to allow
> > + * 4096 concurrent modprobe instances:
> > + *
> > + * kmod.max_modprobes=4096
> > + */
> > +void __init init_kmod_umh(void)
>
> What does umh mean?
umh is user mode helper. kmod.c actually implements the kernel's umh code.
A subsequent series I will want to move all that to umh.c and keep module
loading separate in kmod.c But that's for later as a cleanup.
BTW any chance I can have you trim replies to file name and hunk for changes
you reply to ? As an example I did that here :)
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-08 19:48 ` [RFC 03/10] kmod: add dynamic max concurrent thread count Luis R. Rodriguez
2016-12-08 20:28 ` Kees Cook
@ 2016-12-14 15:38 ` Petr Mladek
2016-12-16 8:39 ` Luis R. Rodriguez
1 sibling, 1 reply; 65+ messages in thread
From: Petr Mladek @ 2016-12-14 15:38 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu 2016-12-08 11:48:14, Luis R. Rodriguez wrote:
> We currently statically limit the number of modprobe threads which
> we allow to run concurrently to 50. As per Keith Owens, this was a
> completely arbitrary value, and it was set in the 2.3.38 days [0]
> over 16 years ago in year 2000.
>
> Although we haven't yet hit our lower limits, experimentation [1]
> shows that when and if we hit this limit in the worst case, will be
> fatal -- consider get_fs_type() failures upon mount on a system which
> has many partitions, some of which might even be with the same
> filesystem. Its best to be prudent and increase and set this
> value to something more sensible which ensures we're far from hitting
> the limit and also allows default build/user run time override.
>
> The worst case is fatal given that once a module fails to load there
> is a period of time during which subsequent request for the same module
> will fail, so in the case of partitions its not just one request that
> could fail, but whole series of partitions. This later issue of a
> module request failure domino effect can be addressed later, but
> increasing the limit to something more meaninful should at least give us
> enough cushion to avoid this for a while.
>
> Set this value up with a bit more meaninful modern limits:
>
> Bump this up to 64 max for small systems (CONFIG_BASE_SMALL)
> Bump this up to 128 max for larger systems (!CONFIG_BASE_SMALL)
>
> diff --git a/init/Kconfig b/init/Kconfig
> index 271692a352f1..da2c25746937 100644
> --- a/init/Kconfig
> +++ b/init/Kconfig
> @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
>
> If unsure, or if you need to build out-of-tree modules, say N.
>
> +config MAX_KMOD_CONCURRENT
> + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> + range 0 14
Would not too small range break loading module dependencies?
I am not sure how it is implemented but it might require having
some more module loads in progress.
I would give 6 as minimum. Nobody has troubles with the current limit.
> + default 6 if !BASE_SMALL
> + default 7 if BASE_SMALL
Aren't the conditions inversed?
> diff --git a/init/main.c b/init/main.c
> index 8161208d4ece..1fa441aa32c6 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -638,6 +638,7 @@ asmlinkage __visible void __init start_kernel(void)
> thread_stack_cache_init();
> cred_init();
> fork_init();
> + init_kmod_umh();
> proc_caches_init();
> buffer_init();
> key_init();
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 0277d1216f80..cb6f7ca7b8a5 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -186,6 +174,31 @@ int __request_module(bool wait, const char *fmt, ...)
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> +
> +/*
> + * If modprobe needs a service that is in a module, we get a recursive
> + * loop. Limit the number of running kmod threads to max_threads/2 or
> + * CONFIG_MAX_KMOD_CONCURRENT, whichever is the smaller. A cleaner method
> + * would be to run the parents of this process, counting how many times
> + * kmod was invoked. That would mean accessing the internals of the
> + * process tables to get the command line, proc_pid_cmdline is static
> + * and it is not worth changing the proc code just to handle this case.
> + *
> + * "trace the ppid" is simple, but will fail if someone's
> + * parent exits. I think this is as good as it gets.
> + *
> + * You can override with with a kernel parameter, for instance to allow
> + * 4096 concurrent modprobe instances:
> + *
> + * kmod.max_modprobes=4096
> + */
> +void __init init_kmod_umh(void)
> +{
> + if (!max_modprobes)
> + max_modprobes = min(max_threads/2,
> + 2 << CONFIG_MAX_KMOD_CONCURRENT);
This should be
1 << CONFIG_MAX_KMOD_CONCURRENT);
1 << 1 = 2;
Note that this calculation is mentioned also some comments and
documentation.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-14 15:38 ` Petr Mladek
@ 2016-12-16 8:39 ` Luis R. Rodriguez
2017-01-10 19:24 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 8:39 UTC (permalink / raw)
To: Petr Mladek
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Wed, Dec 14, 2016 at 04:38:27PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:48:14, Luis R. Rodriguez wrote:
> > diff --git a/init/Kconfig b/init/Kconfig
> > index 271692a352f1..da2c25746937 100644
> > --- a/init/Kconfig
> > +++ b/init/Kconfig
> > @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
> >
> > If unsure, or if you need to build out-of-tree modules, say N.
> >
> > +config MAX_KMOD_CONCURRENT
> > + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> > + range 0 14
>
> Would not too small range break loading module dependencies?
No, dependencies are resolved by depmod, so userspace looks at the list and
just finit_module() the depenencies, skipping kmod. So the limit is
really only for kernel acting like a boss.
> I am not sure how it is implemented but it might require having
> some more module loads in progress.
Dependencies should be OK, a more serious concern with dependencies is
the aggregate memory it takes to load all dep modules for one required
module since finit_module() ends up allocating the struct module to copy
over data from userspace.
> I would give 6 as minimum. Nobody has troubles with the current limit.
Fair enough! Although disabling modprobe calls all together seemed like
a fun test, that should we allow that via the module parameter at least?
> > + default 6 if !BASE_SMALL
> > + default 7 if BASE_SMALL
>
> Aren't the conditions inversed?
Whoops yes, sorry.
> > +void __init init_kmod_umh(void)
> > +{
> > + if (!max_modprobes)
> > + max_modprobes = min(max_threads/2,
> > + 2 << CONFIG_MAX_KMOD_CONCURRENT);
>
> This should be
>
> 1 << CONFIG_MAX_KMOD_CONCURRENT);
>
> 1 << 1 = 2;
>
> Note that this calculation is mentioned also some comments and
> documentation.
Heh sorry, yes fixed! Good thing I had still tested all along with the
value I intended though :P
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 03/10] kmod: add dynamic max concurrent thread count
2016-12-16 8:39 ` Luis R. Rodriguez
@ 2017-01-10 19:24 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-10 19:24 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Petr Mladek, shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme,
corbet, martin.wilck, mmarek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Fri, Dec 16, 2016 at 09:39:56AM +0100, Luis R. Rodriguez wrote:
> On Wed, Dec 14, 2016 at 04:38:27PM +0100, Petr Mladek wrote:
> > On Thu 2016-12-08 11:48:14, Luis R. Rodriguez wrote:
> > > diff --git a/init/Kconfig b/init/Kconfig
> > > index 271692a352f1..da2c25746937 100644
> > > --- a/init/Kconfig
> > > +++ b/init/Kconfig
> > > @@ -2111,6 +2111,29 @@ config TRIM_UNUSED_KSYMS
> > >
> > > If unsure, or if you need to build out-of-tree modules, say N.
> > >
> > > +config MAX_KMOD_CONCURRENT
> > > + int "Max allowed concurrent request_module() calls (6=>64, 10=>1024)"
> > > + range 0 14
> >
> > Would not too small range break loading module dependencies?
>
> No, dependencies are resolved by depmod, so userspace looks at the list and
> just finit_module() the depenencies, skipping kmod. So the limit is
> really only for kernel acting like a boss.
>
> > I am not sure how it is implemented but it might require having
> > some more module loads in progress.
>
> Dependencies should be OK, a more serious concern with dependencies is
> the aggregate memory it takes to load all dep modules for one required
> module since finit_module() ends up allocating the struct module to copy
> over data from userspace.
A simple change can enable us to bail out on finit_module() if a module
is already present by looking at the passed userspace data. I have this
change now but as discussed, whether or not its desirable should be a
matter of whether or not in the typical case (bootup time) things improve.
>From some initial tests it would seem this doesn't help much but it does
help with trying to load the same module over and over again, the explanation
I can think of for this is by introducing a lookup on finit_module() we also
delay module loading by the lookup time, in the general case we would not need
this, so this is likely not worth merging. Will run some final tests to
confirm.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (2 preceding siblings ...)
2016-12-08 19:48 ` [RFC 03/10] kmod: add dynamic max concurrent thread count Luis R. Rodriguez
@ 2016-12-08 19:48 ` Luis R. Rodriguez
2016-12-08 20:29 ` Kees Cook
2016-12-22 5:07 ` Jessica Yu
2016-12-08 19:48 ` [RFC 05/10] kmod: return -EBUSY if modprobe limit is reached Luis R. Rodriguez
` (6 subsequent siblings)
10 siblings, 2 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:48 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
kmod_concurrent is used as an atomic counter for enabling
the allowed limit of modprobe calls, provide wrappers for it
to enable this to be expanded on more easily. This will be done
later.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/kmod.c | 27 +++++++++++++++++++++------
1 file changed, 21 insertions(+), 6 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index cb6f7ca7b8a5..049d7eabda38 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -44,6 +44,9 @@
#include <trace/events/module.h>
extern int max_threads;
+
+static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+
unsigned int max_modprobes;
module_param(max_modprobes, uint, 0644);
MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
@@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
return -ENOMEM;
}
+static int kmod_umh_threads_get(void)
+{
+ atomic_inc(&kmod_concurrent);
+ if (atomic_read(&kmod_concurrent) < max_modprobes)
+ return 0;
+ atomic_dec(&kmod_concurrent);
+ return -ENOMEM;
+}
+
+static void kmod_umh_threads_put(void)
+{
+ atomic_dec(&kmod_concurrent);
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
va_list args;
char module_name[MODULE_NAME_LEN];
int ret;
- static atomic_t kmod_concurrent = ATOMIC_INIT(0);
static int kmod_loop_msg;
/*
@@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;
- atomic_inc(&kmod_concurrent);
- if (atomic_read(&kmod_concurrent) > max_modprobes) {
+ ret = kmod_umh_threads_get();
+ if (ret) {
/* We may be blaming an innocent here, but unlikely */
if (kmod_loop_msg < 5) {
printk(KERN_ERR
@@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
module_name);
kmod_loop_msg++;
}
- atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ return ret;
}
trace_module_request(module_name, wait, _RET_IP_);
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
- atomic_dec(&kmod_concurrent);
+ kmod_umh_threads_put();
return ret;
}
EXPORT_SYMBOL(__request_module);
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-08 19:48 ` [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec Luis R. Rodriguez
@ 2016-12-08 20:29 ` Kees Cook
2016-12-08 21:08 ` Luis R. Rodriguez
2016-12-22 5:07 ` Jessica Yu
1 sibling, 1 reply; 65+ messages in thread
From: Kees Cook @ 2016-12-08 20:29 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, Petr Mladek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> kmod_concurrent is used as an atomic counter for enabling
> the allowed limit of modprobe calls, provide wrappers for it
> to enable this to be expanded on more easily. This will be done
> later.
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> kernel/kmod.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index cb6f7ca7b8a5..049d7eabda38 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
> +
> +static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> +
> unsigned int max_modprobes;
> module_param(max_modprobes, uint, 0644);
> MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
> @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> return -ENOMEM;
> }
>
> +static int kmod_umh_threads_get(void)
> +{
> + atomic_inc(&kmod_concurrent);
> + if (atomic_read(&kmod_concurrent) < max_modprobes)
> + return 0;
> + atomic_dec(&kmod_concurrent);
> + return -ENOMEM;
> +}
> +
> +static void kmod_umh_threads_put(void)
> +{
> + atomic_dec(&kmod_concurrent);
> +}
Can you use a kref here instead? We're trying to kill raw use of
atomic_t for reference counting...
> +
> /**
> * __request_module - try to load a kernel module
> * @wait: wait (or not) for the operation to complete
> @@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
> - static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> static int kmod_loop_msg;
>
> /*
> @@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
> - atomic_inc(&kmod_concurrent);
> - if (atomic_read(&kmod_concurrent) > max_modprobes) {
> + ret = kmod_umh_threads_get();
> + if (ret) {
> /* We may be blaming an innocent here, but unlikely */
> if (kmod_loop_msg < 5) {
> printk(KERN_ERR
> @@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
> module_name);
> kmod_loop_msg++;
> }
> - atomic_dec(&kmod_concurrent);
> - return -ENOMEM;
> + return ret;
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
>
> ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>
> - atomic_dec(&kmod_concurrent);
> + kmod_umh_threads_put();
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
> --
> 2.10.1
>
--
Kees Cook
Nexus Security
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-08 20:29 ` Kees Cook
@ 2016-12-08 21:08 ` Luis R. Rodriguez
2016-12-15 12:46 ` Petr Mladek
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 21:08 UTC (permalink / raw)
To: Kees Cook
Cc: Luis R. Rodriguez, shuah, Jessica Yu, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, Petr Mladek, hare,
rwright, Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck,
rgoldwyn, subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > kmod_concurrent is used as an atomic counter for enabling
> > the allowed limit of modprobe calls, provide wrappers for it
> > to enable this to be expanded on more easily. This will be done
> > later.
> >
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> > kernel/kmod.c | 27 +++++++++++++++++++++------
> > 1 file changed, 21 insertions(+), 6 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index cb6f7ca7b8a5..049d7eabda38 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -44,6 +44,9 @@
> > #include <trace/events/module.h>
> >
> > extern int max_threads;
> > +
> > +static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> > +
> > unsigned int max_modprobes;
> > module_param(max_modprobes, uint, 0644);
> > MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
> > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > return -ENOMEM;
> > }
> >
> > +static int kmod_umh_threads_get(void)
> > +{
> > + atomic_inc(&kmod_concurrent);
> > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > + return 0;
> > + atomic_dec(&kmod_concurrent);
> > + return -ENOMEM;
> > +}
> > +
> > +static void kmod_umh_threads_put(void)
> > +{
> > + atomic_dec(&kmod_concurrent);
> > +}
>
> Can you use a kref here instead? We're trying to kill raw use of
> atomic_t for reference counting...
That's a much broader functional change than I was looking for, but I am up for
it. Can you describe the benefit of using kref you expect or why this is an
ongoing crusade? Since its a larger functional change how about doing this
change later, and we can test impact with the tress test driver. In theory if
there are benefits can't we add a test case to prove the gains?
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-08 21:08 ` Luis R. Rodriguez
@ 2016-12-15 12:46 ` Petr Mladek
2016-12-16 8:05 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Petr Mladek @ 2016-12-15 12:46 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Kees Cook, shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, hare, rwright, Jeff Mahoney, DSterba,
fdmanana, neilb, Guenter Roeck, rgoldwyn, subashab,
Heinrich Schuchardt, Aaron Tomlin, mbenes, Paul E. McKenney,
Dan Williams, Josh Poimboeuf, David S. Miller, Ingo Molnar,
Andrew Morton, Linus Torvalds, linux-kselftest, linux-doc, LKML
On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > kmod_concurrent is used as an atomic counter for enabling
> > > the allowed limit of modprobe calls, provide wrappers for it
> > > to enable this to be expanded on more easily. This will be done
> > > later.
> > >
> > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > ---
> > > kernel/kmod.c | 27 +++++++++++++++++++++------
> > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > --- a/kernel/kmod.c
> > > +++ b/kernel/kmod.c
> > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > return -ENOMEM;
> > > }
> > >
> > > +static int kmod_umh_threads_get(void)
> > > +{
> > > + atomic_inc(&kmod_concurrent);
This approach might actually cause false failures. If we
are on the limit and more processes do this increment
in parallel, it makes the number bigger that it should be.
> > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > + return 0;
> > > + atomic_dec(&kmod_concurrent);
> > > + return -ENOMEM;
> > > +}
> > > +
> > > +static void kmod_umh_threads_put(void)
> > > +{
> > > + atomic_dec(&kmod_concurrent);
> > > +}
> >
> > Can you use a kref here instead? We're trying to kill raw use of
> > atomic_t for reference counting...
>
> That's a much broader functional change than I was looking for, but I am up for
> it. Can you describe the benefit of using kref you expect or why this is an
> ongoing crusade? Since its a larger functional change how about doing this
> change later, and we can test impact with the tress test driver. In theory if
> there are benefits can't we add a test case to prove the gains?
Kees probably refers to the kref improvements that Peter Zijlstra
is working on, see
https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
The advantage is that the new refcount API handles over and
underflow.
Another advantage is that it increments/decrements the value
only when it is safe. It uses cmpxchg to make sure that
the checks are valid.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-15 12:46 ` Petr Mladek
@ 2016-12-16 8:05 ` Luis R. Rodriguez
2016-12-22 4:48 ` Jessica Yu
2017-01-10 18:57 ` [RFC 04/10] " Luis R. Rodriguez
0 siblings, 2 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 8:05 UTC (permalink / raw)
To: Petr Mladek
Cc: Luis R. Rodriguez, Kees Cook, shuah, Jessica Yu, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > kmod_concurrent is used as an atomic counter for enabling
> > > > the allowed limit of modprobe calls, provide wrappers for it
> > > > to enable this to be expanded on more easily. This will be done
> > > > later.
> > > >
> > > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > > ---
> > > > kernel/kmod.c | 27 +++++++++++++++++++++------
> > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > >
> > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > > --- a/kernel/kmod.c
> > > > +++ b/kernel/kmod.c
> > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > > return -ENOMEM;
> > > > }
> > > >
> > > > +static int kmod_umh_threads_get(void)
> > > > +{
> > > > + atomic_inc(&kmod_concurrent);
>
> This approach might actually cause false failures. If we
> are on the limit and more processes do this increment
> in parallel, it makes the number bigger that it should be.
This approach is *exactly* what the existing code does :P
I just provided wrappers. I agree with the old approach though,
reason is it acts as a lock in for the bump. What seems rather
stupid though is to just reject with an error on limit without first
taking a breather. I've now added a little clutch so that we first
take some fresh air when close to the limit, this reduces the chances
of going fatal.
With a clutch in place we can still go over the limit, its just we'd
have a few threads waiting until previous calls clear out. If there
is enough calls waiting eventually we'll fail.
Note though that __request_module() can wait, but here is an option
to not wait so such a clutch can only wait if we're allowed to.
> > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > + return 0;
> > > > + atomic_dec(&kmod_concurrent);
> > > > + return -ENOMEM;
> > > > +}
> > > > +
> > > > +static void kmod_umh_threads_put(void)
> > > > +{
> > > > + atomic_dec(&kmod_concurrent);
> > > > +}
> > >
> > > Can you use a kref here instead? We're trying to kill raw use of
> > > atomic_t for reference counting...
> >
> > That's a much broader functional change than I was looking for, but I am up for
> > it. Can you describe the benefit of using kref you expect or why this is an
> > ongoing crusade? Since its a larger functional change how about doing this
> > change later, and we can test impact with the tress test driver. In theory if
> > there are benefits can't we add a test case to prove the gains?
>
> Kees probably refers to the kref improvements that Peter Zijlstra
> is working on, see
> https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
>
> The advantage is that the new refcount API handles over and
> underflow.
>
> Another advantage is that it increments/decrements the value
> only when it is safe. It uses cmpxchg to make sure that
> the checks are valid.
Great thanks, will look into that.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-16 8:05 ` Luis R. Rodriguez
@ 2016-12-22 4:48 ` Jessica Yu
2017-01-06 20:54 ` Luis R. Rodriguez
2017-01-10 18:57 ` [RFC 04/10] " Luis R. Rodriguez
1 sibling, 1 reply; 65+ messages in thread
From: Jessica Yu @ 2016-12-22 4:48 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Petr Mladek, Kees Cook, shuah, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, hare, rwright, Jeff Mahoney, DSterba,
fdmanana, neilb, Guenter Roeck, rgoldwyn, subashab,
Heinrich Schuchardt, Aaron Tomlin, mbenes, Paul E. McKenney,
Dan Williams, Josh Poimboeuf, David S. Miller, Ingo Molnar,
Andrew Morton, Linus Torvalds, linux-kselftest, linux-doc, LKML
+++ Luis R. Rodriguez [16/12/16 09:05 +0100]:
>On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
>> On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
>> > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
>> > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > > > kmod_concurrent is used as an atomic counter for enabling
>> > > > the allowed limit of modprobe calls, provide wrappers for it
>> > > > to enable this to be expanded on more easily. This will be done
>> > > > later.
>> > > >
>> > > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>> > > > ---
>> > > > kernel/kmod.c | 27 +++++++++++++++++++++------
>> > > > 1 file changed, 21 insertions(+), 6 deletions(-)
>> > > >
>> > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
>> > > > index cb6f7ca7b8a5..049d7eabda38 100644
>> > > > --- a/kernel/kmod.c
>> > > > +++ b/kernel/kmod.c
>> > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
>> > > > return -ENOMEM;
>> > > > }
>> > > >
>> > > > +static int kmod_umh_threads_get(void)
>> > > > +{
>> > > > + atomic_inc(&kmod_concurrent);
>>
>> This approach might actually cause false failures. If we
>> are on the limit and more processes do this increment
>> in parallel, it makes the number bigger that it should be.
>
>This approach is *exactly* what the existing code does :P
>I just provided wrappers. I agree with the old approach though,
>reason is it acts as a lock in for the bump.
I think what Petr meant was that we could run into false failures when multiple
atomic increments happen between the first increment and the subsequent
atomic_read.
Say max_modprobes is 64 -
atomic_inc(&kmod_concurrent); // thread 1: kmod_concurrent is 63
atomic_inc(&kmod_concurrent); // thread 2: kmod_concurrent is 64
atomic_inc(&kmod_concurrent); // thread 3: kmod_concurrent is 65
if (atomic_read(&kmod_concurrent) < max_modprobes) // if all threads read 65 here, then all will error out
return 0; // when the first two should have succeeded (false failures)
atomic_dec(&kmod_concurrent);
return -ENOMEM;
But yeah, I think this issue was already in the existing kmod code..
Jessica
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-22 4:48 ` Jessica Yu
@ 2017-01-06 20:54 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-06 20:54 UTC (permalink / raw)
To: Jessica Yu
Cc: Luis R. Rodriguez, Petr Mladek, Kees Cook, shuah, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Wed, Dec 21, 2016 at 08:48:06PM -0800, Jessica Yu wrote:
> +++ Luis R. Rodriguez [16/12/16 09:05 +0100]:
> > On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > kmod_concurrent is used as an atomic counter for enabling
> > > > > > the allowed limit of modprobe calls, provide wrappers for it
> > > > > > to enable this to be expanded on more easily. This will be done
> > > > > > later.
> > > > > >
> > > > > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > > > > ---
> > > > > > kernel/kmod.c | 27 +++++++++++++++++++++------
> > > > > > 1 file changed, 21 insertions(+), 6 deletions(-)
> > > > > >
> > > > > > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > > > > > index cb6f7ca7b8a5..049d7eabda38 100644
> > > > > > --- a/kernel/kmod.c
> > > > > > +++ b/kernel/kmod.c
> > > > > > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > > > > > return -ENOMEM;
> > > > > > }
> > > > > >
> > > > > > +static int kmod_umh_threads_get(void)
> > > > > > +{
> > > > > > + atomic_inc(&kmod_concurrent);
> > >
> > > This approach might actually cause false failures. If we
> > > are on the limit and more processes do this increment
> > > in parallel, it makes the number bigger that it should be.
> >
> > This approach is *exactly* what the existing code does :P
> > I just provided wrappers. I agree with the old approach though,
> > reason is it acts as a lock in for the bump.
>
> I think what Petr meant was that we could run into false failures when multiple
> atomic increments happen between the first increment and the subsequent
> atomic_read.
>
> Say max_modprobes is 64 -
>
> atomic_inc(&kmod_concurrent); // thread 1: kmod_concurrent is 63
> atomic_inc(&kmod_concurrent); // thread 2: kmod_concurrent is 64
> atomic_inc(&kmod_concurrent); // thread 3: kmod_concurrent is 65
> if (atomic_read(&kmod_concurrent) < max_modprobes) // if all threads read 65 here, then all will error out
> return 0; // when the first two should have succeeded (false failures)
> atomic_dec(&kmod_concurrent);
> return -ENOMEM;
>
> But yeah, I think this issue was already in the existing kmod code..
Ah right, but the code was very simple and there is only one operation
in between which we'd race against given the old code just incremented
first nd immediately checked for the limit. The more code we have the
more chances for what you describe to happen.
I've added another change into my series, a clutch, its at the end of this
email. With this we change we check for the limit right away and put on
hold any items reaching the limit, while other requests passing the limit
will be bumped. We have then:
if (!kmod_concurrent_sane()) {
pr_warn_ratelimited("request_module: kmod_concurrent (%u) close to critical levels (max_modprobes: %u) for module %s\n, backing off for a bit",
atomic_read(&kmod_concurrent), max_modprobes, module_name);
wait_event_interruptible(kmod_wq, kmod_concurrent_sane());
}
ret = kmod_umh_threads_get();
if (ret) {
pr_err_ratelimited("%s: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
__func__, module_name, max_modprobes);
return ret;
}
The same race you describe is possible -- but we now would at least use
a clutch immediately as we approach the limit. Maybe it makes sense to
post a new series after I fold the alias code and sanity check into a
debug kconfig option ?
Luis
commit 95c55552283cf99e2a48b84dc766d5fa547f046e
Author: Luis R. Rodriguez <mcgrof@kernel.org>
Date: Thu Dec 15 23:24:22 2016 -0600
kmod: add a clutch around 1/4 of modprobe thread limit
If we reach the limit of modprobe_limit threads running the next
request_module() call will fail. The original reason for adding
a kill was to do away with possible issues with in old circumstances
which would create a recursive series of request_module() calls.
We can do better than just be super aggressive and reject calls
once we've reached the limit by adding a clutch so that if we're
1/4th of the way close to the limit we make these new calls wait
until pending threads complete.
There is still a chance you can fail new incomming requests which
can bump kmod_concurrent beyond the limit, however the clutch helps
with a bit of breathing room to allow the system to process pending
requests before activating the upper last 1/4th of the limit requests.
This fixes test cases 0008 and 0009 of the selftest for kmod:
tools/testing/selftests/kmod/kmod.sh -t 0008
tools/testing/selftests/kmod/kmod.sh -t 0009
Both tests reveal the clutch in action:
Dec 15 16:12:14 piggy kernel: request_module: kmod_concurrent (96) close critical levels (max_modprobes: 128) for module test_module
...
Dec 15 16:12:23 piggy kernel: request_module: kmod_concurrent (96) close critical levels (max_modprobes: 128) for module test_module
...
The only difference is the clutch helps with avoiding making
request_module() requests fatal more often. With x86_64 qemu,
with 4 cores, 4 GiB of RAM it takes the following run time to
run both tests:
time kmod.sh -t 0008
real 0m22.247s
user 0m0.084s
sys 0m11.328s
time kmod.sh -t 0009
real 0m58.785s
user 0m0.492s
sys 0m10.852s
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
diff --git a/kernel/kmod.c b/kernel/kmod.c
index d6595d2de209..f8c880bbf658 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -58,6 +58,7 @@ static DECLARE_RWSEM(umhelper_sem);
#ifdef CONFIG_MODULES
static atomic_t kmod_concurrent = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
/*
modprobe_path is set via /proc/sys.
@@ -156,6 +157,16 @@ int get_kmod_umh_count(void)
return atomic_read(&kmod_concurrent);
}
+static bool kmod_concurrent_sane(void)
+{
+ unsigned int clutch;
+
+ clutch = get_kmod_umh_limit() - (get_kmod_umh_limit()/4);
+ if (get_kmod_umh_count() < clutch)
+ return true;
+ return false;
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -199,6 +210,12 @@ int __request_module(bool wait, const char *fmt, ...)
if (ret)
return ret;
+ if (!kmod_concurrent_sane()) {
+ pr_warn_ratelimited("request_module: kmod_concurrent (%u) close to critical levels (max_modprobes: %u) for module %s\n, backing off for a bit",
+ atomic_read(&kmod_concurrent), max_modprobes, module_name);
+ wait_event_interruptible(kmod_wq, kmod_concurrent_sane());
+ }
+
ret = kmod_umh_threads_get();
if (ret) {
pr_err_ratelimited("%s: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
@@ -211,6 +228,7 @@ int __request_module(bool wait, const char *fmt, ...)
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
kmod_umh_threads_put();
+ wake_up_all(&kmod_wq);
return ret;
}
EXPORT_SYMBOL(__request_module);
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index f8ccc938e0fb..08d9bea4bade 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -52,28 +52,8 @@ ALL_TESTS="$ALL_TESTS 0004:1:1"
ALL_TESTS="$ALL_TESTS 0005:10:1"
ALL_TESTS="$ALL_TESTS 0006:10:1"
ALL_TESTS="$ALL_TESTS 0007:5:1"
-
-# Disabled tests:
-#
-# 0008 x 150 - multithreaded - push kmod_concurrent over max_modprobes for request_module()"
-# Current best-effort failure interpretation:
-# Enough module requests get loaded in place fast enough to reach over the
-# max_modprobes limit and trigger a failure -- before we're even able to
-# start processing pending requests.
-ALL_TESTS="$ALL_TESTS 0008:150:0"
-
-# 0009 x 150 - multithreaded - push kmod_concurrent over max_modprobes for get_fs_type()"
-# Current best-effort failure interpretation:
-#
-# get_fs_type() requests modules using aliases as such the optimization in
-# place today to look for already loaded modules will not take effect and
-# we end up requesting a new module to load, this bumps the kmod_concurrent,
-# and in certain circumstances can lead to pushing the kmod_concurrent over
-# the max_modprobe limit.
-#
-# This test fails much easier than test 0008 since the alias optimizations
-# are not in place.
-ALL_TESTS="$ALL_TESTS 0009:150:0"
+ALL_TESTS="$ALL_TESTS 0008:150:1"
+ALL_TESTS="$ALL_TESTS 0009:150:1"
test_modprobe()
{
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-16 8:05 ` Luis R. Rodriguez
2016-12-22 4:48 ` Jessica Yu
@ 2017-01-10 18:57 ` Luis R. Rodriguez
2017-01-11 20:08 ` Luis R. Rodriguez
1 sibling, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-10 18:57 UTC (permalink / raw)
To: Petr Mladek, Kees Cook, Peter Zijlstra
Cc: mcgrof, shuah, Jessica Yu, Rusty Russell, Eric W. Biederman,
Dmitry Torokhov, Arnaldo Carvalho de Melo, Jonathan Corbet,
martin.wilck, Michal Marek, hare, rwright, Jeff Mahoney, DSterba,
fdmanana, neilb, Guenter Roeck, rgoldwyn, subashab,
Heinrich Schuchardt, Aaron Tomlin, mbenes, Paul E. McKenney,
Dan Williams, Josh Poimboeuf, David S. Miller, Ingo Molnar,
Andrew Morton, Linus Torvalds, linux-kselftest, linux-doc, LKML
On Fri, Dec 16, 2016 at 09:05:00AM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > > + return 0;
> > > > > + atomic_dec(&kmod_concurrent);
> > > > > + return -ENOMEM;
> > > > > +}
> > > > > +
> > > > > +static void kmod_umh_threads_put(void)
> > > > > +{
> > > > > + atomic_dec(&kmod_concurrent);
> > > > > +}
> > > >
> > > > Can you use a kref here instead? We're trying to kill raw use of
> > > > atomic_t for reference counting...
> > >
> > > That's a much broader functional change than I was looking for, but I am up for
> > > it. Can you describe the benefit of using kref you expect or why this is an
> > > ongoing crusade? Since its a larger functional change how about doing this
> > > change later, and we can test impact with the tress test driver. In theory if
> > > there are benefits can't we add a test case to prove the gains?
> >
> > Kees probably refers to the kref improvements that Peter Zijlstra
> > is working on, see
> > https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
> >
> > The advantage is that the new refcount API handles over and
> > underflow.
> >
> > Another advantage is that it increments/decrements the value
> > only when it is safe. It uses cmpxchg to make sure that
> > the checks are valid.
>
> Great thanks, will look into that.
OK I've done the conversion now, the only thing is linux-next as of today lacks
KREF_INIT() so I've open coded it for now. Once Peter's changes get merged the
only thing we'dneed is to change the open code line to KREF_INIT().
I'll annotate this as Suggested-by Kees and Petr, I did this as a separate atomic
step after this to make it easier for review.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2017-01-10 18:57 ` [RFC 04/10] " Luis R. Rodriguez
@ 2017-01-11 20:08 ` Luis R. Rodriguez
2017-05-16 18:02 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-11 20:08 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Petr Mladek, Kees Cook, Peter Zijlstra, shuah, Jessica Yu,
Rusty Russell, Eric W. Biederman, Dmitry Torokhov,
Arnaldo Carvalho de Melo, Jonathan Corbet, martin.wilck,
Michal Marek, hare, rwright, Jeff Mahoney, DSterba, fdmanana,
neilb, Guenter Roeck, rgoldwyn, subashab, Heinrich Schuchardt,
Aaron Tomlin, mbenes, Paul E. McKenney, Dan Williams,
Josh Poimboeuf, David S. Miller, Ingo Molnar, Andrew Morton,
Linus Torvalds, linux-kselftest, linux-doc, LKML
On Tue, Jan 10, 2017 at 07:57:10PM +0100, Luis R. Rodriguez wrote:
> On Fri, Dec 16, 2016 at 09:05:00AM +0100, Luis R. Rodriguez wrote:
> > On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > > > + return 0;
> > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > + return -ENOMEM;
> > > > > > +}
> > > > > > +
> > > > > > +static void kmod_umh_threads_put(void)
> > > > > > +{
> > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > +}
> > > > >
> > > > > Can you use a kref here instead? We're trying to kill raw use of
> > > > > atomic_t for reference counting...
> > > >
> > > > That's a much broader functional change than I was looking for, but I am up for
> > > > it. Can you describe the benefit of using kref you expect or why this is an
> > > > ongoing crusade? Since its a larger functional change how about doing this
> > > > change later, and we can test impact with the tress test driver. In theory if
> > > > there are benefits can't we add a test case to prove the gains?
> > >
> > > Kees probably refers to the kref improvements that Peter Zijlstra
> > > is working on, see
> > > https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
> > >
> > > The advantage is that the new refcount API handles over and
> > > underflow.
> > >
> > > Another advantage is that it increments/decrements the value
> > > only when it is safe. It uses cmpxchg to make sure that
> > > the checks are valid.
> >
> > Great thanks, will look into that.
>
> OK I've done the conversion now, the only thing is linux-next as of today lacks
> KREF_INIT() so I've open coded it for now. Once Peter's changes get merged the
> only thing we'dneed is to change the open code line to KREF_INIT().
>
> I'll annotate this as Suggested-by Kees and Petr, I did this as a separate atomic
> step after this to make it easier for review.
Spoke too soon, kref_read() is not upstream yet either, so I can hold conversion
over until Peter's work is merged. Peter please Cc me on those patches if possible
:D
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2017-01-11 20:08 ` Luis R. Rodriguez
@ 2017-05-16 18:02 ` Luis R. Rodriguez
2017-05-18 2:37 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-05-16 18:02 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Petr Mladek, Kees Cook, Peter Zijlstra, shuah, Jessica Yu,
Rusty Russell, Eric W. Biederman, Dmitry Torokhov,
Arnaldo Carvalho de Melo, Jonathan Corbet, martin.wilck,
Michal Marek, hare, rwright, Jeff Mahoney, DSterba, fdmanana,
neilb, Guenter Roeck, rgoldwyn, subashab, Heinrich Schuchardt,
Aaron Tomlin, mbenes, Paul E. McKenney, Dan Williams,
Josh Poimboeuf, David S. Miller, Ingo Molnar, Andrew Morton,
Linus Torvalds, linux-kselftest, linux-doc, LKML
On Wed, Jan 11, 2017 at 09:08:57PM +0100, Luis R. Rodriguez wrote:
> On Tue, Jan 10, 2017 at 07:57:10PM +0100, Luis R. Rodriguez wrote:
> > On Fri, Dec 16, 2016 at 09:05:00AM +0100, Luis R. Rodriguez wrote:
> > > On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > > > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > > > > + return 0;
> > > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > > + return -ENOMEM;
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void kmod_umh_threads_put(void)
> > > > > > > +{
> > > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > > +}
> > > > > >
> > > > > > Can you use a kref here instead? We're trying to kill raw use of
> > > > > > atomic_t for reference counting...
> > > > >
> > > > > That's a much broader functional change than I was looking for, but I am up for
> > > > > it. Can you describe the benefit of using kref you expect or why this is an
> > > > > ongoing crusade? Since its a larger functional change how about doing this
> > > > > change later, and we can test impact with the tress test driver. In theory if
> > > > > there are benefits can't we add a test case to prove the gains?
> > > >
> > > > Kees probably refers to the kref improvements that Peter Zijlstra
> > > > is working on, see
> > > > https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
> > > >
> > > > The advantage is that the new refcount API handles over and
> > > > underflow.
> > > >
> > > > Another advantage is that it increments/decrements the value
> > > > only when it is safe. It uses cmpxchg to make sure that
> > > > the checks are valid.
> > >
> > > Great thanks, will look into that.
> >
> > OK I've done the conversion now, the only thing is linux-next as of today lacks
> > KREF_INIT() so I've open coded it for now. Once Peter's changes get merged the
> > only thing we'dneed is to change the open code line to KREF_INIT().
> >
> > I'll annotate this as Suggested-by Kees and Petr, I did this as a separate atomic
> > step after this to make it easier for review.
>
> Spoke too soon, kref_read() is not upstream yet either, so I can hold conversion
> over until Peter's work is merged. Peter please Cc me on those patches if possible
> :D
All the needed kref stuff is upstream now, however, kref is overkill for
kmod_concurrent given this is just a counter, it is not used to release
any object, and kref_put() requires such mechanism. The lightweight
refcount_t is much more appropriate here so will use that and respin
this series, finally.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec
2017-05-16 18:02 ` Luis R. Rodriguez
@ 2017-05-18 2:37 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-05-18 2:37 UTC (permalink / raw)
To: Petr Mladek, Kees Cook, Peter Zijlstra
Cc: Luis R. Rodriguez, shuah, Jessica Yu, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, hare, rwright,
Jeff Mahoney, DSterba, fdmanana, neilb, Guenter Roeck, rgoldwyn,
subashab, Heinrich Schuchardt, Aaron Tomlin, mbenes,
Paul E. McKenney, Dan Williams, Josh Poimboeuf, David S. Miller,
Ingo Molnar, Andrew Morton, Linus Torvalds, linux-kselftest,
linux-doc, LKML
On Tue, May 16, 2017 at 08:02:17PM +0200, Luis R. Rodriguez wrote:
> On Wed, Jan 11, 2017 at 09:08:57PM +0100, Luis R. Rodriguez wrote:
> > On Tue, Jan 10, 2017 at 07:57:10PM +0100, Luis R. Rodriguez wrote:
> > > On Fri, Dec 16, 2016 at 09:05:00AM +0100, Luis R. Rodriguez wrote:
> > > > On Thu, Dec 15, 2016 at 01:46:25PM +0100, Petr Mladek wrote:
> > > > > On Thu 2016-12-08 22:08:59, Luis R. Rodriguez wrote:
> > > > > > On Thu, Dec 08, 2016 at 12:29:42PM -0800, Kees Cook wrote:
> > > > > > > On Thu, Dec 8, 2016 at 11:48 AM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > > > > > + if (atomic_read(&kmod_concurrent) < max_modprobes)
> > > > > > > > + return 0;
> > > > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > > > + return -ENOMEM;
> > > > > > > > +}
> > > > > > > > +
> > > > > > > > +static void kmod_umh_threads_put(void)
> > > > > > > > +{
> > > > > > > > + atomic_dec(&kmod_concurrent);
> > > > > > > > +}
> > > > > > >
> > > > > > > Can you use a kref here instead? We're trying to kill raw use of
> > > > > > > atomic_t for reference counting...
> > > > > >
> > > > > > That's a much broader functional change than I was looking for, but I am up for
> > > > > > it. Can you describe the benefit of using kref you expect or why this is an
> > > > > > ongoing crusade? Since its a larger functional change how about doing this
> > > > > > change later, and we can test impact with the tress test driver. In theory if
> > > > > > there are benefits can't we add a test case to prove the gains?
> > > > >
> > > > > Kees probably refers to the kref improvements that Peter Zijlstra
> > > > > is working on, see
> > > > > https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
> > > > >
> > > > > The advantage is that the new refcount API handles over and
> > > > > underflow.
> > > > >
> > > > > Another advantage is that it increments/decrements the value
> > > > > only when it is safe. It uses cmpxchg to make sure that
> > > > > the checks are valid.
> > > >
> > > > Great thanks, will look into that.
> > >
> > > OK I've done the conversion now, the only thing is linux-next as of today lacks
> > > KREF_INIT() so I've open coded it for now. Once Peter's changes get merged the
> > > only thing we'dneed is to change the open code line to KREF_INIT().
> > >
> > > I'll annotate this as Suggested-by Kees and Petr, I did this as a separate atomic
> > > step after this to make it easier for review.
> >
> > Spoke too soon, kref_read() is not upstream yet either, so I can hold conversion
> > over until Peter's work is merged. Peter please Cc me on those patches if possible
> > :D
>
> All the needed kref stuff is upstream now, however, kref is overkill for
> kmod_concurrent given this is just a counter, it is not used to release
> any object, and kref_put() requires such mechanism. The lightweight
> refcount_t is much more appropriate here so will use that and respin
> this series, finally.
And... even the refcount_t is overkill here given even with preemption stuff on
inc we still run into the warnings implemented by the recount stuff right away.
The only way to properly fix this is with a proper lock and I don't think this is
worth it at this point.
This would be an issue if the accounting here was for an object but since its
not and its just a loose estimate for a subjective "reasonable threshold" this
is all just overkill.
Lesson: (unless I hear otherwise)
As such I see no real strong motivation for a change here now. Counters, used
without any object references or any real critical stuff is left best with the
old atomic counters.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-08 19:48 ` [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec Luis R. Rodriguez
2016-12-08 20:29 ` Kees Cook
@ 2016-12-22 5:07 ` Jessica Yu
2017-01-10 20:28 ` Luis R. Rodriguez
1 sibling, 1 reply; 65+ messages in thread
From: Jessica Yu @ 2016-12-22 5:07 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
+++ Luis R. Rodriguez [08/12/16 11:48 -0800]:
>kmod_concurrent is used as an atomic counter for enabling
>the allowed limit of modprobe calls, provide wrappers for it
>to enable this to be expanded on more easily. This will be done
>later.
>
>Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
>---
> kernel/kmod.c | 27 +++++++++++++++++++++------
> 1 file changed, 21 insertions(+), 6 deletions(-)
>
>diff --git a/kernel/kmod.c b/kernel/kmod.c
>index cb6f7ca7b8a5..049d7eabda38 100644
>--- a/kernel/kmod.c
>+++ b/kernel/kmod.c
>@@ -44,6 +44,9 @@
> #include <trace/events/module.h>
>
> extern int max_threads;
>+
>+static atomic_t kmod_concurrent = ATOMIC_INIT(0);
>+
> unsigned int max_modprobes;
> module_param(max_modprobes, uint, 0644);
> MODULE_PARM_DESC(max_modprobes, "Max number of allowed concurrent modprobes");
>@@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> return -ENOMEM;
> }
>
>+static int kmod_umh_threads_get(void)
>+{
>+ atomic_inc(&kmod_concurrent);
>+ if (atomic_read(&kmod_concurrent) < max_modprobes)
Should this not be <=? I think this only allows up to max_modprobes-1 concurrent threads.
>+ return 0;
>+ atomic_dec(&kmod_concurrent);
>+ return -ENOMEM;
>+}
>+
>+static void kmod_umh_threads_put(void)
>+{
>+ atomic_dec(&kmod_concurrent);
>+}
>+
> /**
> * __request_module - try to load a kernel module
> * @wait: wait (or not) for the operation to complete
>@@ -129,7 +146,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
>- static atomic_t kmod_concurrent = ATOMIC_INIT(0);
> static int kmod_loop_msg;
>
> /*
>@@ -153,8 +169,8 @@ int __request_module(bool wait, const char *fmt, ...)
> if (ret)
> return ret;
>
>- atomic_inc(&kmod_concurrent);
>- if (atomic_read(&kmod_concurrent) > max_modprobes) {
>+ ret = kmod_umh_threads_get();
>+ if (ret) {
> /* We may be blaming an innocent here, but unlikely */
> if (kmod_loop_msg < 5) {
> printk(KERN_ERR
>@@ -162,15 +178,14 @@ int __request_module(bool wait, const char *fmt, ...)
> module_name);
> kmod_loop_msg++;
> }
>- atomic_dec(&kmod_concurrent);
>- return -ENOMEM;
>+ return ret;
> }
>
> trace_module_request(module_name, wait, _RET_IP_);
>
> ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
>
>- atomic_dec(&kmod_concurrent);
>+ kmod_umh_threads_put();
> return ret;
> }
> EXPORT_SYMBOL(__request_module);
>--
>2.10.1
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: kmod: provide wrappers for kmod_concurrent inc/dec
2016-12-22 5:07 ` Jessica Yu
@ 2017-01-10 20:28 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-10 20:28 UTC (permalink / raw)
To: Jessica Yu
Cc: Luis R. Rodriguez, shuah, rusty, ebiederm, dmitry.torokhov, acme,
corbet, martin.wilck, mmarek, pmladek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Wed, Dec 21, 2016 at 09:07:21PM -0800, Jessica Yu wrote:
> +++ Luis R. Rodriguez [08/12/16 11:48 -0800]:
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index cb6f7ca7b8a5..049d7eabda38 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -108,6 +111,20 @@ static int call_modprobe(char *module_name, int wait)
> > return -ENOMEM;
> > }
> >
> > +static int kmod_umh_threads_get(void)
> > +{
> > + atomic_inc(&kmod_concurrent);
> > + if (atomic_read(&kmod_concurrent) < max_modprobes)
>
> Should this not be <=? I think this only allows up to max_modprobes-1 concurrent threads.
True, fixed!
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 05/10] kmod: return -EBUSY if modprobe limit is reached
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (3 preceding siblings ...)
2016-12-08 19:48 ` [RFC 04/10] kmod: provide wrappers for kmod_concurrent inc/dec Luis R. Rodriguez
@ 2016-12-08 19:48 ` Luis R. Rodriguez
2016-12-08 19:48 ` [RFC 06/10] kmod: provide sanity check on kmod_concurrent access Luis R. Rodriguez
` (5 subsequent siblings)
10 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:48 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
Running out of our modprobe limit is not a memory limit but
a system specific established limitation set to avoid a possible
recursive issue with modprobe. This gives userspace a better idea
of what happened if we can't load a module, it could use this to
wait and try again.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/kmod.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 049d7eabda38..ab38539f7e91 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -117,7 +117,7 @@ static int kmod_umh_threads_get(void)
if (atomic_read(&kmod_concurrent) < max_modprobes)
return 0;
atomic_dec(&kmod_concurrent);
- return -ENOMEM;
+ return -EBUSY;
}
static void kmod_umh_threads_put(void)
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* [RFC 06/10] kmod: provide sanity check on kmod_concurrent access
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (4 preceding siblings ...)
2016-12-08 19:48 ` [RFC 05/10] kmod: return -EBUSY if modprobe limit is reached Luis R. Rodriguez
@ 2016-12-08 19:48 ` Luis R. Rodriguez
2016-12-14 16:08 ` Petr Mladek
2016-12-15 12:57 ` Petr Mladek
2016-12-08 19:49 ` [RFC 07/10] kmod: use simplified rate limit printk Luis R. Rodriguez
` (4 subsequent siblings)
10 siblings, 2 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:48 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
Only decrement *iff* we're possitive. Warn if we've hit
a situation where the counter is already 0 after we're done
with a modprobe call, this would tell us we have an unaccounted
counter access -- this in theory should not be possible as
only one routine controls the counter, however preemption is
one case that could trigger this situation. Avoid that situation
by disabling preemptiong while we access the counter.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/kmod.c | 20 ++++++++++++++++----
1 file changed, 16 insertions(+), 4 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index ab38539f7e91..09cf35a2075a 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -113,16 +113,28 @@ static int call_modprobe(char *module_name, int wait)
static int kmod_umh_threads_get(void)
{
+ int ret = 0;
+
+ preempt_disable();
atomic_inc(&kmod_concurrent);
if (atomic_read(&kmod_concurrent) < max_modprobes)
- return 0;
- atomic_dec(&kmod_concurrent);
- return -EBUSY;
+ goto out;
+
+ atomic_dec_if_positive(&kmod_concurrent);
+ ret = -EBUSY;
+out:
+ preempt_enable();
+ return 0;
}
static void kmod_umh_threads_put(void)
{
- atomic_dec(&kmod_concurrent);
+ int ret;
+
+ preempt_disable();
+ ret = atomic_dec_if_positive(&kmod_concurrent);
+ WARN_ON(ret < 0);
+ preempt_enable();
}
/**
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 06/10] kmod: provide sanity check on kmod_concurrent access
2016-12-08 19:48 ` [RFC 06/10] kmod: provide sanity check on kmod_concurrent access Luis R. Rodriguez
@ 2016-12-14 16:08 ` Petr Mladek
2016-12-14 17:12 ` Luis R. Rodriguez
2016-12-15 12:57 ` Petr Mladek
1 sibling, 1 reply; 65+ messages in thread
From: Petr Mladek @ 2016-12-14 16:08 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> Only decrement *iff* we're possitive. Warn if we've hit
> a situation where the counter is already 0 after we're done
> with a modprobe call, this would tell us we have an unaccounted
> counter access -- this in theory should not be possible as
> only one routine controls the counter, however preemption is
> one case that could trigger this situation. Avoid that situation
> by disabling preemptiong while we access the counter.
I am curious about it. How could enabled preemption cause that
the counter will get negative?
Unaccounted access would be possible if put() is called
without get() or if put() is called before get().
I do not see a way how the value might get negative when
the calls are paired and ordered.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 06/10] kmod: provide sanity check on kmod_concurrent access
2016-12-14 16:08 ` Petr Mladek
@ 2016-12-14 17:12 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-14 17:12 UTC (permalink / raw)
To: Petr Mladek
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Wed, Dec 14, 2016 at 05:08:58PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> > Only decrement *iff* we're possitive. Warn if we've hit
> > a situation where the counter is already 0 after we're done
> > with a modprobe call, this would tell us we have an unaccounted
> > counter access -- this in theory should not be possible as
> > only one routine controls the counter, however preemption is
> > one case that could trigger this situation. Avoid that situation
> > by disabling preemptiong while we access the counter.
>
> I am curious about it. How could enabled preemption cause that
> the counter will get negative?
As the commit log describes today in theory this is not possible
was we have only have one routine controlling the counter. If we
were to expand this then such possibilities become more real.
> Unaccounted access would be possible if put() is called
> without get() or if put() is called before get().
Exactly, so buggy users of the get/put calls in future calls.
I can just drop the preemption disable / enable for now as it
should not be an issue now.
> I do not see a way how the value might get negative when
> the calls are paired and ordered.
Right, this just matches parity with module_put(), its perhaps
*preemptively* too cautious though so I could just drop the
preemption enable/disable for now as that would slow down
things a bit.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 06/10] kmod: provide sanity check on kmod_concurrent access
2016-12-08 19:48 ` [RFC 06/10] kmod: provide sanity check on kmod_concurrent access Luis R. Rodriguez
2016-12-14 16:08 ` Petr Mladek
@ 2016-12-15 12:57 ` Petr Mladek
2017-01-10 20:00 ` Luis R. Rodriguez
1 sibling, 1 reply; 65+ messages in thread
From: Petr Mladek @ 2016-12-15 12:57 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> Only decrement *iff* we're possitive. Warn if we've hit
> a situation where the counter is already 0 after we're done
> with a modprobe call, this would tell us we have an unaccounted
> counter access -- this in theory should not be possible as
> only one routine controls the counter, however preemption is
> one case that could trigger this situation. Avoid that situation
> by disabling preemptiong while we access the counter.
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> kernel/kmod.c | 20 ++++++++++++++++----
> 1 file changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index ab38539f7e91..09cf35a2075a 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -113,16 +113,28 @@ static int call_modprobe(char *module_name, int wait)
>
> static int kmod_umh_threads_get(void)
> {
> + int ret = 0;
> +
> + preempt_disable();
> atomic_inc(&kmod_concurrent);
> if (atomic_read(&kmod_concurrent) < max_modprobes)
> - return 0;
> - atomic_dec(&kmod_concurrent);
> - return -EBUSY;
> + goto out;
I though more about it and the disabled preemtion might make
sense here. It makes sure that we are not rescheduled here
and that kmod_concurrent is not increased by mistake for too long.
Well, it still would make sense to increment the value
only when it is under the limit and set the incremented
value using cmpxchg to avoid races.
I mean to use similar trick that is used by refcount_inc(), see
https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
> + atomic_dec_if_positive(&kmod_concurrent);
> + ret = -EBUSY;
> +out:
> + preempt_enable();
> + return 0;
> }
>
> static void kmod_umh_threads_put(void)
> {
> - atomic_dec(&kmod_concurrent);
> + int ret;
> +
> + preempt_disable();
> + ret = atomic_dec_if_positive(&kmod_concurrent);
> + WARN_ON(ret < 0);
> + preempt_enable();
The disabled preemption does not make much sense here.
We do not need to tie the atomic operation and the WARN
together so tightly.
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 06/10] kmod: provide sanity check on kmod_concurrent access
2016-12-15 12:57 ` Petr Mladek
@ 2017-01-10 20:00 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-10 20:00 UTC (permalink / raw)
To: Petr Mladek, Peter Zijlstra
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Thu, Dec 15, 2016 at 01:57:48PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:48:50, Luis R. Rodriguez wrote:
> > Only decrement *iff* we're possitive. Warn if we've hit
> > a situation where the counter is already 0 after we're done
> > with a modprobe call, this would tell us we have an unaccounted
> > counter access -- this in theory should not be possible as
> > only one routine controls the counter, however preemption is
> > one case that could trigger this situation. Avoid that situation
> > by disabling preemptiong while we access the counter.
> >
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> > kernel/kmod.c | 20 ++++++++++++++++----
> > 1 file changed, 16 insertions(+), 4 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index ab38539f7e91..09cf35a2075a 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -113,16 +113,28 @@ static int call_modprobe(char *module_name, int wait)
> >
> > static int kmod_umh_threads_get(void)
> > {
> > + int ret = 0;
> > +
> > + preempt_disable();
> > atomic_inc(&kmod_concurrent);
> > if (atomic_read(&kmod_concurrent) < max_modprobes)
> > - return 0;
> > - atomic_dec(&kmod_concurrent);
> > - return -EBUSY;
> > + goto out;
>
> I though more about it and the disabled preemtion might make
> sense here. It makes sure that we are not rescheduled here
> and that kmod_concurrent is not increased by mistake for too long.
I think its good to add a comment here about this.
> Well, it still would make sense to increment the value
> only when it is under the limit and set the incremented
> value using cmpxchg to avoid races.
>
> I mean to use similar trick that is used by refcount_inc(), see
> https://lkml.kernel.org/r/20161114174446.832175072@infradead.org
Right, I see now. Since we are converting this to kref though we would
immediately get the advantages of kref_get() using the new refcount_inc() once
that goes in, so I think its best we just sit tight to get that benefit given
as Jessica acknowledged the existing code has has this issue for ages, waiting
a bit longer should not hurt. The preemption should help in the meantime as
well.
The note I've made then is:
/*
* Disabling preemption makes sure that we are not rescheduled here.
*
* Also preemption helps kmod_concurrent is not increased by mistake
* for too long given in theory two concurrent threads could race on
* kref_get() before we kref_read().
*
* XXX: once Peter's refcount_t gets merged kref's kref_get() will use
* the new refcount_inc() and then each inc will be atomic with respect
* to each thread, as such when Peter's refcount_t gets merged
* the above comment "Also preemption ..." can be removed.
*/
Come to think of it, once Peter's changes go in at first glance it may seem
preemption would be pointless then but but I think that just mitigates a few
of the refcount_inc() instances where (old != val), that is -- when two threads
got the same bump, so think it can be kept even after Peter's refcount_t work.
> > + atomic_dec_if_positive(&kmod_concurrent);
> > + ret = -EBUSY;
> > +out:
> > + preempt_enable();
> > + return 0;
> > }
> >
> > static void kmod_umh_threads_put(void)
> > {
> > - atomic_dec(&kmod_concurrent);
> > + int ret;
> > +
> > + preempt_disable();
> > + ret = atomic_dec_if_positive(&kmod_concurrent);
> > + WARN_ON(ret < 0);
> > + preempt_enable();
>
> The disabled preemption does not make much sense here.
> We do not need to tie the atomic operation and the WARN
> together so tightly.
Makes sense, will add a note.
kref also lacks such a mnemonic as atomic_dec_if_positive()
and since I've now converted this to kref I've dropped this.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 07/10] kmod: use simplified rate limit printk
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (5 preceding siblings ...)
2016-12-08 19:48 ` [RFC 06/10] kmod: provide sanity check on kmod_concurrent access Luis R. Rodriguez
@ 2016-12-08 19:49 ` Luis R. Rodriguez
2016-12-14 16:23 ` Petr Mladek
2016-12-08 19:49 ` [RFC 08/10] sysctl: add support for unsigned int properly Luis R. Rodriguez
` (3 subsequent siblings)
10 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:49 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
Just use the simplified rate limit printk when the max modprobe
limit is reached, while at it throw out a bone should the error
be triggered.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/kmod.c | 10 ++--------
1 file changed, 2 insertions(+), 8 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 09cf35a2075a..ef65f4c3578a 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
va_list args;
char module_name[MODULE_NAME_LEN];
int ret;
- static int kmod_loop_msg;
/*
* We don't allow synchronous module loading from async. Module
@@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
ret = kmod_umh_threads_get();
if (ret) {
- /* We may be blaming an innocent here, but unlikely */
- if (kmod_loop_msg < 5) {
- printk(KERN_ERR
- "request_module: runaway loop modprobe %s\n",
- module_name);
- kmod_loop_msg++;
- }
+ pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
+ max_modprobes, module_name);
return ret;
}
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 07/10] kmod: use simplified rate limit printk
2016-12-08 19:49 ` [RFC 07/10] kmod: use simplified rate limit printk Luis R. Rodriguez
@ 2016-12-14 16:23 ` Petr Mladek
2016-12-14 16:41 ` Joe Perches
2016-12-16 8:44 ` Luis R. Rodriguez
0 siblings, 2 replies; 65+ messages in thread
From: Petr Mladek @ 2016-12-14 16:23 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> Just use the simplified rate limit printk when the max modprobe
> limit is reached, while at it throw out a bone should the error
> be triggered.
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> kernel/kmod.c | 10 ++--------
> 1 file changed, 2 insertions(+), 8 deletions(-)
>
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 09cf35a2075a..ef65f4c3578a 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
> va_list args;
> char module_name[MODULE_NAME_LEN];
> int ret;
> - static int kmod_loop_msg;
>
> /*
> * We don't allow synchronous module loading from async. Module
> @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
>
> ret = kmod_umh_threads_get();
> if (ret) {
> - /* We may be blaming an innocent here, but unlikely */
> - if (kmod_loop_msg < 5) {
> - printk(KERN_ERR
> - "request_module: runaway loop modprobe %s\n",
> - module_name);
> - kmod_loop_msg++;
> - }
> + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> + max_modprobes, module_name);
I like this change. I would only be even more descriptive in which
limit is reached. Something like
pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
module_name, max_modprobes);
Either way, feel free to add:
Reviewed-by: Petr Mladek <pmladek@suse.com>
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 07/10] kmod: use simplified rate limit printk
2016-12-14 16:23 ` Petr Mladek
@ 2016-12-14 16:41 ` Joe Perches
2016-12-16 8:44 ` Luis R. Rodriguez
1 sibling, 0 replies; 65+ messages in thread
From: Joe Perches @ 2016-12-14 16:41 UTC (permalink / raw)
To: Petr Mladek, Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Wed, 2016-12-14 at 17:23 +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> > Just use the simplified rate limit printk when the max modprobe
> > limit is reached, while at it throw out a bone should the error
> > be triggered.
[]
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
[]
> > @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
> >
> > ret = kmod_umh_threads_get();
> > if (ret) {
> > - /* We may be blaming an innocent here, but unlikely */
> > - if (kmod_loop_msg < 5) {
> > - printk(KERN_ERR
> > - "request_module: runaway loop modprobe %s\n",
> > - module_name);
> > - kmod_loop_msg++;
> > - }
> > + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> > + max_modprobes, module_name);
>
> I like this change. I would only be even more descriptive in which
> limit is reached. Something like
>
> pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
> module_name, max_modprobes);
>
> Either way, feel free to add:
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
Seems sensible.
I suggest using "%s: ", __func__ instead of embedding
the function name.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 07/10] kmod: use simplified rate limit printk
2016-12-14 16:23 ` Petr Mladek
2016-12-14 16:41 ` Joe Perches
@ 2016-12-16 8:44 ` Luis R. Rodriguez
1 sibling, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 8:44 UTC (permalink / raw)
To: Petr Mladek
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Wed, Dec 14, 2016 at 05:23:50PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:01, Luis R. Rodriguez wrote:
> > Just use the simplified rate limit printk when the max modprobe
> > limit is reached, while at it throw out a bone should the error
> > be triggered.
> >
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> > kernel/kmod.c | 10 ++--------
> > 1 file changed, 2 insertions(+), 8 deletions(-)
> >
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 09cf35a2075a..ef65f4c3578a 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -158,7 +158,6 @@ int __request_module(bool wait, const char *fmt, ...)
> > va_list args;
> > char module_name[MODULE_NAME_LEN];
> > int ret;
> > - static int kmod_loop_msg;
> >
> > /*
> > * We don't allow synchronous module loading from async. Module
> > @@ -183,13 +182,8 @@ int __request_module(bool wait, const char *fmt, ...)
> >
> > ret = kmod_umh_threads_get();
> > if (ret) {
> > - /* We may be blaming an innocent here, but unlikely */
> > - if (kmod_loop_msg < 5) {
> > - printk(KERN_ERR
> > - "request_module: runaway loop modprobe %s\n",
> > - module_name);
> > - kmod_loop_msg++;
> > - }
> > + pr_err_ratelimited("request_module: modprobe limit (%u) reached with module %s\n",
> > + max_modprobes, module_name);
>
> I like this change. I would only be even more descriptive in which
> limit is reached. Something like
>
> pr_err_ratelimited("request_module: module \"%s\" reached limit (%u) of concurrent modprobe calls\n",
> module_name, max_modprobes);
Sure, changed.
> Either way, feel free to add:
>
> Reviewed-by: Petr Mladek <pmladek@suse.com>
Thanks!
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* [RFC 08/10] sysctl: add support for unsigned int properly
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (6 preceding siblings ...)
2016-12-08 19:49 ` [RFC 07/10] kmod: use simplified rate limit printk Luis R. Rodriguez
@ 2016-12-08 19:49 ` Luis R. Rodriguez
2016-12-08 19:49 ` [RFC 09/10] kmod: add helpers for getting kmod count and limit Luis R. Rodriguez
` (2 subsequent siblings)
10 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:49 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
Commit e7d316a02f6838 ("sysctl: handle error writing UINT_MAX to u32 fields")
added proc_douintvec() to start help adding support for unsigned int,
this however was only half the work needed, all these issues are present
with the current implementation:
o Printing the values shows a negative value, this happens
since do_proc_dointvec() and this uses proc_put_long()
o We can easily wrap around the int values: UINT_MAX is
4294967295, if we echo in 4294967295 + 1 we end up with 0,
using 4294967295 + 2 we end up with 1.
o We echo negative values in and they are accepted
Fix all these issues by adding our own do_proc_douintvec(). Likewise to
keep parity provide the other typically useful proc_douintvec_minmax().
Adding proc_douintvec_minmax_sysadmin() is easy but we wait for an actual
user for that.
Cc: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Cc: Heinrich Schuchardt <xypron.glpk@gmx.de>
Cc: Kees Cook <keescook@chromium.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Fixes: e7d316a02f68 ("sysctl: handle error writing UINT_MAX to u32 fields")
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
include/linux/sysctl.h | 3 +
kernel/sysctl.c | 184 +++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 181 insertions(+), 6 deletions(-)
diff --git a/include/linux/sysctl.h b/include/linux/sysctl.h
index adf4e51cf597..a35d40ecc211 100644
--- a/include/linux/sysctl.h
+++ b/include/linux/sysctl.h
@@ -47,6 +47,9 @@ extern int proc_douintvec(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_minmax(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
+extern int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp,
+ loff_t *ppos);
extern int proc_dointvec_jiffies(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
extern int proc_dointvec_userhz_jiffies(struct ctl_table *, int,
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 1a292ebcbbb6..06711e648fa3 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -2125,12 +2125,12 @@ static int do_proc_dointvec_conv(bool *negp, unsigned long *lvalp,
return 0;
}
-static int do_proc_douintvec_conv(bool *negp, unsigned long *lvalp,
- int *valp,
- int write, void *data)
+static int do_proc_douintvec_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
{
if (write) {
- if (*negp)
+ if (*lvalp > (unsigned long) UINT_MAX)
return -EINVAL;
*valp = *lvalp;
} else {
@@ -2243,6 +2243,115 @@ static int do_proc_dointvec(struct ctl_table *table, int write,
buffer, lenp, ppos, conv, data);
}
+static int __do_proc_douintvec(void *tbl_data, struct ctl_table *table,
+ int write, void __user *buffer,
+ size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data)
+{
+ unsigned int *i, vleft;
+ bool first = true;
+ int err = 0;
+ size_t left;
+ char *kbuf = NULL, *p;
+
+ if (!tbl_data || !table->maxlen || !*lenp || (*ppos && !write)) {
+ *lenp = 0;
+ return 0;
+ }
+
+ i = (unsigned int *) tbl_data;
+ vleft = table->maxlen / sizeof(*i);
+ left = *lenp;
+
+ if (!conv)
+ conv = do_proc_douintvec_conv;
+
+ if (write) {
+ if (*ppos) {
+ switch (sysctl_writes_strict) {
+ case SYSCTL_WRITES_STRICT:
+ goto out;
+ case SYSCTL_WRITES_WARN:
+ warn_sysctl_write(table);
+ break;
+ default:
+ break;
+ }
+ }
+
+ if (left > PAGE_SIZE - 1)
+ left = PAGE_SIZE - 1;
+ p = kbuf = memdup_user_nul(buffer, left);
+ if (IS_ERR(kbuf))
+ return PTR_ERR(kbuf);
+ }
+
+ for (; left && vleft--; i++, first=false) {
+ unsigned long lval;
+ bool neg;
+
+ if (write) {
+ left -= proc_skip_spaces(&p);
+
+ if (!left)
+ break;
+ err = proc_get_long(&p, &left, &lval, &neg,
+ proc_wspace_sep,
+ sizeof(proc_wspace_sep), NULL);
+ if (neg) {
+ err = -EINVAL;
+ break;
+ }
+ if (err)
+ break;
+ if (conv(&lval, i, 1, data)) {
+ err = -EINVAL;
+ break;
+ }
+ } else {
+ if (conv(&lval, i, 0, data)) {
+ err = -EINVAL;
+ break;
+ }
+ if (!first)
+ err = proc_put_char(&buffer, &left, '\t');
+ if (err)
+ break;
+ err = proc_put_long(&buffer, &left, lval, false);
+ if (err)
+ break;
+ }
+ }
+
+ if (!write && !first && left && !err)
+ err = proc_put_char(&buffer, &left, '\n');
+ if (write && !err && left)
+ left -= proc_skip_spaces(&p);
+ if (write) {
+ kfree(kbuf);
+ if (first)
+ return err ? : -EINVAL;
+ }
+ *lenp -= left;
+out:
+ *ppos += *lenp;
+ return err;
+}
+
+static int do_proc_douintvec(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos,
+ int (*conv)(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data),
+ void *data)
+{
+ return __do_proc_douintvec(table->data, table, write,
+ buffer, lenp, ppos, conv, data);
+}
+
/**
* proc_dointvec - read a vector of integers
* @table: the sysctl table
@@ -2278,8 +2387,8 @@ int proc_dointvec(struct ctl_table *table, int write,
int proc_douintvec(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
- return do_proc_dointvec(table, write, buffer, lenp, ppos,
- do_proc_douintvec_conv, NULL);
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_douintvec_conv, NULL);
}
/*
@@ -2384,6 +2493,62 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
do_proc_dointvec_minmax_conv, ¶m);
}
+struct do_proc_douintvec_minmax_conv_param {
+ unsigned int *min;
+ unsigned int *max;
+};
+
+static int do_proc_douintvec_minmax_conv(unsigned long *lvalp,
+ unsigned int *valp,
+ int write, void *data)
+{
+ struct do_proc_douintvec_minmax_conv_param *param = data;
+ if (write) {
+ unsigned int val = *lvalp;
+ if ((param->min && *param->min > val) ||
+ (param->max && *param->max < val))
+ return -ERANGE;
+
+ if (*lvalp > (unsigned long) UINT_MAX)
+ return -EINVAL;
+ *valp = val;
+ } else {
+ unsigned int val = *valp;
+ *lvalp = (unsigned long) val;
+ }
+ return 0;
+}
+
+/**
+ * proc_douintvec_minmax - read a vector of unsigned ints with min/max values
+ * @table: the sysctl table
+ * @write: %TRUE if this is a write to the sysctl file
+ * @buffer: the user buffer
+ * @lenp: the size of the user buffer
+ * @ppos: file position
+ *
+ * Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer
+ * values from/to the user buffer, treated as an ASCII string. Negative
+ * strings are not allowed.
+ *
+ * This routine will ensure the values are within the range specified by
+ * table->extra1 (min) and table->extra2 (max). There is a final sanity
+ * check for UINT_MAX to avoid having to support wrap around uses from
+ * userspace.
+ *
+ * Returns 0 on success.
+ */
+int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct do_proc_douintvec_minmax_conv_param param = {
+ .min = (unsigned int *) table->extra1,
+ .max = (unsigned int *) table->extra2,
+ };
+ return do_proc_douintvec(table, write, buffer, lenp, ppos,
+ do_proc_douintvec_minmax_conv, ¶m);
+}
+
static void validate_coredump_safety(void)
{
#ifdef CONFIG_COREDUMP
@@ -2891,6 +3056,12 @@ int proc_dointvec_minmax(struct ctl_table *table, int write,
return -ENOSYS;
}
+int proc_douintvec_minmax(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ return -ENOSYS;
+}
+
int proc_dointvec_jiffies(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos)
{
@@ -2933,6 +3104,7 @@ EXPORT_SYMBOL(proc_dointvec);
EXPORT_SYMBOL(proc_douintvec);
EXPORT_SYMBOL(proc_dointvec_jiffies);
EXPORT_SYMBOL(proc_dointvec_minmax);
+EXPORT_SYMBOL_GPL(proc_douintvec_minmax);
EXPORT_SYMBOL(proc_dointvec_userhz_jiffies);
EXPORT_SYMBOL(proc_dointvec_ms_jiffies);
EXPORT_SYMBOL(proc_dostring);
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* [RFC 09/10] kmod: add helpers for getting kmod count and limit
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (7 preceding siblings ...)
2016-12-08 19:49 ` [RFC 08/10] sysctl: add support for unsigned int properly Luis R. Rodriguez
@ 2016-12-08 19:49 ` Luis R. Rodriguez
2016-12-15 16:56 ` Petr Mladek
2016-12-08 19:49 ` [RFC 10/10] kmod: add a sanity check on module loading Luis R. Rodriguez
2017-01-11 19:10 ` [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
10 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:49 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
This adds helpers for getting access to the kmod count and limit from
userspace. While at it, this also lets userspace fine tune the kmod
limit after boot, it uses the shiny new proc_douintvec_minmax().
These knobs should help userspace more gracefully and deterministically
handle module loading.
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
include/linux/kmod.h | 8 +++++
kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
kernel/sysctl.c | 14 +++++++++
3 files changed, 103 insertions(+), 2 deletions(-)
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index 15783cd7f056..94c7379cff94 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -39,13 +39,21 @@ int __request_module(bool wait, const char *name, ...);
#define try_then_request_module(x, mod...) \
((x) ?: (__request_module(true, mod), (x)))
void init_kmod_umh(void);
+unsigned int get_kmod_umh_limit(void);
+int sysctl_kmod_count(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
+int sysctl_kmod_limit(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos);
#else
static inline int request_module(const char *name, ...) { return -ENOSYS; }
static inline int request_module_nowait(const char *name, ...) { return -ENOSYS; }
static inline void init_kmod_umh(void) { }
+static unsigned int get_kmod_umh_limit(void) { return 0; }
#define try_then_request_module(x, mod...) (x)
#endif
+#define get_kmod_umh_limit get_kmod_umh_limit
+
struct cred;
struct file;
diff --git a/kernel/kmod.c b/kernel/kmod.c
index ef65f4c3578a..a0f449f77ed7 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -138,6 +138,27 @@ static void kmod_umh_threads_put(void)
}
/**
+ * get_kmod_umh_limit - get concurrent modprobe thread limit
+ *
+ * Returns the number of allowed concurrent modprobe calls.
+ */
+unsigned int get_kmod_umh_limit(void)
+{
+ return max_modprobes;
+}
+EXPORT_SYMBOL_GPL(get_kmod_umh_limit);
+
+/**
+ * get_kmod_umh_count - get number of concurrent modprobe calls running
+ *
+ * Returns the number of concurrent modprobe calls currently running.
+ */
+int get_kmod_umh_count(void)
+{
+ return atomic_read(&kmod_concurrent);
+}
+
+/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
* @fmt: printf style format string for the name of the module
@@ -196,6 +217,11 @@ int __request_module(bool wait, const char *fmt, ...)
}
EXPORT_SYMBOL(__request_module);
+static void __set_max_modprobes(unsigned int suggested)
+{
+ max_modprobes = min((unsigned int) max_threads/2, suggested);
+}
+
/*
* If modprobe needs a service that is in a module, we get a recursive
* loop. Limit the number of running kmod threads to max_threads/2 or
@@ -212,12 +238,65 @@ EXPORT_SYMBOL(__request_module);
* 4096 concurrent modprobe instances:
*
* kmod.max_modprobes=4096
+ *
+ * You can also set the limit via sysctl:
+ *
+ * echo 4096 > /proc/sys/kernel/kmod-limit
+ *
+ * You can also set the query the current thread count:
+ *
+ * cat /proc/sys/kernel/kmod-count
+ *
+ * These knobs should enable userspace to more gracefully and
+ * deterministically handle module loading.
*/
void __init init_kmod_umh(void)
{
if (!max_modprobes)
- max_modprobes = min(max_threads/2,
- 2 << CONFIG_MAX_KMOD_CONCURRENT);
+ __set_max_modprobes(2 << CONFIG_MAX_KMOD_CONCURRENT);
+}
+
+int sysctl_kmod_count(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret = 0;
+ int count = get_kmod_umh_count();
+
+ t = *table;
+ t.data = &count;
+
+ if (write)
+ return -EPERM;
+
+ ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
+
+ return ret;
+}
+
+int sysctl_kmod_limit(struct ctl_table *table, int write,
+ void __user *buffer, size_t *lenp, loff_t *ppos)
+{
+ struct ctl_table t;
+ int ret;
+ unsigned int local_max_modprobes = max_modprobes;
+ unsigned int min = 0;
+ unsigned int max = max_threads/2;
+
+ t = *table;
+ t.data = &local_max_modprobes;
+ t.extra1 = &min;
+ t.extra2 = &max;
+
+ ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
+ if (ret == -ERANGE)
+ pr_err("modprobe thread valid range: %u - %u\n", min, max);
+ if (ret || !write)
+ return ret;
+
+ __set_max_modprobes((unsigned int) local_max_modprobes);
+
+ return 0;
}
#endif /* CONFIG_MODULES */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index 06711e648fa3..0ba56001e49b 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -660,6 +660,20 @@ static struct ctl_table kern_table[] = {
.extra1 = &one,
.extra2 = &one,
},
+ {
+ .procname = "kmod-count",
+ .data = NULL, /* filled in by handler */
+ .maxlen = sizeof(int),
+ .mode = 0444,
+ .proc_handler = sysctl_kmod_count,
+ },
+ {
+ .procname = "kmod-limit",
+ .data = NULL, /* filled in by handler */
+ .maxlen = sizeof(unsigned int),
+ .mode = 0644,
+ .proc_handler = sysctl_kmod_limit,
+ },
#endif
#ifdef CONFIG_UEVENT_HELPER
{
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 09/10] kmod: add helpers for getting kmod count and limit
2016-12-08 19:49 ` [RFC 09/10] kmod: add helpers for getting kmod count and limit Luis R. Rodriguez
@ 2016-12-15 16:56 ` Petr Mladek
2016-12-16 7:57 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Petr Mladek @ 2016-12-15 16:56 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, hare, rwright, jeffm, DSterba, fdmanana,
neilb, linux, rgoldwyn, subashab, xypron.glpk, keescook, atomlin,
mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo, akpm,
torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> This adds helpers for getting access to the kmod count and limit from
> userspace. While at it, this also lets userspace fine tune the kmod
> limit after boot, it uses the shiny new proc_douintvec_minmax().
>
> These knobs should help userspace more gracefully and deterministically
> handle module loading.
>
> Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> ---
> include/linux/kmod.h | 8 +++++
> kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> kernel/sysctl.c | 14 +++++++++
> 3 files changed, 103 insertions(+), 2 deletions(-)
I am not sure if it is worth it. As you say in the 3rd patch,
there was rather low limit for 16 years and nobody probably had
problems with it.
Anyway, it seems that such know should also get documented in
Documentation/sysctl/kernel.txt
Best Regards,
Petr
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 09/10] kmod: add helpers for getting kmod count and limit
2016-12-15 16:56 ` Petr Mladek
@ 2016-12-16 7:57 ` Luis R. Rodriguez
2017-01-11 18:27 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 7:57 UTC (permalink / raw)
To: Petr Mladek
Cc: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet, martin.wilck, mmarek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Thu, Dec 15, 2016 at 05:56:19PM +0100, Petr Mladek wrote:
> On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> > This adds helpers for getting access to the kmod count and limit from
> > userspace. While at it, this also lets userspace fine tune the kmod
> > limit after boot, it uses the shiny new proc_douintvec_minmax().
> >
> > These knobs should help userspace more gracefully and deterministically
> > handle module loading.
> >
> > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > ---
> > include/linux/kmod.h | 8 +++++
> > kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> > kernel/sysctl.c | 14 +++++++++
> > 3 files changed, 103 insertions(+), 2 deletions(-)
>
> I am not sure if it is worth it. As you say in the 3rd patch,
> there was rather low limit for 16 years and nobody probably had
> problems with it.
Note, *probably* - ie, this could have gone unreported for a while, and
to be frank how can we know for sure a pesky module just did not load due
to this? In the case of get_fs_type() issue this can be fatal for a partition
mount, not a good example to wait to look forward to before we take this
serious.
I added the sysctl value mostly for read purposes, the count is probably
useless for any accounting to be done in userspace due to delays this
reading and making this value useful in userspace can have, I can nuke
that. The kmod-limit however seems very useful so that userspace knows
how to properly thread *safely* modprobe calls more deterministically.
Adding write support to let one bump the limit was just an easy convenience
possible given the read support was being added, but its use should
really only be useful for testing purposes post bootup given that the
real value in the limit will be important at boot time prior to the sysctl
parsing. The real know tweak which should be used in case of issues is
the module parameter added earlier.
So I could drop the kmod-count, and just make the kmod-limit read-only.
Thoughts?
> Anyway, it seems that such know should also get documented in
> Documentation/sysctl/kernel.txt
Will do if we keep them, thanks.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 09/10] kmod: add helpers for getting kmod count and limit
2016-12-16 7:57 ` Luis R. Rodriguez
@ 2017-01-11 18:27 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-11 18:27 UTC (permalink / raw)
To: Luis R. Rodriguez, Tom Gundersen
Cc: Petr Mladek, shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme,
corbet, martin.wilck, mmarek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Fri, Dec 16, 2016 at 08:57:26AM +0100, Luis R. Rodriguez wrote:
> On Thu, Dec 15, 2016 at 05:56:19PM +0100, Petr Mladek wrote:
> > On Thu 2016-12-08 11:49:20, Luis R. Rodriguez wrote:
> > > This adds helpers for getting access to the kmod count and limit from
> > > userspace. While at it, this also lets userspace fine tune the kmod
> > > limit after boot, it uses the shiny new proc_douintvec_minmax().
> > >
> > > These knobs should help userspace more gracefully and deterministically
> > > handle module loading.
> > >
> > > Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
> > > ---
> > > include/linux/kmod.h | 8 +++++
> > > kernel/kmod.c | 83 ++++++++++++++++++++++++++++++++++++++++++++++++++--
> > > kernel/sysctl.c | 14 +++++++++
> > > 3 files changed, 103 insertions(+), 2 deletions(-)
> >
> > I am not sure if it is worth it. As you say in the 3rd patch,
> > there was rather low limit for 16 years and nobody probably had
> > problems with it.
>
> Note, *probably* - ie, this could have gone unreported for a while, and
> to be frank how can we know for sure a pesky module just did not load due
> to this? In the case of get_fs_type() issue this can be fatal for a partition
> mount, not a good example to wait to look forward to before we take this
> serious.
>
> I added the sysctl value mostly for read purposes, the count is probably
> useless for any accounting to be done in userspace due to delays this
> reading and making this value useful in userspace can have, I can nuke
> that. The kmod-limit however seems very useful so that userspace knows
> how to properly thread *safely* modprobe calls more deterministically.
>
> Adding write support to let one bump the limit was just an easy convenience
> possible given the read support was being added, but its use should
> really only be useful for testing purposes post bootup given that the
> real value in the limit will be important at boot time prior to the sysctl
> parsing. The real know tweak which should be used in case of issues is
> the module parameter added earlier.
>
> So I could drop the kmod-count, and just make the kmod-limit read-only.
> Thoughts?
OK I've done this and also since there was confusion about dependencies
possibly affecting kmod_concurrent I've added a note about this on the
Documentation/sysctl/kernel.txt documentation. This documentation also
clarifies the intent behind exposing this interface, which is to help
enable userspace make using modprobe more deterministic (its why I've
Cc'd Tom). The following changes have been made, and I'll fold this into this
patch and rename the title.
> > Anyway, it seems that such know should also get documented in
> > Documentation/sysctl/kernel.txt
>
> Will do if we keep them, thanks.
Below are the changes I've made:
diff --git a/Documentation/sysctl/kernel.txt b/Documentation/sysctl/kernel.txt
index a32b4b748644..c82aeaf60ca7 100644
--- a/Documentation/sysctl/kernel.txt
+++ b/Documentation/sysctl/kernel.txt
@@ -370,6 +370,26 @@ with the "modules_disabled" sysctl.
==============================================================
+kmod-limit:
+
+Get the max amount of concurrent requests (kmod_concurrent) the kernel can
+make out to userspace to call 'modprobe'. This limit is known internally to the
+kernel as max_modprobes. This interface is designed to enable userspace to
+query the kernel for the max_modprobes limit so userspace can more
+deterministically handle module loading by only enabling max_modprobes
+'modprobe' calls at a time.
+
+Dependencies are resolved in userspace through depmod, so one modprobe
+call only bumps the number of concurrent threads (kmod_concurrent) by one.
+Dependencies for a module then are loaded directly in userspace using
+init_module() / finit_module() skipping bumping kmod_concurrent or being
+affected by max_modprobes.
+
+The max_modprobes value is set at build time with CONFIG_MAX_KMOD_CONCURRENT.
+You can override at initialization with the module parameter max_modprobes.
+
+==============================================================
+
kptr_restrict:
This toggle indicates whether restrictions are placed on
diff --git a/include/linux/kmod.h b/include/linux/kmod.h
index c30d797fe4d3..1ee833e5896d 100644
--- a/include/linux/kmod.h
+++ b/include/linux/kmod.h
@@ -40,8 +40,6 @@ int __request_module(bool wait, const char *name, ...);
((x) ?: (__request_module(true, mod), (x)))
void init_kmod_umh(void);
unsigned int get_kmod_umh_limit(void);
-int sysctl_kmod_count(struct ctl_table *table, int write,
- void __user *buffer, size_t *lenp, loff_t *ppos);
int sysctl_kmod_limit(struct ctl_table *table, int write,
void __user *buffer, size_t *lenp, loff_t *ppos);
#else
diff --git a/kernel/kmod.c b/kernel/kmod.c
index f2fd9f088278..0303bce326b8 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -158,16 +158,6 @@ unsigned int get_kmod_umh_limit(void)
EXPORT_SYMBOL_GPL(get_kmod_umh_limit);
/**
- * get_kmod_umh_count - get number of concurrent modprobe calls running
- *
- * Returns the number of concurrent modprobe calls currently running.
- */
-int get_kmod_umh_count(void)
-{
- return atomic_read(&kmod_concurrent);
-}
-
-/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
* @fmt: printf style format string for the name of the module
@@ -226,11 +216,6 @@ int __request_module(bool wait, const char *fmt, ...)
}
EXPORT_SYMBOL(__request_module);
-static void __set_max_modprobes(unsigned int suggested)
-{
- max_modprobes = min((unsigned int) max_threads/2, suggested);
-}
-
/*
* If modprobe needs a service that is in a module, we get a recursive
* loop. Limit the number of running kmod threads to max_threads/2 or
@@ -247,40 +232,12 @@ static void __set_max_modprobes(unsigned int suggested)
* 4096 concurrent modprobe instances:
*
* kmod.max_modprobes=4096
- *
- * You can also set the limit via sysctl:
- *
- * echo 4096 > /proc/sys/kernel/kmod-limit
- *
- * You can also set the query the current thread count:
- *
- * cat /proc/sys/kernel/kmod-count
- *
- * These knobs should enable userspace to more gracefully and
- * deterministically handle module loading.
*/
void __init init_kmod_umh(void)
{
if (!max_modprobes)
- __set_max_modprobes(1 << CONFIG_MAX_KMOD_CONCURRENT);
-}
-
-int sysctl_kmod_count(struct ctl_table *table, int write,
- void __user *buffer, size_t *lenp, loff_t *ppos)
-{
- struct ctl_table t;
- int ret = 0;
- int count = get_kmod_umh_count();
-
- t = *table;
- t.data = &count;
-
- if (write)
- return -EPERM;
-
- ret = proc_dointvec_minmax(&t, write, buffer, lenp, ppos);
-
- return ret;
+ max_modprobes = min(max_threads/2,
+ 1 << CONFIG_MAX_KMOD_CONCURRENT);
}
int sysctl_kmod_limit(struct ctl_table *table, int write,
@@ -297,15 +254,12 @@ int sysctl_kmod_limit(struct ctl_table *table, int write,
t.extra1 = &min;
t.extra2 = &max;
- ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
- if (ret == -ERANGE)
- pr_err("modprobe thread valid range: %u - %u\n", min, max);
- if (ret || !write)
- return ret;
+ if (write)
+ return -EPERM;
- __set_max_modprobes((unsigned int) local_max_modprobes);
+ ret = proc_douintvec_minmax(&t, write, buffer, lenp, ppos);
- return 0;
+ return ret;
}
#endif /* CONFIG_MODULES */
diff --git a/kernel/sysctl.c b/kernel/sysctl.c
index d59cca78417a..52cf84131f74 100644
--- a/kernel/sysctl.c
+++ b/kernel/sysctl.c
@@ -661,17 +661,10 @@ static struct ctl_table kern_table[] = {
.extra2 = &one,
},
{
- .procname = "kmod-count",
- .data = NULL, /* filled in by handler */
- .maxlen = sizeof(int),
- .mode = 0444,
- .proc_handler = sysctl_kmod_count,
- },
- {
.procname = "kmod-limit",
.data = NULL, /* filled in by handler */
.maxlen = sizeof(unsigned int),
- .mode = 0644,
+ .mode = 0444,
.proc_handler = sysctl_kmod_limit,
},
#endif
^ permalink raw reply related [flat|nested] 65+ messages in thread
* [RFC 10/10] kmod: add a sanity check on module loading
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (8 preceding siblings ...)
2016-12-08 19:49 ` [RFC 09/10] kmod: add helpers for getting kmod count and limit Luis R. Rodriguez
@ 2016-12-08 19:49 ` Luis R. Rodriguez
2016-12-09 20:03 ` Martin Wilck
` (2 more replies)
2017-01-11 19:10 ` [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
10 siblings, 3 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-08 19:49 UTC (permalink / raw)
To: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
kmod has an optimization in place whereby if a some kernel code
uses request_module() on a module already loaded we never bother
userspace as the module already is loaded. This is not true for
get_fs_type() though as it uses aliases.
Additionally kmod <= v19 was broken -- it returns 0 to modprobe calls,
assuming the kernel module is built-in, where really we have a race as
the module starts forming. kmod <= v19 has incorrect userspace heuristics,
a userspace kmod fix is available for it:
http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4
This changes kmod to address both:
o Provides the alias optimization for get_fs_type() so modules already
loaded do not get re-requested.
o Provides a sanity test to verify modprobe's work
This is important given how any get_fs_type() users assert success
means we're ready to go, and tests with the new test_kmod stress driver
reveal that request_module() and get_fs_type() might fail for a few
other reasons. You don't need old kmod to fail on request_module() or
get_fs_type(), with the right system setup, these calls *can* fail
today.
Although this does get us in the business of keeping alias maps in
kernel, the the work to support and maintain this is trivial.
Aditionally, since it may be important get_fs_type() should not fail on
certain systems, this tightens things up a bit more.
The TL;DR:
kmod <= v19 will return 0 on modprobe calls if you are built-in,
however its heuristics for checking if you are built-in were broken.
It assumed that having the directory /sys/module/module-name
but not having the file /sys/module/module-name/initstate
is sufficient to assume a module is built-in.
The kernel loads the inittstate attribute *after* it creates the
directory. This is an issue when modprobe returns 0 for kernel calls
which assumes a return of 0 on request_module() can give you the
right to assert the module is loaded and live.
We cannot trust returns of modprobe as 0 in the kernel, we need to
verify that modules are live if modprobe return 0 but only if modules
*are* modules. The kernel heuristic we use to determine if a module is
built-in is that if modprobe returns 0 we know we must be built-in or
a module, but if we are a module clearly we must have a lingering kmod
dangling on our linked list. If there is no modules there we are *somewhat*
certain the module must be built in.
This is not enough though... we cannot easily work around this since the
kernel can use aliases to userspace for modules calls. For instance
fs/namespace.c uses fs-modulename for filesystesms on get_fs_type(), so
these need to be taken into consideration as well.
Using kmod <= 19 will give you a NULL get_fs_type() return even though
the module was loaded... That is a corner case, there are other failures
for request_module() though -- the other failures are not easy to
reproduce though but fortunately we have a stress test driver to help
with that now. Use the following tests:
# tools/testing/selftests/kmod/kmod.sh -t 0008
# tools/testing/selftests/kmod/kmod.sh -t 0009
You can more easily see this error if you have kmod <= v19 installed.
You will need to install kmod <= v19, be sure to install its modprobe
into /sbin/ as by default the 'make install' target does not replace
your own.
This test helps cure test_kmod cases 0008 0009 so enable them.
Reported-by: Martin Wilck <martin.wilck@suse.com>
Reported-by: Randy Wright <rwright@hpe.com>
Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
---
kernel/kmod.c | 73 ++++++++++++++++++++++++++++++++++++
kernel/module.c | 11 ++++--
tools/testing/selftests/kmod/kmod.sh | 9 ++---
3 files changed, 85 insertions(+), 8 deletions(-)
diff --git a/kernel/kmod.c b/kernel/kmod.c
index a0f449f77ed7..6bf0feab41d1 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -61,6 +61,11 @@ static DECLARE_RWSEM(umhelper_sem);
#ifdef CONFIG_MODULES
+bool finished_loading(const char *name);
+int module_wait_until_finished(const char *name);
+struct module *find_module_all(const char *name, size_t len,
+ bool even_unformed);
+
/*
modprobe_path is set via /proc/sys.
*/
@@ -158,6 +163,72 @@ int get_kmod_umh_count(void)
return atomic_read(&kmod_concurrent);
}
+static bool kmod_exists(char *name)
+{
+ struct module *mod;
+
+ mutex_lock(&module_mutex);
+ mod = find_module_all(name, strlen(name), true);
+ mutex_unlock(&module_mutex);
+
+ if (mod)
+ return true;
+
+ return false;
+}
+
+/*
+ * The assumption is this must be a module, it could still not be live though
+ * since kmod <= 19 returns 0 even if it was not ready yet. Allow for force
+ * wait check in case you are stuck on old userspace.
+ */
+static int wait_for_kmod(char *name)
+{
+ int ret = 0;
+
+ if (!finished_loading(name))
+ ret = module_wait_until_finished(name);
+
+ return ret;
+}
+
+/*
+ * kmod <= 19 will tell us modprobe returned 0 even if the module
+ * is not ready yet, it does this because it checks the /sys/module/mod-name
+ * directory and if its created but the /sys/module/mod-name/initstate is not
+ * created it assumes you have a built-in driver. At this point the module
+ * is still unformed, and telling the kernel at any point via request_module()
+ * will cause issues given a lot of places in the kernel assert that the driver
+ * will be present and ready. We need to account for this.
+ *
+ * If we had a module and even if buggy modprobe returned 0, we know we'd at
+ * least have a dangling kmod entry we could fetch.
+ *
+ * If modprobe returned 0 and we cannot find a kmod entry this is a good
+ * indicator your by userspace and kernel space that what you have is built-in.
+ *
+ * If modprobe returned 0 and we can find a kmod entry we should air on the
+ * side of caution and wait for the module to become ready or going.
+ *
+ * In the worst case, for built-in, we have to check on the module list for
+ * as many aliases possible the kernel gives the module, if that is n, that
+ * n traversals on the module list.
+ */
+static int finished_kmod_load(char *name)
+{
+ int ret = 0;
+ bool is_fs = (strlen(name) > 3) && (strncmp(name, "fs-", 3) == 0);
+
+ if (kmod_exists(name)) {
+ ret = wait_for_kmod(name);
+ } else {
+ if (is_fs && kmod_exists(name + 3))
+ ret = wait_for_kmod(name + 3);
+ }
+
+ return ret;
+}
+
/**
* __request_module - try to load a kernel module
* @wait: wait (or not) for the operation to complete
@@ -211,6 +282,8 @@ int __request_module(bool wait, const char *fmt, ...)
trace_module_request(module_name, wait, _RET_IP_);
ret = call_modprobe(module_name, wait ? UMH_WAIT_PROC : UMH_WAIT_EXEC);
+ if (!ret)
+ ret = finished_kmod_load(module_name);
kmod_umh_threads_put();
return ret;
diff --git a/kernel/module.c b/kernel/module.c
index e420ed67e533..bf854321dca0 100644
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -590,8 +590,8 @@ EXPORT_SYMBOL_GPL(find_symbol);
* Search for module by name: must hold module_mutex (or preempt disabled
* for read-only access).
*/
-static struct module *find_module_all(const char *name, size_t len,
- bool even_unformed)
+struct module *find_module_all(const char *name, size_t len,
+ bool even_unformed)
{
struct module *mod;
@@ -3325,7 +3325,7 @@ static int post_relocation(struct module *mod, const struct load_info *info)
}
/* Is this module of this name done loading? No locks held. */
-static bool finished_loading(const char *name)
+bool finished_loading(const char *name)
{
struct module *mod;
bool ret;
@@ -3486,6 +3486,11 @@ static int may_init_module(void)
return 0;
}
+int module_wait_until_finished(const char *name)
+{
+ return wait_event_interruptible(module_wq, finished_loading(name));
+}
+
/*
* We try to place it in the list now to make sure it's unique before
* we dedicate too many resources. In particular, temporary percpu
diff --git a/tools/testing/selftests/kmod/kmod.sh b/tools/testing/selftests/kmod/kmod.sh
index 9ea1864d8bae..ccf35b8d1671 100755
--- a/tools/testing/selftests/kmod/kmod.sh
+++ b/tools/testing/selftests/kmod/kmod.sh
@@ -382,7 +382,7 @@ kmod_test_0008()
let EXTRA=$MODPROBE_LIMIT/2
config_num_thread_limit_extra $EXTRA
config_trigger ${FUNCNAME[0]}
- config_expect_result ${FUNCNAME[0]} -EINVAL
+ config_expect_result ${FUNCNAME[0]} SUCCESS
}
kmod_test_0009()
@@ -392,7 +392,7 @@ kmod_test_0009()
#let EXTRA=$MODPROBE_LIMIT/3
config_num_thread_limit_extra 5
config_trigger ${FUNCNAME[0]}
- config_expect_result ${FUNCNAME[0]} -EINVAL
+ config_expect_result ${FUNCNAME[0]} SUCCESS
}
trap "test_finish" EXIT
@@ -442,8 +442,7 @@ kmod_test_0004
kmod_test_0005
kmod_test_0006
kmod_test_0007
-
-#kmod_test_0008
-#kmod_test_0009
+kmod_test_0008
+kmod_test_0009
exit 0
--
2.10.1
^ permalink raw reply related [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-08 19:49 ` [RFC 10/10] kmod: add a sanity check on module loading Luis R. Rodriguez
@ 2016-12-09 20:03 ` Martin Wilck
2016-12-09 20:56 ` Linus Torvalds
2016-12-15 0:27 ` Rusty Russell
2017-01-04 2:47 ` Jessica Yu
2 siblings, 1 reply; 65+ messages in thread
From: Martin Wilck @ 2016-12-09 20:03 UTC (permalink / raw)
To: Luis R. Rodriguez, shuah, jeyu, rusty, ebiederm, dmitry.torokhov,
acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
>
> Although this does get us in the business of keeping alias maps in
> kernel, the the work to support and maintain this is trivial.
You've implemented a special treatment for request_module("fs-$X")in
finished_kmod_load(), but there are many more aliases defined (and
used) in the kernel. Do you plan to implement special code for "char-
major-$X", "crypto-$X", "binfmt-$X" etc. later?
Regards
Martin
--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-09 20:03 ` Martin Wilck
@ 2016-12-09 20:56 ` Linus Torvalds
2016-12-15 18:08 ` Luis R. Rodriguez
0 siblings, 1 reply; 65+ messages in thread
From: Linus Torvalds @ 2016-12-09 20:56 UTC (permalink / raw)
To: Martin Wilck
Cc: Luis R. Rodriguez, shuah, Jessica Yu, Rusty Russell,
Eric W. Biederman, Dmitry Torokhov, Arnaldo Carvalho de Melo,
Jonathan Corbet, martin.wilck, Michal Marek, Petr Mladek,
Hannes Reinecke, rwright, Jeff Mahoney, David Sterba, fdmanana,
NeilBrown, Guenter Roeck, Goldwyn Rodrigues, subashab,
Heinrich Schuchardt, Kees Cook, atomlin, mbenes, Paul McKenney,
Dan Williams, Josh Poimboeuf, David Miller, Ingo Molnar,
Andrew Morton, linux-kselftest, open list:DOCUMENTATION,
Linux Kernel Mailing List
On Fri, Dec 9, 2016 at 12:03 PM, Martin Wilck <mwilck@suse.com> wrote:
> On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
>>
>> Although this does get us in the business of keeping alias maps in
>> kernel, the the work to support and maintain this is trivial.
>
> You've implemented a special treatment for request_module("fs-$X")in
> finished_kmod_load(), but there are many more aliases defined (and
> used) in the kernel. Do you plan to implement special code for "char-
> major-$X", "crypto-$X", "binfmt-$X" etc. later?
Yeah, no, that is just complete garbage.
Those module aliases already exist in the module info section. We just
don't parse the alias tags in the kernel.
So the real fix is to make find_module_all() just do that.
Doing random ad-hoc "let's prefix with 'fs-xyz'" games are completely
unacceptable. That's just pure shit. Stop this idiocy.
Linus
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-09 20:56 ` Linus Torvalds
@ 2016-12-15 18:08 ` Luis R. Rodriguez
0 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-15 18:08 UTC (permalink / raw)
To: Linus Torvalds
Cc: Martin Wilck, Luis R. Rodriguez, shuah, Jessica Yu,
Rusty Russell, Eric W. Biederman, Dmitry Torokhov,
Arnaldo Carvalho de Melo, Jonathan Corbet, martin.wilck,
Michal Marek, Petr Mladek, Hannes Reinecke, rwright,
Jeff Mahoney, David Sterba, fdmanana, NeilBrown, Guenter Roeck,
Goldwyn Rodrigues, subashab, Heinrich Schuchardt, Kees Cook,
atomlin, mbenes, Paul McKenney, Dan Williams, Josh Poimboeuf,
David Miller, Ingo Molnar, Andrew Morton, linux-kselftest,
open list:DOCUMENTATION, Linux Kernel Mailing List
On Fri, Dec 09, 2016 at 12:56:21PM -0800, Linus Torvalds wrote:
> On Fri, Dec 9, 2016 at 12:03 PM, Martin Wilck <mwilck@suse.com> wrote:
> > On Thu, 2016-12-08 at 11:49 -0800, Luis R. Rodriguez wrote:
> >>
> >> Although this does get us in the business of keeping alias maps in
> >> kernel, the the work to support and maintain this is trivial.
> >
> > You've implemented a special treatment for request_module("fs-$X")in
> > finished_kmod_load(), but there are many more aliases defined (and
> > used) in the kernel. Do you plan to implement special code for "char-
> > major-$X", "crypto-$X", "binfmt-$X" etc. later?
>
> Yeah, no, that is just complete garbage.
>
> Those module aliases already exist in the module info section. We just
> don't parse the alias tags in the kernel.
>
> So the real fix is to make find_module_all() just do that.
Ah yes, that is much sexier, this is now done and it works nicely, thanks
for the suggestion.
> Doing random ad-hoc "let's prefix with 'fs-xyz'" games are completely
> unacceptable. That's just pure shit. Stop this idiocy.
Look at that fin DNA in action :)
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-08 19:49 ` [RFC 10/10] kmod: add a sanity check on module loading Luis R. Rodriguez
2016-12-09 20:03 ` Martin Wilck
@ 2016-12-15 0:27 ` Rusty Russell
2016-12-16 8:31 ` Luis R. Rodriguez
2017-01-04 2:47 ` Jessica Yu
2 siblings, 1 reply; 65+ messages in thread
From: Rusty Russell @ 2016-12-15 0:27 UTC (permalink / raw)
To: Luis R. Rodriguez, shuah, jeyu, ebiederm, dmitry.torokhov, acme, corbet
Cc: martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel, Luis R. Rodriguez
"Luis R. Rodriguez" <mcgrof@kernel.org> writes:
> kmod has an optimization in place whereby if a some kernel code
> uses request_module() on a module already loaded we never bother
> userspace as the module already is loaded. This is not true for
> get_fs_type() though as it uses aliases.
Well, the obvious thing to do here is block kmod if we're currently
loading the same module. Otherwise it has to do some weird spinning
thing in userspace anyway.
We already have module_wq for this, we just need a bit more code to
share the return value; and there's a weird corner case there where we
have "modprobe foo param=invalid" then "modprobe foo param=valid" and we
fail both with -EINVAL, but it's probably not worth fixing.
Cheers,
Rusty.
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-15 0:27 ` Rusty Russell
@ 2016-12-16 8:31 ` Luis R. Rodriguez
2016-12-17 3:54 ` Rusty Russell
0 siblings, 1 reply; 65+ messages in thread
From: Luis R. Rodriguez @ 2016-12-16 8:31 UTC (permalink / raw)
To: Rusty Russell
Cc: Luis R. Rodriguez, shuah, jeyu, ebiederm, dmitry.torokhov, acme,
corbet, martin.wilck, mmarek, pmladek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
On Thu, Dec 15, 2016 at 10:57:42AM +1030, Rusty Russell wrote:
> "Luis R. Rodriguez" <mcgrof@kernel.org> writes:
> > kmod has an optimization in place whereby if a some kernel code
> > uses request_module() on a module already loaded we never bother
> > userspace as the module already is loaded. This is not true for
> > get_fs_type() though as it uses aliases.
>
> Well, the obvious thing to do here is block kmod if we're currently
> loading the same module.
OK thanks, I've now added this, it sure helps. Test cases 0008 and 0009 require
hammering on the test over and over to see a failure on vanilla kernels,
an upper bound I found was about 150 times each test. Running test 0008
150 times with this enhancement you mentioned shaves off ~4 seconds.
For test 0009 it shaves off ~16 seconds, but as I note below the alias support
was needed as well.
> Otherwise it has to do some weird spinning
> thing in userspace anyway.
Right, but note that the get_fs_type() tests would still fail given
module.c was not alias-aware yet. I have the patches to add support
for the aliases now though and this is part of what helped shave
off time from the tests.
> We already have module_wq for this, we just need a bit more code to
> share the return value; and there's a weird corner case there where we
> have "modprobe foo param=invalid" then "modprobe foo param=valid" and we
> fail both with -EINVAL, but it's probably not worth fixing.
Hm OK. Although the set of patches I have fix and optimize now some
of these corner cases one issue that I still didn't quite yet figure
out was that a failure propagates secondary failures. That is,
say a module fails and you have loaded 4 request for the same module,
if the first request failed the last 3 *could* also fail. You can
trigger and see this with the latest script:
http://drvbp1.linux-foundation.org/~mcgrof/2016/12/16/kmod.sh
The latest version of the test_kmod driver:
http://drvbp1.linux-foundation.org/~mcgrof/2016/12/16/test_kmod.patch
./kmod.sh -t 0008
./kmod.sh -t 0009
When either of these fail you'll on dmesg that either a few NULL or
errors were found. It may not be worth fixing this race... given
that after apply all of my patches I no longer see this at all,
but I'm pretty sure a test case can be created to replicate more
easily.
FWIW a few things did occur to me:
a) list_add_rcu() is used so new modules get added first
b) find_module_all() returns the last module which was added as it traverses
the module list
Because of a) and b) if two modules for the same driver can be on
the list at the same time then we'll get very likely a module which
is unformed or going than a live module. Changing module addition
to use list_add_tail_rcu() should mean we typically get the first
module added to the list for the module name I think, but other
than that I could not think clearly of the root case to allowing
multiple errors.
BTW should find_module_all() use rcu to traverse?
--- a/kernel/module.c
+++ b/kernel/module.c
@@ -594,7 +594,7 @@ static struct module *find_module_all(const char *name, size_t len,
module_assert_mutex_or_preempt();
- list_for_each_entry(mod, &modules, list) {
+ list_for_each_entry_rcu(mod, &modules, list) {
if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
continue;
if (strlen(mod->name) == len && !memcmp(mod->name, name, len))
@@ -3532,7 +3532,7 @@ static int add_unformed_module(struct module *mod)
goto out;
}
mod_update_bounds(mod);
- list_add_rcu(&mod->list, &modules);
+ list_add_tail_rcu(&mod->list, &modules);
mod_tree_insert(mod);
err = 0;
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 10/10] kmod: add a sanity check on module loading
2016-12-16 8:31 ` Luis R. Rodriguez
@ 2016-12-17 3:54 ` Rusty Russell
[not found] ` <CAB=NE6VvuA9a6hf6yoopGfUxVJQM5HyV5bNzUdsEtUV0UhbG-g@mail.gmail.com>
0 siblings, 1 reply; 65+ messages in thread
From: Rusty Russell @ 2016-12-17 3:54 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Luis R. Rodriguez, shuah, jeyu, ebiederm, dmitry.torokhov, acme,
corbet, martin.wilck, mmarek, pmladek, hare, rwright, jeffm,
DSterba, fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
"Luis R. Rodriguez" <mcgrof@kernel.org> writes:
> On Thu, Dec 15, 2016 at 10:57:42AM +1030, Rusty Russell wrote:
>> "Luis R. Rodriguez" <mcgrof@kernel.org> writes:
>> > kmod has an optimization in place whereby if a some kernel code
>> > uses request_module() on a module already loaded we never bother
>> > userspace as the module already is loaded. This is not true for
>> > get_fs_type() though as it uses aliases.
>>
>> Well, the obvious thing to do here is block kmod if we're currently
>> loading the same module.
>
> OK thanks, I've now added this, it sure helps. Test cases 0008 and 0009 require
> hammering on the test over and over to see a failure on vanilla kernels,
> an upper bound I found was about 150 times each test. Running test 0008
> 150 times with this enhancement you mentioned shaves off ~4 seconds.
> For test 0009 it shaves off ~16 seconds, but as I note below the alias support
> was needed as well.
>
>> Otherwise it has to do some weird spinning
>> thing in userspace anyway.
>
> Right, but note that the get_fs_type() tests would still fail given
> module.c was not alias-aware yet.
AFAICT the mistake here is that kmod is returning "done, OK" when the
module it is trying to load is already loading (but not finished
loading). That's the root problem; it's an attempt at optimization by
kmod which goes awry.
Looking at the code in the kernel, we *already* get this right: block if
a module is still loading anyway. Once it succeeds we return -EBUSY; if
it fails we'll proceed to try to load it again.
I don't understand what you're trying to fix with adding aliases
in-kernel?
> FWIW a few things did occur to me:
>
> a) list_add_rcu() is used so new modules get added first
Only after we're sure that there are no duplicates.
> b) find_module_all() returns the last module which was added as it traverses
> the module list
> BTW should find_module_all() use rcu to traverse?
Yes; the kallsyms code does this on Oops. Not really a big issue in
practice, but a nice fix.
Thanks,
Rusty.
>
> --- a/kernel/module.c
> +++ b/kernel/module.c
> @@ -594,7 +594,7 @@ static struct module *find_module_all(const char *name, size_t len,
>
> module_assert_mutex_or_preempt();
>
> - list_for_each_entry(mod, &modules, list) {
> + list_for_each_entry_rcu(mod, &modules, list) {
> if (!even_unformed && mod->state == MODULE_STATE_UNFORMED)
> continue;
> if (strlen(mod->name) == len && !memcmp(mod->name, name, len))
> @@ -3532,7 +3532,7 @@ static int add_unformed_module(struct module *mod)
> goto out;
> }
> mod_update_bounds(mod);
> - list_add_rcu(&mod->list, &modules);
> + list_add_tail_rcu(&mod->list, &modules);
> mod_tree_insert(mod);
> err = 0;
>
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: kmod: add a sanity check on module loading
2016-12-08 19:49 ` [RFC 10/10] kmod: add a sanity check on module loading Luis R. Rodriguez
2016-12-09 20:03 ` Martin Wilck
2016-12-15 0:27 ` Rusty Russell
@ 2017-01-04 2:47 ` Jessica Yu
2 siblings, 0 replies; 65+ messages in thread
From: Jessica Yu @ 2017-01-04 2:47 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, linux, rgoldwyn, subashab, xypron.glpk,
keescook, atomlin, mbenes, paulmck, dan.j.williams, jpoimboe,
davem, mingo, akpm, torvalds, linux-kselftest, linux-doc,
linux-kernel
+++ Luis R. Rodriguez [08/12/16 11:49 -0800]:
>kmod has an optimization in place whereby if a some kernel code
>uses request_module() on a module already loaded we never bother
>userspace as the module already is loaded. This is not true for
>get_fs_type() though as it uses aliases.
>
>Additionally kmod <= v19 was broken -- it returns 0 to modprobe calls,
>assuming the kernel module is built-in, where really we have a race as
>the module starts forming. kmod <= v19 has incorrect userspace heuristics,
>a userspace kmod fix is available for it:
>
>http://git.kernel.org/cgit/utils/kernel/kmod/kmod.git/commit/libkmod/libkmod-module.c?id=fd44a98ae2eb5eb32161088954ab21e58e19dfc4
>
>This changes kmod to address both:
>
> o Provides the alias optimization for get_fs_type() so modules already
> loaded do not get re-requested.
>
> o Provides a sanity test to verify modprobe's work
>
>This is important given how any get_fs_type() users assert success
>means we're ready to go, and tests with the new test_kmod stress driver
>reveal that request_module() and get_fs_type() might fail for a few
>other reasons. You don't need old kmod to fail on request_module() or
>get_fs_type(), with the right system setup, these calls *can* fail
>today.
>
>Although this does get us in the business of keeping alias maps in
>kernel, the the work to support and maintain this is trivial.
>Aditionally, since it may be important get_fs_type() should not fail on
>certain systems, this tightens things up a bit more.
>
>The TL;DR:
>
>kmod <= v19 will return 0 on modprobe calls if you are built-in,
>however its heuristics for checking if you are built-in were broken.
>
>It assumed that having the directory /sys/module/module-name
>but not having the file /sys/module/module-name/initstate
>is sufficient to assume a module is built-in.
>
>The kernel loads the inittstate attribute *after* it creates the
>directory. This is an issue when modprobe returns 0 for kernel calls
>which assumes a return of 0 on request_module() can give you the
>right to assert the module is loaded and live.
>
>We cannot trust returns of modprobe as 0 in the kernel, we need to
>verify that modules are live if modprobe return 0 but only if modules
>*are* modules. The kernel heuristic we use to determine if a module is
>built-in is that if modprobe returns 0 we know we must be built-in or
>a module, but if we are a module clearly we must have a lingering kmod
>dangling on our linked list. If there is no modules there we are *somewhat*
>certain the module must be built in.
>
>This is not enough though... we cannot easily work around this since the
>kernel can use aliases to userspace for modules calls. For instance
>fs/namespace.c uses fs-modulename for filesystesms on get_fs_type(), so
>these need to be taken into consideration as well.
>
>Using kmod <= 19 will give you a NULL get_fs_type() return even though
>the module was loaded... That is a corner case, there are other failures
>for request_module() though -- the other failures are not easy to
>reproduce though but fortunately we have a stress test driver to help
>with that now. Use the following tests:
>
> # tools/testing/selftests/kmod/kmod.sh -t 0008
> # tools/testing/selftests/kmod/kmod.sh -t 0009
>
>You can more easily see this error if you have kmod <= v19 installed.
>
>You will need to install kmod <= v19, be sure to install its modprobe
>into /sbin/ as by default the 'make install' target does not replace
>your own.
>
>This test helps cure test_kmod cases 0008 0009 so enable them.
>
>Reported-by: Martin Wilck <martin.wilck@suse.com>
>Reported-by: Randy Wright <rwright@hpe.com>
>Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
Back from travel today, apologies for the delay. Will be able to give
this a proper look this week.
Jessica
^ permalink raw reply [flat|nested] 65+ messages in thread
* Re: [RFC 00/10] kmod: stress test driver, few fixes and enhancements
2016-12-08 18:47 [RFC 00/10] kmod: stress test driver, few fixes and enhancements Luis R. Rodriguez
` (9 preceding siblings ...)
2016-12-08 19:49 ` [RFC 10/10] kmod: add a sanity check on module loading Luis R. Rodriguez
@ 2017-01-11 19:10 ` Luis R. Rodriguez
10 siblings, 0 replies; 65+ messages in thread
From: Luis R. Rodriguez @ 2017-01-11 19:10 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: shuah, jeyu, rusty, ebiederm, dmitry.torokhov, acme, corbet,
martin.wilck, mmarek, pmladek, hare, rwright, jeffm, DSterba,
fdmanana, neilb, rgoldwyn, subashab, xypron.glpk, keescook,
atomlin, mbenes, paulmck, dan.j.williams, jpoimboe, davem, mingo,
akpm, torvalds, linux-kselftest, linux-doc, linux-kernel
On Thu, Dec 08, 2016 at 10:47:51AM -0800, Luis R. Rodriguez wrote:
> Upon running into an old kmod v19 issue with mount (get_fs_type()) a few of us
> hunted for the cause of the issue. Although the issue ended up being a
> userspace issue, a stress test driver was written to help reproduce the issue,
> and along the way a few other fixes and sanity checks were implemented.
>
> I've taken the time to generalize the stress test driver as a kselftest driver
> with a 9 test cases. The last two test cases reveal an existing issue which
> is not yet addressed upstream, even if you have kmod v19 present. A fix is
> proposed in the last patch. Orignally we had discarded this patch as too
> complex due to the alias handling, but upon further analysis of test cases
> and memory pressure issues, it seems worth considering. Other than the
> last patch I don't think much of the other patches are controversial, but
> sending as RFC first just in case.
>
> If its not clear, an end goal here is to make module loading a bit more
> deterministic with stronger sanity checks and stress tests. Please note,
> the stress test diver requires 4 GiB of RAM to run all tests without running
> out of memory. A lot of this has to do with the memory requirements needed
> for a dynamic test for multiple threads, but note that the final memory
> pressure and OOMs actually don't come from this allocation, but instead
> from many finit_module() calls, this consumes quite a bit of memory, specially
> if you have a lot of dependencies which also need to be loaded prior to
> your needed module -- as is the case for filesystem drivers.
>
> These patches are available on my linux-next git-tree on my branch
> 20161208-kmod-test-driver-try2 [0], which is based on linux-next tag
> next-20161208. Patches are also available based on v4.9-rc8 [1] for
> those looking for a bit more stable tree given x86_64 on linux-next is
> hosed at the moment.
>
> Since kmod.c doesn't seem to get much love, and since I've been digging
> quite a bit into it for other users (firmware) I suppose I could volunteer
> myself to maintain this code as well, unless there are oppositions to this.
>
> [0] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux-next.git/log/?h=20161208-kmod-test-driver-try2
> [1] https://git.kernel.org/cgit/linux/kernel/git/mcgrof/linux.git/log/?h=20161208-kmod-test-driver
>
> Luis R. Rodriguez (10):
> kmod: add test driver to stress test the module loader
> module: fix memory leak on early load_module() failures
> kmod: add dynamic max concurrent thread count
> kmod: provide wrappers for kmod_concurrent inc/dec
> kmod: return -EBUSY if modprobe limit is reached
> kmod: provide sanity check on kmod_concurrent access
> kmod: use simplified rate limit printk
> sysctl: add support for unsigned int properly
> kmod: add helpers for getting kmod count and limit
> kmod: add a sanity check on module loading
>
A lot of good discussions have come up form this, and so also
a few more patches. I'm going to split up the work into changes
which make sense now and leave debug work for a follow up later.
Luis
^ permalink raw reply [flat|nested] 65+ messages in thread