linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/3] zram: fix few sysfs races
@ 2021-07-03  0:19 Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 1/3] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:19 UTC
  To: akpm, minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael
  Cc: mcgrof, axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos,
	rostedt, peterz, linux-block, linux-kernel

Andrew,

This v6 rebases onto linux-next tag next-20210701 and adds a new third
patch to use ATTRIBUTE_GROUPS. It has been run-time tested with the LTP
zram02 script. As per Minchan's request, I am sending this through you.

Luis Chamberlain (3):
  zram: fix crashes with cpu hotplug multistate
  zram: fix deadlock with sysfs attribute usage and module removal
  zram: use ATTRIBUTE_GROUPS

 drivers/block/zram/zram_drv.c | 119 ++++++++++++++++++++++++----------
 drivers/block/zram/zram_drv.h |  54 +++++++++++++++
 2 files changed, 139 insertions(+), 34 deletions(-)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v6 1/3] zram: fix crashes with cpu hotplug multistate
  2021-07-03  0:19 [PATCH v6 0/3] zram: fix few sysfs races Luis Chamberlain
@ 2021-07-03  0:19 ` Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 3/3] zram: use ATTRIBUTE_GROUPS Luis Chamberlain
  2 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:19 UTC
  To: akpm, minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael
  Cc: mcgrof, axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos,
	rostedt, peterz, linux-block, linux-kernel

Provide a simple state machine to fix races between driver exit, where
we remove the CPU multistate callbacks, and the re-initialization /
creation of new per-CPU instances which should be managed by these
callbacks.

The zram driver makes use of cpu hotplug multistate support, whereby it
associates a struct zcomp per CPU. Each struct zcomp represents a
compression algorithm in charge of managing compression streams per
CPU. Although a compiled zram driver only supports a fixed set of
compression algorithms, each zram device gets a struct zcomp allocated
per CPU. The "multi" in CPU hotplug multistate refers to these per-CPU
struct zcomp instances. Each of these has the CPU hotplug callback
called for it on CPU plug / unplug. The kernel's CPU hotplug multistate
support keeps a linked list of these different structures so that it
can iterate over them on CPU transitions.

By default at driver initialization we create just one zram device
(num_devices=1) and a zcomp structure set up for the now-default
lzo-rle compression algorithm. At driver removal we first remove each
zram device, and so we destroy the associated struct zcomp per CPU. But
since we expose sysfs attributes to create new devices or reset /
initialize existing zram devices, we can easily end up re-initializing
a struct zcomp for a zram device before the exit routine of the module
removes the cpu hotplug callback. When this happens the kernel's CPU
hotplug will detect that at least one instance (struct zcomp for us)
exists. This can happen in the following situation:

CPU 1                            CPU 2

                                disksize_store(...);
class_unregister(...);
idr_for_each(...);
zram_debugfs_destroy();

idr_destroy(...);
unregister_blkdev(...);
cpuhp_remove_multi_state(...);

The warning comes up on cpuhp_remove_multi_state() when it sees that the
state for CPUHP_ZCOMP_PREPARE does not have an empty instance linked list.
In this case a struct zcomp still exists: the driver allowed its per-CPU
creation on one CPU even though the exit path on another CPU may have
just freed the per-CPU instances, and that exit path then goes on to
remove the hotplug callback.
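To make the warning concrete, here is a rough userspace model of a
multistate hotplug state. The names (struct multistate, add_instance(),
cpu_online_model(), remove_state()) are hypothetical and are not the
kernel's cpuhp API; the model only assumes what is described above: a
per-state linked list of instances, a callback run for each instance on
a CPU transition, and a warning on removal if the list is not empty.

```c
#include <assert.h>
#include <stdio.h>

/* One registered instance, e.g. one zram device's struct zcomp. */
struct instance {
	int id;
	int cpus_prepared;	/* how many times the CPU callback ran for it */
	struct instance *next;
};

/* A multistate hotplug state owning a linked list of instances. */
struct multistate {
	int state_id;
	struct instance *head;
};

static void add_instance(struct multistate *ms, struct instance *inst)
{
	inst->next = ms->head;
	ms->head = inst;
}

static void del_instance(struct multistate *ms, struct instance *inst)
{
	struct instance **pp = &ms->head;

	while (*pp && *pp != inst)
		pp = &(*pp)->next;
	if (*pp)
		*pp = inst->next;
}

/* On a CPU transition, walk every instance and run the callback for it. */
static void cpu_online_model(struct multistate *ms)
{
	for (struct instance *i = ms->head; i; i = i->next)
		i->cpus_prepared++;	/* stands in for the prepare callback */
}

/* Mirrors the check behind the WARN: count instances still linked. */
static int remove_state(struct multistate *ms)
{
	int left = 0;

	for (struct instance *i = ms->head; i; i = i->next)
		left++;
	if (left)
		fprintf(stderr, "Error: Removing state %d which has instances left.\n",
			ms->state_id);
	return left;
}
```

In this model, a disksize_store() that re-creates an instance after the
exit path has removed the others is exactly what makes remove_state()
report leftovers.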

Fix all this by providing a zram initialization boolean, zram_up,
protected by the driver's existing shared zram_index_mutex. We use it
to annotate whether sysfs attributes are safe to use: only once the
driver is properly initialized. When the driver is going down we also
make sure not to let userspace muck with attributes which may affect
each per-CPU struct zcomp.
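A minimal sketch of the gating pattern, written as userspace C with a
pthread mutex standing in for zram_index_mutex. driver_up,
live_instances and the *_model() / driver_*() helpers are hypothetical
names; the point is only that the flag and every path that creates
per-CPU state are serialized by the same lock, so no store can slip in
after the exit path has cleared the flag.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t index_mutex = PTHREAD_MUTEX_INITIALIZER;
static bool driver_up;		/* plays the role of zram_up */
static int live_instances;	/* stands in for per-CPU struct zcomp objects */

/* Like the end of zram_init(): only now are attributes safe to use. */
static void driver_init(void)
{
	pthread_mutex_lock(&index_mutex);
	driver_up = true;
	pthread_mutex_unlock(&index_mutex);
}

/* Like disksize_store(): refuse to create state unless the driver is up. */
static int disksize_store_model(void)
{
	int ret = 0;

	pthread_mutex_lock(&index_mutex);
	if (!driver_up) {
		ret = -ENODEV;
		goto out;
	}
	live_instances++;
out:
	pthread_mutex_unlock(&index_mutex);
	return ret;
}

/* Like destroy_devices(): clear the flag and tear down under the lock. */
static void driver_exit(void)
{
	pthread_mutex_lock(&index_mutex);
	driver_up = false;
	live_instances = 0;	/* every device removed while holding the lock */
	pthread_mutex_unlock(&index_mutex);
}
```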

This also fixes a series of possible memory leaks. The
crashes and memory leaks can easily be triggered by running
the zram02.sh script from the LTP project [0] in a loop
in two separate terminals:

  cd testcases/kernel/device-drivers/zram
  while true; do PATH=$PATH:$PWD:$PWD/../../../lib/ ./zram02.sh; done

You end up with a splat as follows:

kernel: zram: Removed device: zram0
kernel: zram: Added device: zram0
kernel: zram0: detected capacity change from 0 to 209715200
kernel: Adding 104857596k swap on /dev/zram0.  <etc>
kernel: zram0: detected capacity change from 209715200 to 0
kernel: zram0: detected capacity change from 0 to 209715200
kernel: ------------[ cut here ]------------
kernel: Error: Removing state 63 which has instances left.
kernel: WARNING: CPU: 7 PID: 70457 at \
	kernel/cpu.c:2069 __cpuhp_remove_state_cpuslocked+0xf9/0x100
kernel: Modules linked in: zram(E-) zsmalloc(E) <etc>
kernel: CPU: 7 PID: 70457 Comm: rmmod Tainted: G            \
	E     5.12.0-rc1-next-20210304 #3
kernel: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), \
	BIOS 1.14.0-2 04/01/2014
kernel: RIP: 0010:__cpuhp_remove_state_cpuslocked+0xf9/0x100
kernel: Code: <etc>
kernel: RSP: 0018:ffffa800c139be98 EFLAGS: 00010282
kernel: RAX: 0000000000000000 RBX: ffffffff9083db58 RCX: ffff9609f7dd86d8
kernel: RDX: 00000000ffffffd8 RSI: 0000000000000027 RDI: ffff9609f7dd86d0
kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: ffffa800c139bcb8
kernel: R10: ffffa800c139bcb0 R11: ffffffff908bea40 R12: 000000000000003f
kernel: R13: 00000000000009d8 R14: 0000000000000000 R15: 0000000000000000
kernel: FS: 00007f1b075a7540(0000) GS:ffff9609f7dc0000(0000) knlGS:<etc>
kernel: CS:  0010 DS: 0000 ES 0000 CR0: 0000000080050033
kernel: CR2: 00007f1b07610490 CR3: 00000001bd04e000 CR4: 0000000000350ee0
kernel: Call Trace:
kernel: __cpuhp_remove_state+0x2e/0x80
kernel: __do_sys_delete_module+0x190/0x2a0
kernel:  do_syscall_64+0x33/0x80
kernel: entry_SYSCALL_64_after_hwframe+0x44/0xae

The "Error: Removing state 63 which has instances left" refers
to the per-CPU zram struct zcomp instances left behind.

[0] https://github.com/linux-test-project/ltp.git

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/zram/zram_drv.c | 63 ++++++++++++++++++++++++++++++-----
 1 file changed, 55 insertions(+), 8 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index fcaf2750f68f..43d4c9971330 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -44,6 +44,8 @@ static DEFINE_MUTEX(zram_index_mutex);
 static int zram_major;
 static const char *default_compressor = CONFIG_ZRAM_DEF_COMP;
 
+static bool zram_up;
+
 /* Module params (documentation at end) */
 static unsigned int num_devices = 1;
 /*
@@ -1704,6 +1706,7 @@ static void zram_reset_device(struct zram *zram)
 	comp = zram->comp;
 	disksize = zram->disksize;
 	zram->disksize = 0;
+	zram->comp = NULL;
 
 	set_capacity_and_notify(zram->disk, 0);
 	part_stat_set_all(zram->disk->part0, 0);
@@ -1724,9 +1727,18 @@ static ssize_t disksize_store(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 	int err;
 
+	mutex_lock(&zram_index_mutex);
+
+	if (!zram_up) {
+		err = -ENODEV;
+		goto out;
+	}
+
 	disksize = memparse(buf, NULL);
-	if (!disksize)
-		return -EINVAL;
+	if (!disksize) {
+		err = -EINVAL;
+		goto out;
+	}
 
 	down_write(&zram->init_lock);
 	if (init_done(zram)) {
@@ -1754,12 +1766,16 @@ static ssize_t disksize_store(struct device *dev,
 	set_capacity_and_notify(zram->disk, zram->disksize >> SECTOR_SHIFT);
 	up_write(&zram->init_lock);
 
+	mutex_unlock(&zram_index_mutex);
+
 	return len;
 
 out_free_meta:
 	zram_meta_free(zram, disksize);
 out_unlock:
 	up_write(&zram->init_lock);
+out:
+	mutex_unlock(&zram_index_mutex);
 	return err;
 }
 
@@ -1775,8 +1791,17 @@ static ssize_t reset_store(struct device *dev,
 	if (ret)
 		return ret;
 
-	if (!do_reset)
-		return -EINVAL;
+	mutex_lock(&zram_index_mutex);
+
+	if (!zram_up) {
+		len = -ENODEV;
+		goto out;
+	}
+
+	if (!do_reset) {
+		len = -EINVAL;
+		goto out;
+	}
 
 	zram = dev_to_zram(dev);
 	bdev = zram->disk->part0;
@@ -1785,7 +1810,8 @@ static ssize_t reset_store(struct device *dev,
 	/* Do not reset an active device or claimed device */
 	if (bdev->bd_openers || zram->claim) {
 		mutex_unlock(&bdev->bd_disk->open_mutex);
-		return -EBUSY;
+		len = -EBUSY;
+		goto out;
 	}
 
 	/* From now on, anyone can't open /dev/zram[0-9] */
@@ -1800,6 +1826,8 @@ static ssize_t reset_store(struct device *dev,
 	zram->claim = false;
 	mutex_unlock(&bdev->bd_disk->open_mutex);
 
+out:
+	mutex_unlock(&zram_index_mutex);
 	return len;
 }
 
@@ -2010,6 +2038,10 @@ static ssize_t hot_add_show(struct class *class,
 	int ret;
 
 	mutex_lock(&zram_index_mutex);
+	if (!zram_up) {
+		mutex_unlock(&zram_index_mutex);
+		return -ENODEV;
+	}
 	ret = zram_add();
 	mutex_unlock(&zram_index_mutex);
 
@@ -2037,6 +2069,11 @@ static ssize_t hot_remove_store(struct class *class,
 
 	mutex_lock(&zram_index_mutex);
 
+	if (!zram_up) {
+		ret = -ENODEV;
+		goto out;
+	}
+
 	zram = idr_find(&zram_index_idr, dev_id);
 	if (zram) {
 		ret = zram_remove(zram);
@@ -2046,6 +2083,7 @@ static ssize_t hot_remove_store(struct class *class,
 		ret = -ENODEV;
 	}
 
+out:
 	mutex_unlock(&zram_index_mutex);
 	return ret ? ret : count;
 }
@@ -2072,12 +2110,15 @@ static int zram_remove_cb(int id, void *ptr, void *data)
 
 static void destroy_devices(void)
 {
+	mutex_lock(&zram_index_mutex);
+	zram_up = false;
 	class_unregister(&zram_control_class);
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	zram_debugfs_destroy();
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
 	cpuhp_remove_multi_state(CPUHP_ZCOMP_PREPARE);
+	mutex_unlock(&zram_index_mutex);
 }
 
 static int __init zram_init(void)
@@ -2105,15 +2146,21 @@ static int __init zram_init(void)
 		return -EBUSY;
 	}
 
+	mutex_lock(&zram_index_mutex);
+
 	while (num_devices != 0) {
-		mutex_lock(&zram_index_mutex);
 		ret = zram_add();
-		mutex_unlock(&zram_index_mutex);
-		if (ret < 0)
+		if (ret < 0) {
+			mutex_unlock(&zram_index_mutex);
 			goto out_error;
+		}
 		num_devices--;
 	}
 
+	zram_up = true;
+
+	mutex_unlock(&zram_index_mutex);
+
 	return 0;
 
 out_error:
-- 
2.27.0



* [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-03  0:19 [PATCH v6 0/3] zram: fix few sysfs races Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 1/3] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
@ 2021-07-03  0:19 ` Luis Chamberlain
  2021-07-10 19:28   ` Andrew Morton
  2021-07-21 11:29   ` Greg KH
  2021-07-03  0:19 ` [PATCH v6 3/3] zram: use ATTRIBUTE_GROUPS Luis Chamberlain
  2 siblings, 2 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:19 UTC
  To: akpm, minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael
  Cc: mcgrof, axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos,
	rostedt, peterz, linux-block, linux-kernel

When sysfs attributes use a lock that is also taken on module removal,
we can deadlock. This happens when, for instance, a driver's sysfs file
is being used while module removal is triggered at the same time. The
module removal code holds a lock, and the sysfs file handler waits for
that same lock. While holding the lock, module removal tries to remove
the sysfs entries, but these cannot be removed yet because one of them
is in use, waiting for the lock. The sysfs operation cannot complete
because the lock is already held, and module removal cannot complete
either, so we deadlock.

To fix this we just *try* to get a reference on the module before
mucking with a sysfs attribute that uses a shared lock. If this fails
we give up right away.

We use a try method because a full lock would let our sysfs attributes
keep the module busy and block its removal, so userspace could deny
module removal: a silly form of "DOS" against module removal. A try
lock ensures we give priority to module removal, and interacting with
sysfs attributes only comes second. With a full lock, for instance, you
could not remove the module as long as something kept poking at its
sysfs files.
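The "try" semantics can be sketched as a small single-threaded model.
enum mod_state, struct mod_model and the mod_*() helpers are
hypothetical stand-ins; the real try_module_get() uses atomic counters,
and the real unload path also checks the refcount, which this sketch
leaves out.

```c
#include <assert.h>
#include <stdbool.h>

enum mod_state { MOD_LIVE, MOD_GOING };

struct mod_model {
	enum mod_state state;
	int refcnt;
};

/* Analogous to module_is_live(): live unless removal has started. */
static bool mod_is_live(const struct mod_model *m)
{
	return m->state != MOD_GOING;
}

/* Take a reference only while the module is live; never wait. */
static bool mod_try_get(struct mod_model *m)
{
	if (!mod_is_live(m))
		return false;	/* give up right away: removal has priority */
	m->refcnt++;
	return true;
}

static void mod_put(struct mod_model *m)
{
	m->refcnt--;
}

/* Removal marks the module as going; new gets fail from now on. */
static void mod_begin_unload(struct mod_model *m)
{
	m->state = MOD_GOING;
}
```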

This deadlock was first reported with the zram driver; a sketch of how
it can happen follows:

CPU A                              CPU B
                                   whatever_store()
module_unload
  mutex_lock(foo)
                                   mutex_lock(foo)
   del_gendisk(zram->disk);
     device_del()
       device_remove_groups()

In this situation whatever_store() is waiting for the mutex foo to
become unlocked, but that won't happen until module removal is complete.
But module removal won't complete until the sysfs operation on the file
being poked completes, and that operation is waiting for a lock which
is already held.

This is a generic kernel issue with sysfs files which use any lock also
used on module removal. Different generic solutions have been proposed.
One proposed approach is to augment attributes directly with module
information [0]. This patch implements a solution by adding macros with
the prefix MODULE_DEVICE_ATTR_*() which accomplish the same. Until we
have a generic, agreed-upon solution shared between drivers, we must
implement a fix for this in each driver.

We make zram use the new MODULE_DEVICE_ATTR_*() helpers, and completely
open code the solution for class attributes as there are only a few of
those.
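The MODULE_DEVICE_ATTR_*() helpers rely on token pasting to generate a
module_<name>_store() wrapper around an existing <name>_store(). The
sketch below reproduces only that code generation in userspace;
fake_try_get(), fake_put(), module_live, WRAP_STORE() and the
simplified store signature are hypothetical stand-ins for
try_module_get(THIS_MODULE), module_put() and the real struct device
callbacks. Whether taking the module reference this way is actually
safe is what the review replies in this thread dispute.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

static bool module_live = true;	/* hypothetical liveness flag */
static int module_refs;

static bool fake_try_get(void)
{
	if (!module_live)
		return false;
	module_refs++;
	return true;
}

static void fake_put(void)
{
	module_refs--;
}

/* Generate module_<name>_store() around an existing <name>_store(). */
#define WRAP_STORE(_name)						\
static ssize_t module_##_name##_store(const char *buf, size_t len)	\
{									\
	ssize_t __ret;							\
	if (!fake_try_get())						\
		return -ENODEV;						\
	__ret = _name##_store(buf, len);				\
	fake_put();							\
	return __ret;							\
}

static int reset_calls;

/* Hypothetical simplified handler; returns the consumed length. */
static ssize_t reset_store(const char *buf, size_t len)
{
	(void)buf;
	reset_calls++;
	return (ssize_t)len;
}

WRAP_STORE(reset)
```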

This issue can be reproduced easily on the zram driver as follows:

Loop 1 on one terminal:

while true; do
	modprobe zram;
	modprobe -r zram;
done

Loop 2 on a second terminal:

while true; do
	echo 1024 > /sys/block/zram0/disksize;
	echo 1 > /sys/block/zram0/reset;
done

Without this patch we end up in a deadlock, and the following
stack trace is produced, which hints at what the issue is:

INFO: task bash:888 blocked for more than 120 seconds.
      Tainted: G            E 5.12.0-rc1-next-20210304+ #4
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
task:bash            state:D stack:    0 pid:  888 ppid: 887 flags:<etc>
Call Trace:
 __schedule+0x2e4/0x900
 schedule+0x46/0xb0
 schedule_preempt_disabled+0xa/0x10
 __mutex_lock.constprop.0+0x2c3/0x490
 ? _kstrtoull+0x35/0xd0
 reset_store+0x6c/0x160 [zram]
 kernfs_fop_write_iter+0x124/0x1b0
 new_sync_write+0x11c/0x1b0
 vfs_write+0x1c2/0x260
 ksys_write+0x5f/0xe0
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f34f2c3df33
RSP: 002b:00007ffe751df6e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000002 RCX: 00007f34f2c3df33
RDX: 0000000000000002 RSI: 0000561ccb06ec10 RDI: 0000000000000001
RBP: 0000561ccb06ec10 R08: 000000000000000a R09: 0000000000000001
R10: 0000561ccb157590 R11: 0000000000000246 R12: 0000000000000002
R13: 00007f34f2d0e6a0 R14: 0000000000000002 R15: 00007f34f2d0e8a0
INFO: task modprobe:1104 can't die for more than 120 seconds.
task:modprobe        state:D stack:    0 pid: 1104 ppid: 916 flags:<etc>
Call Trace:
 __schedule+0x2e4/0x900
 schedule+0x46/0xb0
 __kernfs_remove.part.0+0x228/0x2b0
 ? finish_wait+0x80/0x80
 kernfs_remove_by_name_ns+0x50/0x90
 remove_files+0x2b/0x60
 sysfs_remove_group+0x38/0x80
 sysfs_remove_groups+0x29/0x40
 device_remove_attrs+0x4a/0x80
 device_del+0x183/0x3e0
 ? mutex_lock+0xe/0x30
 del_gendisk+0x27a/0x2d0
 zram_remove+0x8a/0xb0 [zram]
 ? hot_remove_store+0xf0/0xf0 [zram]
 zram_remove_cb+0xd/0x10 [zram]
 idr_for_each+0x5e/0xd0
 destroy_devices+0x39/0x6f [zram]
 __do_sys_delete_module+0x190/0x2a0
 do_syscall_64+0x33/0x80
 entry_SYSCALL_64_after_hwframe+0x44/0xae
RIP: 0033:0x7f32adf727d7
RSP: 002b:00007ffc08bb38a8 EFLAGS: 00000206 ORIG_RAX: 00000000000000b0
RAX: ffffffffffffffda RBX: 000055eea23cbb10 RCX: 00007f32adf727d7
RDX: 0000000000000000 RSI: 0000000000000800 RDI: 000055eea23cbb78
RBP: 000055eea23cbb10 R08: 0000000000000000 R09: 0000000000000000
R10: 00007f32adfe5ac0 R11: 0000000000000206 R12: 000055eea23cbb78
R13: 0000000000000000 R14: 0000000000000000 R15: 000055eea23cbc20

[0] https://lkml.kernel.org/r/20210401235925.GR4332@42.do-not-panic.com

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/zram/zram_drv.c | 47 ++++++++++++++++++------------
 drivers/block/zram/zram_drv.h | 54 +++++++++++++++++++++++++++++++++++
 2 files changed, 83 insertions(+), 18 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 43d4c9971330..205cf9287d0c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1134,12 +1134,12 @@ static ssize_t debug_stat_show(struct device *dev,
 	return ret;
 }
 
-static DEVICE_ATTR_RO(io_stat);
-static DEVICE_ATTR_RO(mm_stat);
+MODULE_DEVICE_ATTR_RO(io_stat);
+MODULE_DEVICE_ATTR_RO(mm_stat);
 #ifdef CONFIG_ZRAM_WRITEBACK
-static DEVICE_ATTR_RO(bd_stat);
+MODULE_DEVICE_ATTR_RO(bd_stat);
 #endif
-static DEVICE_ATTR_RO(debug_stat);
+MODULE_DEVICE_ATTR_RO(debug_stat);
 
 static void zram_meta_free(struct zram *zram, u64 disksize)
 {
@@ -1861,20 +1861,20 @@ static const struct block_device_operations zram_wb_devops = {
 	.owner = THIS_MODULE
 };
 
-static DEVICE_ATTR_WO(compact);
-static DEVICE_ATTR_RW(disksize);
-static DEVICE_ATTR_RO(initstate);
-static DEVICE_ATTR_WO(reset);
-static DEVICE_ATTR_WO(mem_limit);
-static DEVICE_ATTR_WO(mem_used_max);
-static DEVICE_ATTR_WO(idle);
-static DEVICE_ATTR_RW(max_comp_streams);
-static DEVICE_ATTR_RW(comp_algorithm);
+MODULE_DEVICE_ATTR_WO(compact);
+MODULE_DEVICE_ATTR_RW(disksize);
+MODULE_DEVICE_ATTR_RO(initstate);
+MODULE_DEVICE_ATTR_WO(reset);
+MODULE_DEVICE_ATTR_WO(mem_limit);
+MODULE_DEVICE_ATTR_WO(mem_used_max);
+MODULE_DEVICE_ATTR_WO(idle);
+MODULE_DEVICE_ATTR_RW(max_comp_streams);
+MODULE_DEVICE_ATTR_RW(comp_algorithm);
 #ifdef CONFIG_ZRAM_WRITEBACK
-static DEVICE_ATTR_RW(backing_dev);
-static DEVICE_ATTR_WO(writeback);
-static DEVICE_ATTR_RW(writeback_limit);
-static DEVICE_ATTR_RW(writeback_limit_enable);
+MODULE_DEVICE_ATTR_RW(backing_dev);
+MODULE_DEVICE_ATTR_WO(writeback);
+MODULE_DEVICE_ATTR_RW(writeback_limit);
+MODULE_DEVICE_ATTR_RW(writeback_limit_enable);
 #endif
 
 static struct attribute *zram_disk_attrs[] = {
@@ -2037,13 +2037,19 @@ static ssize_t hot_add_show(struct class *class,
 {
 	int ret;
 
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
 	mutex_lock(&zram_index_mutex);
 	if (!zram_up) {
 		mutex_unlock(&zram_index_mutex);
-		return -ENODEV;
+		ret = -ENODEV;
+		goto out;
 	}
 	ret = zram_add();
 	mutex_unlock(&zram_index_mutex);
+out:
+	module_put(THIS_MODULE);
 
 	if (ret < 0)
 		return ret;
@@ -2052,6 +2058,7 @@ static ssize_t hot_add_show(struct class *class,
 static struct class_attribute class_attr_hot_add =
 	__ATTR(hot_add, 0400, hot_add_show, NULL);
 
+#define module_hot_remove_store hot_remove_store
 static ssize_t hot_remove_store(struct class *class,
 			struct class_attribute *attr,
 			const char *buf,
@@ -2067,6 +2074,9 @@ static ssize_t hot_remove_store(struct class *class,
 	if (dev_id < 0)
 		return -EINVAL;
 
+	if (!try_module_get(THIS_MODULE))
+		return -ENODEV;
+
 	mutex_lock(&zram_index_mutex);
 
 	if (!zram_up) {
@@ -2085,6 +2095,7 @@ static ssize_t hot_remove_store(struct class *class,
 
 out:
 	mutex_unlock(&zram_index_mutex);
+	module_put(THIS_MODULE);
 	return ret ? ret : count;
 }
 static CLASS_ATTR_WO(hot_remove);
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 80c3b43b4828..90f6777d7d0a 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -126,4 +126,58 @@ struct zram {
 	struct dentry *debugfs_dir;
 #endif
 };
+
+#undef __ATTR_RO
+#undef __ATTR_RW
+#undef __ATTR_WO
+
+#define __ATTR_RO(_name) {						\
+	.attr	= { .name = __stringify(_name), .mode = 0444 },		\
+	.show	= module_##_name##_show,						\
+}
+#define __ATTR_RW(_name) __ATTR(_name, 0644, module_##_name##_show, module_##_name##_store)
+#define __ATTR_WO(_name) {						\
+	.attr	= { .name = __stringify(_name), .mode = 0200 },		\
+	.store	= module_##_name##_store,				\
+}
+
+#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
+static ssize_t module_ ## _name ## _store(struct device *dev, \
+				   struct device_attribute *attr, \
+				   const char *buf, size_t len) \
+{ \
+	ssize_t __ret; \
+	if (!try_module_get(THIS_MODULE)) \
+		return -ENODEV; \
+	__ret = _name ## _store(dev, attr, buf, len); \
+	module_put(THIS_MODULE); \
+	return __ret; \
+}
+
+#define MODULE_DEVICE_ATTR_FUNC_SHOW(_name) \
+static ssize_t module_ ## _name ## _show(struct device *dev, \
+					 struct device_attribute *attr, \
+					 char *buf) \
+{ \
+	ssize_t __ret; \
+	if (!try_module_get(THIS_MODULE)) \
+		return -ENODEV; \
+	__ret = _name ## _show(dev, attr, buf); \
+	module_put(THIS_MODULE); \
+	return __ret; \
+}
+
+#define MODULE_DEVICE_ATTR_WO(_name) \
+MODULE_DEVICE_ATTR_FUNC_STORE(_name); \
+static DEVICE_ATTR_WO(_name)
+
+#define MODULE_DEVICE_ATTR_RW(_name) \
+MODULE_DEVICE_ATTR_FUNC_STORE(_name); \
+MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \
+static DEVICE_ATTR_RW(_name)
+
+#define MODULE_DEVICE_ATTR_RO(_name) \
+MODULE_DEVICE_ATTR_FUNC_SHOW(_name); \
+static DEVICE_ATTR_RO(_name)
+
 #endif
-- 
2.27.0



* [PATCH v6 3/3] zram: use ATTRIBUTE_GROUPS
  2021-07-03  0:19 [PATCH v6 0/3] zram: fix few sysfs races Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 1/3] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
  2021-07-03  0:19 ` [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal Luis Chamberlain
@ 2021-07-03  0:19 ` Luis Chamberlain
  2 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-03  0:19 UTC
  To: akpm, minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael
  Cc: mcgrof, axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos,
	rostedt, peterz, linux-block, linux-kernel

Remove boilerplate code and use ATTRIBUTE_GROUPS() to
simplify the code further. This produces no functional changes
other than shortening the name of the groups variable
(zram_disk_attr_groups becomes zram_disk_groups).
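For reference, ATTRIBUTE_GROUPS(zram_disk) emits the same two objects
the patch deletes, named zram_disk_group and zram_disk_groups. The
sketch below re-creates that expansion outside the kernel under a
hypothetical MY_ATTRIBUTE_GROUPS() name, with minimal stand-in
definitions of struct attribute and struct attribute_group; it is an
illustration, not the macro from include/linux/sysfs.h itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for the sysfs types. */
struct attribute { const char *name; };
struct attribute_group { struct attribute **attrs; };

/* One group wrapping <name>_attrs, plus a NULL-terminated group list. */
#define MY_ATTRIBUTE_GROUPS(_name)					\
static const struct attribute_group _name##_group = {			\
	.attrs = _name##_attrs,						\
};									\
static const struct attribute_group *_name##_groups[] = {		\
	&_name##_group,							\
	NULL,								\
}

static struct attribute disksize_attr = { .name = "disksize" };
static struct attribute *zram_disk_attrs[] = { &disksize_attr, NULL };

MY_ATTRIBUTE_GROUPS(zram_disk);
```

The resulting zram_disk_groups array is what device_add_disk() now
receives in place of zram_disk_attr_groups.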

Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
---
 drivers/block/zram/zram_drv.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 205cf9287d0c..56be6817c5b2 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1902,14 +1902,7 @@ static struct attribute *zram_disk_attrs[] = {
 	NULL,
 };
 
-static const struct attribute_group zram_disk_attr_group = {
-	.attrs = zram_disk_attrs,
-};
-
-static const struct attribute_group *zram_disk_attr_groups[] = {
-	&zram_disk_attr_group,
-	NULL,
-};
+ATTRIBUTE_GROUPS(zram_disk);
 
 /*
  * Allocate and initialize new zram device. the function returns
@@ -1981,7 +1974,7 @@ static int zram_add(void)
 		blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
 	blk_queue_flag_set(QUEUE_FLAG_STABLE_WRITES, zram->disk->queue);
-	device_add_disk(NULL, zram->disk, zram_disk_attr_groups);
+	device_add_disk(NULL, zram->disk, zram_disk_groups);
 
 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 
-- 
2.27.0



* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-03  0:19 ` [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal Luis Chamberlain
@ 2021-07-10 19:28   ` Andrew Morton
  2021-07-11  5:00     ` Greg KH
  2021-07-12 23:17     ` Luis Chamberlain
  2021-07-21 11:29   ` Greg KH
  1 sibling, 2 replies; 12+ messages in thread
From: Andrew Morton @ 2021-07-10 19:28 UTC
  To: Luis Chamberlain
  Cc: minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Fri,  2 Jul 2021 17:19:57 -0700 Luis Chamberlain <mcgrof@kernel.org> wrote:

> +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> +static ssize_t module_ ## _name ## _store(struct device *dev, \
> +				   struct device_attribute *attr, \
> +				   const char *buf, size_t len) \
> +{ \
> +	ssize_t __ret; \
> +	if (!try_module_get(THIS_MODULE)) \
> +		return -ENODEV; \
> +	__ret = _name ## _store(dev, attr, buf, len); \
> +	module_put(THIS_MODULE); \
> +	return __ret; \
> +}

I assume that Greg's comments on try_module_get() are applicable here
also.



* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-10 19:28   ` Andrew Morton
@ 2021-07-11  5:00     ` Greg KH
  2021-07-12 23:17     ` Luis Chamberlain
  1 sibling, 0 replies; 12+ messages in thread
From: Greg KH @ 2021-07-11  5:00 UTC
  To: Andrew Morton
  Cc: Luis Chamberlain, minchan, jeyu, ngupta, sergey.senozhatsky.work,
	rafael, axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos,
	rostedt, peterz, linux-block, linux-kernel

On Sat, Jul 10, 2021 at 12:28:51PM -0700, Andrew Morton wrote:
> On Fri,  2 Jul 2021 17:19:57 -0700 Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > +				   struct device_attribute *attr, \
> > +				   const char *buf, size_t len) \
> > +{ \
> > +	ssize_t __ret; \
> > +	if (!try_module_get(THIS_MODULE)) \
> > +		return -ENODEV; \
> > +	__ret = _name ## _store(dev, attr, buf, len); \
> > +	module_put(THIS_MODULE); \
> > +	return __ret; \
> > +}
> 
> I assume that Greg's comments on try_module_get() are applicable here
> also.

Yes, this is still broken code and does not do what it says it does,
please do not merge it.

Again, almost anything that does try_module_get(THIS_MODULE) is broken,
this code included.  I'll write more in a week or so when I get a chance
to get to this series in my reviews...

thanks,

greg k-h


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-10 19:28   ` Andrew Morton
  2021-07-11  5:00     ` Greg KH
@ 2021-07-12 23:17     ` Luis Chamberlain
  1 sibling, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-12 23:17 UTC
  To: Andrew Morton
  Cc: minchan, gregkh, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Sat, Jul 10, 2021 at 12:28:51PM -0700, Andrew Morton wrote:
> On Fri,  2 Jul 2021 17:19:57 -0700 Luis Chamberlain <mcgrof@kernel.org> wrote:
> 
> > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > +				   struct device_attribute *attr, \
> > +				   const char *buf, size_t len) \
> > +{ \
> > +	ssize_t __ret; \
> > +	if (!try_module_get(THIS_MODULE)) \
> > +		return -ENODEV; \
> > +	__ret = _name ## _store(dev, attr, buf, len); \
> > +	module_put(THIS_MODULE); \
> > +	return __ret; \
> > +}
> 
> I assume that Greg's comments on try_module_get() are applicable here
> also.

While we wait for Greg for an alternative, patch #1 is still fine.

  Luis


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-03  0:19 ` [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal Luis Chamberlain
  2021-07-10 19:28   ` Andrew Morton
@ 2021-07-21 11:29   ` Greg KH
  2021-07-22 22:17     ` Luis Chamberlain
  1 sibling, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-07-21 11:29 UTC
  To: Luis Chamberlain
  Cc: akpm, minchan, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Fri, Jul 02, 2021 at 05:19:57PM -0700, Luis Chamberlain wrote:
> +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> +static ssize_t module_ ## _name ## _store(struct device *dev, \
> +				   struct device_attribute *attr, \
> +				   const char *buf, size_t len) \
> +{ \
> +	ssize_t __ret; \
> +	if (!try_module_get(THIS_MODULE)) \
> +		return -ENODEV; \

I feel like this needs to be written down somewhere as I see it come up
all the time.

Again, this is racy and broken code.  You can NEVER try to increment
your own module reference count unless it has already been incremented
by someone external first.

As "proof", what happens if this module is unloaded right _before_ this
call happens?  The module will be unloaded, memory zeroed out (or
overridden), and then the processor will resume here and try to call (or
return into) this code path.

Boom.

Just say no to "try_module_get(THIS_MODULE)" as it is totally wrong.

thanks,

greg k-h


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-21 11:29   ` Greg KH
@ 2021-07-22 22:17     ` Luis Chamberlain
  2021-07-23 11:15       ` Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-22 22:17 UTC
  To: Greg KH
  Cc: akpm, minchan, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Wed, Jul 21, 2021 at 01:29:29PM +0200, Greg KH wrote:
> On Fri, Jul 02, 2021 at 05:19:57PM -0700, Luis Chamberlain wrote:
> > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > +				   struct device_attribute *attr, \
> > +				   const char *buf, size_t len) \
> > +{ \
> > +	ssize_t __ret; \
> > +	if (!try_module_get(THIS_MODULE)) \
> > +		return -ENODEV; \
> 
> I feel like this needs to be written down somewhere as I see it come up
> all the time.

I'll go ahead and cook up a patch to do just this after I send this
email out.

> Again, this is racy and broken code.  You can NEVER try to increment
> your own module reference count unless it has already been incremented
> by someone external first.

In the zram driver's case the sysfs files are still pinned, because, as
we noted before, the kernfs active reference ensures the store
operation still exists. If the driver removes the operation before the
active reference is taken, the write just fails. kernfs ensures that
once a file is opened, the op is not removed until the operation
completes.

So if a file is open, the module cannot possibly be removed. The
piece of information we really care about is the use of module_is_live()
inside try_module_get(), which does:

static inline bool module_is_live(struct module *mod)
{
	return mod->state != MODULE_STATE_GOING;
}

The try allows module removal to trump use of the sysfs file. If
userspace wants the module removed, the sysfs operation gives up in
favor of that operation.

> As "proof", what happens if this module is unloaded right _before_ this
> call happens?  The module will be unloaded, memory zeroed out (or
> overridden), and then the processor will resume here and try to call (or
> return into) this code path.

The kernfs active reference is what makes the use of try_module_get()
correct here: it ensures the module is not gone. That is, when a sysfs
file read / write op is issued, if the file was opened we *know* the
module is not gone yet. It cannot possibly be removed. But once inside
the operation, try_module_get() checks whether userspace wants to
remove the module, and if so the operation immediately bails out and
yields to the removal.

Userspace cannot open a sysfs file once the module is gone.
The kernfs active reference prevents that.
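The active-reference behaviour can be pictured with a single-threaded userspace model. The names are hypothetical; the negative-bias trick is loosely modeled on how kernfs marks a node as deactivated (see kernfs_get_active() / kernfs_put_active() in fs/kernfs/), but this is a sketch under that assumption, not the kernfs code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical marker added to the count once the file is being removed. */
#define MODEL_DEACTIVATED_BIAS INT_MIN

struct model_kernfs_node {
	int active;	/* >= 0: number of in-flight read/write ops */
};

/* An op may start only while the node has not been deactivated. */
static bool model_get_active(struct model_kernfs_node *kn)
{
	if (kn->active < 0)
		return false;	/* file is going away: the op fails */
	kn->active++;
	return true;
}

/* Called when the in-flight op completes. */
static void model_put_active(struct model_kernfs_node *kn)
{
	assert(kn->active != 0 && kn->active != MODEL_DEACTIVATED_BIAS);
	kn->active--;
}

/* Removal, step 1: refuse any new op from now on. */
static void model_deactivate(struct model_kernfs_node *kn)
{
	assert(kn->active >= 0);
	kn->active += MODEL_DEACTIVATED_BIAS;
}

/*
 * Removal, step 2: the remover waits until this returns true; only then
 * may the file, and whatever code backs it, go away.
 */
static bool model_drained(const struct model_kernfs_node *kn)
{
	return kn->active == MODEL_DEACTIVATED_BIAS;
}
```

An op that got in before model_deactivate() always finishes before model_drained() becomes true, and one that arrives afterwards simply fails.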

> Boom.

I think it would be good if we add a selftest for this particular case.
I'll go ahead and extend my sysfs tests with one test for this case.

I could do this by adding a new sysfs file to the test driver whose
only job is this try_module_get() call. This can then be raced with
module removal attempts.

But it does not mean all you say is wrong.

I think what you are saying is valuable and needs documenting, as it
was not clear to me either. I'll send a patch now.

> Just say no to "try_module_get(THIS_MODULE)" as it is totally wrong.

The context matters, and it needs to be documented. I'll send a patch.

  Luis


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-22 22:17     ` Luis Chamberlain
@ 2021-07-23 11:15       ` Greg KH
  2021-07-23 17:49         ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Greg KH @ 2021-07-23 11:15 UTC (permalink / raw)
  To: Luis Chamberlain
  Cc: akpm, minchan, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Thu, Jul 22, 2021 at 03:17:05PM -0700, Luis Chamberlain wrote:
> On Wed, Jul 21, 2021 at 01:29:29PM +0200, Greg KH wrote:
> > On Fri, Jul 02, 2021 at 05:19:57PM -0700, Luis Chamberlain wrote:
> > > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > > +				   struct device_attribute *attr, \
> > > +				   const char *buf, size_t len) \
> > > +{ \
> > > +	ssize_t __ret; \
> > > +	if (!try_module_get(THIS_MODULE)) \
> > > +		return -ENODEV; \
> > 
> > I feel like this needs to be written down somewhere as I see it come up
> > all the time.
> 
> I'll go ahead and cook up a patch to do just this after I send this
> email out.
> 
> > Again, this is racy and broken code.  You can NEVER try to increment
> > your own module reference count unless it has already been incremented
> > by someone external first.
> 
> In the zram driver's case the sysfs files are still pegged on, because
> as we noted before the kernfs active reference will ensure the store
> operation still exists.

How does that happen without a module lock?

> If the driver removes the operation prior to
> getting the active reference, the write will just fail. kernfs ensures
> once a file is opened the op is not removed until the operation completes.

How does it do that?

> If a file is opened then, the module cannot possibly be removed. The
> piece of information we really care about is the use of module_is_live()
> inside try_module_get() which does:
> 
> static inline bool module_is_live(struct module *mod)
> {
> 	return mod->state != MODULE_STATE_GOING;
> }
> 
> The try allows module removal to trump use of the sysfs file. If
> userspace wants the module removed, the sysfs operation gives up in
> favor of that removal.

I do not see the tie in kernfs to module reference counts, what am I
missing?

thanks,

greg k-h


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-23 11:15       ` Greg KH
@ 2021-07-23 17:49         ` Luis Chamberlain
  2021-07-27 17:35           ` Luis Chamberlain
  0 siblings, 1 reply; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-23 17:49 UTC (permalink / raw)
  To: Greg KH
  Cc: akpm, minchan, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Fri, Jul 23, 2021 at 01:15:49PM +0200, Greg KH wrote:
> On Thu, Jul 22, 2021 at 03:17:05PM -0700, Luis Chamberlain wrote:
> > On Wed, Jul 21, 2021 at 01:29:29PM +0200, Greg KH wrote:
> > > On Fri, Jul 02, 2021 at 05:19:57PM -0700, Luis Chamberlain wrote:
> > > > +#define MODULE_DEVICE_ATTR_FUNC_STORE(_name) \
> > > > +static ssize_t module_ ## _name ## _store(struct device *dev, \
> > > > +				   struct device_attribute *attr, \
> > > > +				   const char *buf, size_t len) \
> > > > +{ \
> > > > +	ssize_t __ret; \
> > > > +	if (!try_module_get(THIS_MODULE)) \
> > > > +		return -ENODEV; \
> > > 
> > > I feel like this needs to be written down somewhere as I see it come up
> > > all the time.
> > 
> > I'll go ahead and cook up a patch to do just this after I send this
> > email out.
> > 
> > > Again, this is racy and broken code.  You can NEVER try to increment
> > > your own module reference count unless it has already been incremented
> > > by someone external first.
> > 
> > In the zram driver's case the sysfs files are still pegged on, because
> > as we noted before the kernfs active reference will ensure the store
> > operation still exists.
> 
> How does that happen without a module lock?

If a read / write operation is happening on a sysfs file created by a
module, the module cannot be removed because it is the module's own
responsibility to remove the sysfs file on module exit. There is no
module lock. It is inferred.

> > If the driver removes the operation prior to
> > getting the active reference, the write will just fail. kernfs ensures
> > once a file is opened the op is not removed until the operation completes.
> 
> How does it do that?

Using an active reference.

> > If a file is opened then, the module cannot possibly be removed. The
> > piece of information we really care about is the use of module_is_live()
> > inside try_module_get() which does:
> > 
> > static inline bool module_is_live(struct module *mod)
> > {
> > 	return mod->state != MODULE_STATE_GOING;
> > }
> > 
> > The try allows module removal to trump use of the sysfs file. If
> > userspace wants the module removed, the sysfs operation gives up in
> > favor of that removal.
> 
> I do not see the tie in kernfs to module reference counts, what am I
> missing?

Let me try to describe this again. Let's take it step by step, premise
by premise on the inference assumption. Let me know at which point you
disagree.

We are talking about sysfs files and your argument is that
try_module_get() should lock the module, and so cannot be used
in sysfs files. My point is that such a module lock is inferred:

1) Sysfs files are created by a module, that same module is responsible
   for removing the same sysfs files.
2) The module can only be removed and gone once *all* sysfs files are
   removed first.
3) If any of the module's sysfs files are present the module must
   still be present
4) kernfs ensures that if a file is opened the file will not be
   removed until any pending operation completes
5) If a sysfs file is used to write something, that means the
   sysfs file has not yet been removed, and we know it will
   remain in existence throughout its entire operation
6) When a sysfs file operation is being run, the module must
   always exist
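The chain of premises can be checked on a toy model that enumerates every interleaving of one sysfs writer and one module unload. This is a userspace sketch, not kernel code: the step breakdown (unload sets GOING, exit removes the sysfs file, waits for in-flight ops, then frees the module) and all names are hypothetical simplifications. The `drain` flag stands for premise 4; clearing it models kernfs not waiting for pending operations.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * Toy userspace model, not kernel code. One writer (a sysfs store) races
 * one remover (module unload). All names are hypothetical.
 */
#define BIAS INT_MIN	/* added to 'active' once the sysfs file is removed */
#define W_DONE 5
#define R_DONE 4

struct world {
	bool going;	/* module state is GOING */
	bool freed;	/* module code and data released */
	int active;	/* kernfs-style active count for the sysfs file */
	int wpc;	/* writer step, W_DONE when finished */
	int rpc;	/* remover step, R_DONE when finished */
};

/* Run one writer step; returns true if it executed module code after free. */
static bool writer_step(struct world *w)
{
	bool bad = false;

	switch (w->wpc) {
	case 0:	/* kernfs: take the active reference, or the write fails */
		if (w->active < 0) {
			w->wpc = W_DONE;
			break;
		}
		w->active++;
		w->wpc = 1;
		break;
	case 1:	/* module code: try_module_get(THIS_MODULE) */
		bad = w->freed;
		w->wpc = w->going ? 4 : 2;	/* -ENODEV path skips the handler */
		break;
	case 2:	/* module code: the attribute handler itself */
	case 3:	/* module code: module_put(THIS_MODULE) */
		bad = w->freed;
		w->wpc++;
		break;
	case 4:	/* kernfs: drop the active reference */
		w->active--;
		w->wpc = W_DONE;
		break;
	default:
		assert(!"writer already finished");
	}
	return bad;
}

/* Step 2 blocks until in-flight ops drain, but only when 'drain' is set. */
static bool remover_can_step(const struct world *w, bool drain)
{
	return !(w->rpc == 2 && drain && w->active != BIAS);
}

static void remover_step(struct world *w)
{
	switch (w->rpc) {
	case 0: w->going = true; break;		/* unload requested */
	case 1: w->active += BIAS; break;	/* exit removes the sysfs file */
	case 2: break;				/* wait for pending ops */
	case 3: w->freed = true; break;		/* module memory released */
	}
	w->rpc++;
}

/* Count complete interleavings in which module code ran after free. */
static int count_violations(struct world w, bool drain, bool seen_bad)
{
	int n = 0;

	if (w.wpc == W_DONE && w.rpc == R_DONE)
		return seen_bad ? 1 : 0;
	if (w.wpc != W_DONE) {
		struct world next = w;
		bool bad = writer_step(&next);
		n += count_violations(next, drain, seen_bad || bad);
	}
	if (w.rpc != R_DONE && remover_can_step(&w, drain)) {
		struct world next = w;
		remover_step(&next);
		n += count_violations(next, drain, seen_bad);
	}
	return n;
}
```

Starting from `struct world start = { false, false, 0, 0, 0 };`, count_violations(start, true, false) finds no bad interleaving, while with `drain` cleared it finds the schedule Greg describes (the unload completes between taking the active reference and calling try_module_get()). The model only shows what follows from the premises; it says nothing about whether the real unload path matches them.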

  Luis


* Re: [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal
  2021-07-23 17:49         ` Luis Chamberlain
@ 2021-07-27 17:35           ` Luis Chamberlain
  0 siblings, 0 replies; 12+ messages in thread
From: Luis Chamberlain @ 2021-07-27 17:35 UTC (permalink / raw)
  To: Greg KH, David Laight
  Cc: akpm, minchan, jeyu, ngupta, sergey.senozhatsky.work, rafael,
	axboe, tj, mbenes, jpoimboe, tglx, keescook, jikos, rostedt,
	peterz, linux-block, linux-kernel

On Fri, Jul 23, 2021 at 10:49:19AM -0700, Luis Chamberlain wrote:
> We are talking about sysfs files and your argument is that
> try_module_get() should lock the module, and so cannot be used
> in sysfs files. My point is that such a module lock is inferred:
> 
> 1) Sysfs files are created by a module, that same module is responsible
>    for removing the same sysfs files.
> 2) The module can only be removed and gone once *all* sysfs files are
>    removed first.
> 3) If any of the module's sysfs files are present the module must
>    still be present
> 4) kernfs ensures that if a file is opened the file will not be
>    removed until any pending operation completes
> 5) If a sysfs file is used to write something, that means the
>    sysfs file has not yet been removed, and we know it will
>    remain in existence throughout its entire operation
> 6) When a sysfs file operation is being run, the module must
>    always exist

Greg,

I'm inclined to believe, again, that my original generic solution [0]
would be better, especially since we can drop the dev_type_get() /
dev_type_put() stuff.

Thoughts?

[0] https://lore.kernel.org/linux-block/20210401235925.GR4332@42.do-not-panic.com/

  Luis


Thread overview: 12+ messages
2021-07-03  0:19 [PATCH v6 0/3] zram: fix few sysfs races Luis Chamberlain
2021-07-03  0:19 ` [PATCH v6 1/3] zram: fix crashes with cpu hotplug multistate Luis Chamberlain
2021-07-03  0:19 ` [PATCH v6 2/3] zram: fix deadlock with sysfs attribute usage and module removal Luis Chamberlain
2021-07-10 19:28   ` Andrew Morton
2021-07-11  5:00     ` Greg KH
2021-07-12 23:17     ` Luis Chamberlain
2021-07-21 11:29   ` Greg KH
2021-07-22 22:17     ` Luis Chamberlain
2021-07-23 11:15       ` Greg KH
2021-07-23 17:49         ` Luis Chamberlain
2021-07-27 17:35           ` Luis Chamberlain
2021-07-03  0:19 ` [PATCH v6 3/3] zram: use ATTRIBUTE_GROUPS Luis Chamberlain
