linux-kernel.vger.kernel.org archive mirror
* [PATCHv3 0/9] introduce on-demand device creation
@ 2015-04-27 13:21 Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 1/9] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
                   ` (9 more replies)
  0 siblings, 10 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We currently don't support zram on-demand device creation.  The only way
to have N zram devices is to specify the num_devices module parameter
(default value: 1).  That means that if, for some reason, at some point,
the user wants N + 1 devices, he/she must unmount all the existing
devices, unload the module, and load the module again with num_devices
set to N + 1.

This patchset introduces zram-control sysfs class, which has two sysfs
attrs:

 - zram_add     -- add a new zram device
 - zram_remove  -- remove a specific (device_id) zram device

    Usage example:
        # add a new zram device (reading the attr returns the new device's id)
        cat /sys/class/zram-control/zram_add
        1

        # remove a specific zram device
        echo 4 > /sys/class/zram-control/zram_remove
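Driving these attrs from a program follows the same read/write pattern. Below is a minimal C sketch; the zram_ctl_add()/zram_ctl_remove() helper names are illustrative (not kernel API), and the control directory is passed as a parameter so the snippet can be exercised against any directory, since the real /sys/class/zram-control only exists with this patchset applied:

```c
#include <stdio.h>

/* Read the zram_add attr: the kernel allocates a new device and the
 * read returns its id.  `ctrl` is the control-class directory,
 * normally /sys/class/zram-control.  Returns the new id, or -1. */
static int zram_ctl_add(const char *ctrl)
{
	char path[256];
	int id = -1;
	FILE *f;

	snprintf(path, sizeof(path), "%s/zram_add", ctrl);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &id) != 1)
		id = -1;
	fclose(f);
	return id;
}

/* Write a device id to the zram_remove attr to tear that device
 * down.  Returns 0 on success, -1 on failure. */
static int zram_ctl_remove(const char *ctrl, int id)
{
	char path[256];
	FILE *f;

	snprintf(path, sizeof(path), "%s/zram_remove", ctrl);
	f = fopen(path, "w");
	if (!f)
		return -1;
	fprintf(f, "%d\n", id);
	return fclose(f) ? -1 : 0;
}
```

On a patched kernel, zram_ctl_add("/sys/class/zram-control") would allocate the next free device and return its id, matching the `cat`/`echo` session above.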


V3:
-- rebase against 4.1
-- review comments from Minchan were addressed
-- no sysfs RO tricks anymore

V2:
-- quick rebase and cleanup in attempt to catch 4.1 merge window

Sergey Senozhatsky (9):
  zram: add `compact` sysfs entry to documentation
  zram: cosmetic ZRAM_ATTR_RO code formatting tweak
  zram: use idr instead of `zram_devices' array
  zram: reorganize code layout
  zram: remove max_num_devices limitation
  zram: report every added and removed device
  zram: trivial: correct flag operations comment
  zram: return zram device_id from zram_add()
  zram: add dynamic device add/remove functionality

 Documentation/blockdev/zram.txt |  29 +-
 drivers/block/zram/zram_drv.c   | 975 ++++++++++++++++++++++------------------
 drivers/block/zram/zram_drv.h   |   6 -
 3 files changed, 558 insertions(+), 452 deletions(-)

-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 1/9] zram: add `compact` sysfs entry to documentation
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 2/9] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Briefly describe the missing `compact` sysfs entry.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 48a183e..bef4998 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -126,7 +126,7 @@ mem_used_max      RW    the maximum amount memory zram have consumed to
 mem_limit         RW    the maximum amount of memory ZRAM can use to store
                         the compressed data
 num_migrated      RO    the number of objects migrated migrated by compaction
-
+compact           WO    trigger memory compaction
 
 WARNING
 =======
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 2/9] zram: cosmetic ZRAM_ATTR_RO code formatting tweak
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 1/9] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 3/9] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Fix a misplaced backslash.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 01ec694..41816d0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -49,7 +49,7 @@ static inline void deprecated_attr_warn(const char *name)
 }
 
 #define ZRAM_ATTR_RO(name)						\
-static ssize_t name##_show(struct device *d,		\
+static ssize_t name##_show(struct device *d,				\
 				struct device_attribute *attr, char *b)	\
 {									\
 	struct zram *zram = dev_to_zram(d);				\
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 3/9] zram: use idr instead of `zram_devices' array
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 1/9] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 2/9] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 4/9] zram: reorganize code layout Sergey Senozhatsky
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch makes some preparations for on-demand device add/remove
functionality.

Remove the `zram_devices' array and switch to id-to-pointer translation
(idr).  An idr doesn't bloat the zram struct with additional members
(e.g. a list_head), yet it still provides the ability to match a
device_id with its device pointer.

No user-space visible changes.
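The idr switch follows the usual allocate/look-up/remove pattern.  As a rough userspace analogue of the id-to-pointer translation (a fixed-size table standing in for the kernel's idr; the index_* names are purely illustrative, not driver code):

```c
#include <stddef.h>

/* Userspace stand-in for the kernel idr: maps small integer ids to
 * pointers, the way zram_index_idr maps device_id -> struct zram *.
 * The real driver uses idr_alloc()/idr_find()/idr_remove(). */
#define MAX_IDS 16

static void *index_table[MAX_IDS];

/* Mirror idr_alloc(idr, ptr, start, end, gfp): claim the first free
 * id in [start, end) and bind it to ptr; return the id, or -1. */
static int index_alloc(void *ptr, int start, int end)
{
	int id;

	for (id = start; id < end && id < MAX_IDS; id++) {
		if (!index_table[id]) {
			index_table[id] = ptr;
			return id;
		}
	}
	return -1;
}

/* Mirror idr_find(): translate an id back to its pointer. */
static void *index_find(int id)
{
	return (id >= 0 && id < MAX_IDS) ? index_table[id] : NULL;
}

/* Mirror idr_remove(): release the id. */
static void index_remove(int id)
{
	if (id >= 0 && id < MAX_IDS)
		index_table[id] = NULL;
}
```

Note that zram_add() in this patch calls idr_alloc() with the window [device_id, device_id + 1), i.e. it requests exactly one specific id and fails if that id is already taken.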

[do not lose -ENOMEM return code when `queue' allocation has failed]
Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 86 ++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 41816d0..9ee0d7f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -28,12 +28,12 @@
 #include <linux/string.h>
 #include <linux/vmalloc.h>
 #include <linux/err.h>
+#include <linux/idr.h>
 
 #include "zram_drv.h"
 
-/* Globals */
+static DEFINE_IDR(zram_index_idr);
 static int zram_major;
-static struct zram *zram_devices;
 static const char *default_compressor = "lzo";
 
 /* Module params (documentation at end) */
@@ -1152,10 +1152,20 @@ static struct attribute_group zram_disk_attr_group = {
 	.attrs = zram_disk_attrs,
 };
 
-static int create_device(struct zram *zram, int device_id)
+static int zram_add(int device_id)
 {
+	struct zram *zram;
 	struct request_queue *queue;
-	int ret = -ENOMEM;
+	int ret;
+
+	zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
+	if (!zram)
+		return -ENOMEM;
+
+	ret = idr_alloc(&zram_index_idr, zram, device_id,
+			device_id + 1, GFP_KERNEL);
+	if (ret < 0)
+		goto out_free_dev;
 
 	init_rwsem(&zram->init_lock);
 
@@ -1163,12 +1173,13 @@ static int create_device(struct zram *zram, int device_id)
 	if (!queue) {
 		pr_err("Error allocating disk queue for device %d\n",
 			device_id);
-		goto out;
+		ret = -ENOMEM;
+		goto out_free_idr;
 	}
 
 	blk_queue_make_request(queue, zram_make_request);
 
-	 /* gendisk structure */
+	/* gendisk structure */
 	zram->disk = alloc_disk(1);
 	if (!zram->disk) {
 		pr_warn("Error allocating disk structure for device %d\n",
@@ -1233,34 +1244,42 @@ out_free_disk:
 	put_disk(zram->disk);
 out_free_queue:
 	blk_cleanup_queue(queue);
-out:
+out_free_idr:
+	idr_remove(&zram_index_idr, device_id);
+out_free_dev:
+	kfree(zram);
 	return ret;
 }
 
-static void destroy_devices(unsigned int nr)
+static void zram_remove(struct zram *zram)
 {
-	struct zram *zram;
-	unsigned int i;
-
-	for (i = 0; i < nr; i++) {
-		zram = &zram_devices[i];
-		/*
-		 * Remove sysfs first, so no one will perform a disksize
-		 * store while we destroy the devices
-		 */
-		sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
-				&zram_disk_attr_group);
+	/*
+	 * Remove sysfs first, so no one will perform a disksize
+	 * store while we destroy the devices
+	 */
+	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
+			&zram_disk_attr_group);
 
-		zram_reset_device(zram);
+	zram_reset_device(zram);
+	idr_remove(&zram_index_idr, zram->disk->first_minor);
+	blk_cleanup_queue(zram->disk->queue);
+	del_gendisk(zram->disk);
+	put_disk(zram->disk);
+	kfree(zram);
+}
 
-		blk_cleanup_queue(zram->disk->queue);
-		del_gendisk(zram->disk);
-		put_disk(zram->disk);
-	}
+static int zram_remove_cb(int id, void *ptr, void *data)
+{
+	zram_remove(ptr);
+	return 0;
+}
 
-	kfree(zram_devices);
+static void destroy_devices(void)
+{
+	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
+	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
-	pr_info("Destroyed %u device(s)\n", nr);
+	pr_info("Destroyed device(s)\n");
 }
 
 static int __init zram_init(void)
@@ -1279,16 +1298,9 @@ static int __init zram_init(void)
 		return -EBUSY;
 	}
 
-	/* Allocate the device array and initialize each one */
-	zram_devices = kzalloc(num_devices * sizeof(struct zram), GFP_KERNEL);
-	if (!zram_devices) {
-		unregister_blkdev(zram_major, "zram");
-		return -ENOMEM;
-	}
-
 	for (dev_id = 0; dev_id < num_devices; dev_id++) {
-		ret = create_device(&zram_devices[dev_id], dev_id);
-		if (ret)
+		ret = zram_add(dev_id);
+		if (ret != 0)
 			goto out_error;
 	}
 
@@ -1296,13 +1308,13 @@ static int __init zram_init(void)
 	return 0;
 
 out_error:
-	destroy_devices(dev_id);
+	destroy_devices();
 	return ret;
 }
 
 static void __exit zram_exit(void)
 {
-	destroy_devices(num_devices);
+	destroy_devices();
 }
 
 module_init(zram_init);
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 4/9] zram: reorganize code layout
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 3/9] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 5/9] zram: remove max_num_devices limitation Sergey Senozhatsky
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch looks big, but basically it just moves code blocks.
No functional changes.

Our current code layout looks like a sandwich.

For example,
a) between the read/write handlers, we have the update_used_max() helper
function:

static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw

b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:

static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request

c) we first have a bunch of sysfs read/store functions, then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.

Reorganize the layout into more logical groups (for a brief overview,
`cat zram_drv.c | grep static` gives the bigger picture):

-- one-liners: zram_test_flag/etc.

-- helpers: is_partial_io/update_position/etc

-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: the reset and disksize store functions must come after the
meta() functions, because we perform device create/destroy actions in
these sysfs handlers.

-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page

-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page

-- device control: add/remove/init/reset functions (+ the zram-control
class will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 733 +++++++++++++++++++++---------------------
 1 file changed, 366 insertions(+), 367 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9ee0d7f..3d7ac0c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -70,33 +70,117 @@ static inline struct zram *dev_to_zram(struct device *dev)
 	return (struct zram *)dev_to_disk(dev)->private_data;
 }
 
-static ssize_t compact_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+/* flag operations needs meta->tb_lock */
+static int zram_test_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
 {
-	unsigned long nr_migrated;
-	struct zram *zram = dev_to_zram(dev);
-	struct zram_meta *meta;
+	return meta->table[index].value & BIT(flag);
+}
 
-	down_read(&zram->init_lock);
-	if (!init_done(zram)) {
-		up_read(&zram->init_lock);
-		return -EINVAL;
-	}
+static void zram_set_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
+{
+	meta->table[index].value |= BIT(flag);
+}
 
-	meta = zram->meta;
-	nr_migrated = zs_compact(meta->mem_pool);
-	atomic64_add(nr_migrated, &zram->stats.num_migrated);
-	up_read(&zram->init_lock);
+static void zram_clear_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
+{
+	meta->table[index].value &= ~BIT(flag);
+}
 
-	return len;
+static size_t zram_get_obj_size(struct zram_meta *meta, u32 index)
+{
+	return meta->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1);
 }
 
-static ssize_t disksize_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static void zram_set_obj_size(struct zram_meta *meta,
+					u32 index, size_t size)
 {
-	struct zram *zram = dev_to_zram(dev);
+	unsigned long flags = meta->table[index].value >> ZRAM_FLAG_SHIFT;
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
+	meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size;
+}
+
+static inline int is_partial_io(struct bio_vec *bvec)
+{
+	return bvec->bv_len != PAGE_SIZE;
+}
+
+/*
+ * Check if request is within bounds and aligned on zram logical blocks.
+ */
+static inline int valid_io_request(struct zram *zram,
+		sector_t start, unsigned int size)
+{
+	u64 end, bound;
+
+	/* unaligned request */
+	if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
+		return 0;
+	if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
+		return 0;
+
+	end = start + (size >> SECTOR_SHIFT);
+	bound = zram->disksize >> SECTOR_SHIFT;
+	/* out of range range */
+	if (unlikely(start >= bound || end > bound || start > end))
+		return 0;
+
+	/* I/O request is valid */
+	return 1;
+}
+
+static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
+{
+	if (*offset + bvec->bv_len >= PAGE_SIZE)
+		(*index)++;
+	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
+}
+
+static inline void update_used_max(struct zram *zram,
+					const unsigned long pages)
+{
+	unsigned long old_max, cur_max;
+
+	old_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		cur_max = old_max;
+		if (pages > cur_max)
+			old_max = atomic_long_cmpxchg(
+				&zram->stats.max_used_pages, cur_max, pages);
+	} while (old_max != cur_max);
+}
+
+static int page_zero_filled(void *ptr)
+{
+	unsigned int pos;
+	unsigned long *page;
+
+	page = (unsigned long *)ptr;
+
+	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+		if (page[pos])
+			return 0;
+	}
+
+	return 1;
+}
+
+static void handle_zero_page(struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	void *user_mem;
+
+	user_mem = kmap_atomic(page);
+	if (is_partial_io(bvec))
+		memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
+	else
+		clear_page(user_mem);
+	kunmap_atomic(user_mem);
+
+	flush_dcache_page(page);
 }
 
 static ssize_t initstate_show(struct device *dev,
@@ -112,6 +196,14 @@ static ssize_t initstate_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%u\n", val);
 }
 
+static ssize_t disksize_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
+}
+
 static ssize_t orig_data_size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -139,19 +231,6 @@ static ssize_t mem_used_total_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
-static ssize_t max_comp_streams_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
-{
-	int val;
-	struct zram *zram = dev_to_zram(dev);
-
-	down_read(&zram->init_lock);
-	val = zram->max_comp_streams;
-	up_read(&zram->init_lock);
-
-	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
-}
-
 static ssize_t mem_limit_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -221,6 +300,19 @@ static ssize_t mem_used_max_store(struct device *dev,
 	return len;
 }
 
+static ssize_t max_comp_streams_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	int val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->max_comp_streams;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -278,71 +370,101 @@ static ssize_t comp_algorithm_store(struct device *dev,
 	return len;
 }
 
-/* flag operations needs meta->tb_lock */
-static int zram_test_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
+static ssize_t compact_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
-	return meta->table[index].value & BIT(flag);
-}
+	unsigned long nr_migrated;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta;
 
-static void zram_set_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
-{
-	meta->table[index].value |= BIT(flag);
-}
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		return -EINVAL;
+	}
 
-static void zram_clear_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
-{
-	meta->table[index].value &= ~BIT(flag);
-}
+	meta = zram->meta;
+	nr_migrated = zs_compact(meta->mem_pool);
+	atomic64_add(nr_migrated, &zram->stats.num_migrated);
+	up_read(&zram->init_lock);
 
-static size_t zram_get_obj_size(struct zram_meta *meta, u32 index)
-{
-	return meta->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1);
+	return len;
 }
 
-static void zram_set_obj_size(struct zram_meta *meta,
-					u32 index, size_t size)
+static ssize_t io_stat_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
 {
-	unsigned long flags = meta->table[index].value >> ZRAM_FLAG_SHIFT;
+	struct zram *zram = dev_to_zram(dev);
+	ssize_t ret;
 
-	meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size;
-}
+	down_read(&zram->init_lock);
+	ret = scnprintf(buf, PAGE_SIZE,
+			"%8llu %8llu %8llu %8llu\n",
+			(u64)atomic64_read(&zram->stats.failed_reads),
+			(u64)atomic64_read(&zram->stats.failed_writes),
+			(u64)atomic64_read(&zram->stats.invalid_io),
+			(u64)atomic64_read(&zram->stats.notify_free));
+	up_read(&zram->init_lock);
 
-static inline int is_partial_io(struct bio_vec *bvec)
-{
-	return bvec->bv_len != PAGE_SIZE;
+	return ret;
 }
 
-/*
- * Check if request is within bounds and aligned on zram logical blocks.
- */
-static inline int valid_io_request(struct zram *zram,
-		sector_t start, unsigned int size)
+static ssize_t mm_stat_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
 {
-	u64 end, bound;
+	struct zram *zram = dev_to_zram(dev);
+	u64 orig_size, mem_used = 0;
+	long max_used;
+	ssize_t ret;
 
-	/* unaligned request */
-	if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
-		return 0;
-	if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
-		return 0;
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		mem_used = zs_get_total_pages(zram->meta->mem_pool);
 
-	end = start + (size >> SECTOR_SHIFT);
-	bound = zram->disksize >> SECTOR_SHIFT;
-	/* out of range range */
-	if (unlikely(start >= bound || end > bound || start > end))
-		return 0;
+	orig_size = atomic64_read(&zram->stats.pages_stored);
+	max_used = atomic_long_read(&zram->stats.max_used_pages);
 
-	/* I/O request is valid */
-	return 1;
-}
+	ret = scnprintf(buf, PAGE_SIZE,
+			"%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n",
+			orig_size << PAGE_SHIFT,
+			(u64)atomic64_read(&zram->stats.compr_data_size),
+			mem_used << PAGE_SHIFT,
+			zram->limit_pages << PAGE_SHIFT,
+			max_used << PAGE_SHIFT,
+			(u64)atomic64_read(&zram->stats.zero_pages),
+			(u64)atomic64_read(&zram->stats.num_migrated));
+	up_read(&zram->init_lock);
 
-static void zram_meta_free(struct zram_meta *meta, u64 disksize)
-{
-	size_t num_pages = disksize >> PAGE_SHIFT;
-	size_t index;
+	return ret;
+}
+
+static DEVICE_ATTR_RO(io_stat);
+static DEVICE_ATTR_RO(mm_stat);
+ZRAM_ATTR_RO(num_reads);
+ZRAM_ATTR_RO(num_writes);
+ZRAM_ATTR_RO(failed_reads);
+ZRAM_ATTR_RO(failed_writes);
+ZRAM_ATTR_RO(invalid_io);
+ZRAM_ATTR_RO(notify_free);
+ZRAM_ATTR_RO(zero_pages);
+ZRAM_ATTR_RO(compr_data_size);
+
+static inline bool zram_meta_get(struct zram *zram)
+{
+	if (atomic_inc_not_zero(&zram->refcount))
+		return true;
+	return false;
+}
+
+static inline void zram_meta_put(struct zram *zram)
+{
+	atomic_dec(&zram->refcount);
+}
+
+static void zram_meta_free(struct zram_meta *meta, u64 disksize)
+{
+	size_t num_pages = disksize >> PAGE_SHIFT;
+	size_t index;
 
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < num_pages; index++) {
@@ -390,56 +512,6 @@ out_error:
 	return NULL;
 }
 
-static inline bool zram_meta_get(struct zram *zram)
-{
-	if (atomic_inc_not_zero(&zram->refcount))
-		return true;
-	return false;
-}
-
-static inline void zram_meta_put(struct zram *zram)
-{
-	atomic_dec(&zram->refcount);
-}
-
-static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
-{
-	if (*offset + bvec->bv_len >= PAGE_SIZE)
-		(*index)++;
-	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
-}
-
-static int page_zero_filled(void *ptr)
-{
-	unsigned int pos;
-	unsigned long *page;
-
-	page = (unsigned long *)ptr;
-
-	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
-		if (page[pos])
-			return 0;
-	}
-
-	return 1;
-}
-
-static void handle_zero_page(struct bio_vec *bvec)
-{
-	struct page *page = bvec->bv_page;
-	void *user_mem;
-
-	user_mem = kmap_atomic(page);
-	if (is_partial_io(bvec))
-		memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
-	else
-		clear_page(user_mem);
-	kunmap_atomic(user_mem);
-
-	flush_dcache_page(page);
-}
-
-
 /*
  * To protect concurrent access to the same index entry,
  * caller should hold this table index entry's bit_spinlock to
@@ -557,21 +629,6 @@ out_cleanup:
 	return ret;
 }
 
-static inline void update_used_max(struct zram *zram,
-					const unsigned long pages)
-{
-	unsigned long old_max, cur_max;
-
-	old_max = atomic_long_read(&zram->stats.max_used_pages);
-
-	do {
-		cur_max = old_max;
-		if (pages > cur_max)
-			old_max = atomic_long_cmpxchg(
-				&zram->stats.max_used_pages, cur_max, pages);
-	} while (old_max != cur_max);
-}
-
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			   int offset)
 {
@@ -699,35 +756,6 @@ out:
 	return ret;
 }
 
-static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
-			int offset, int rw)
-{
-	unsigned long start_time = jiffies;
-	int ret;
-
-	generic_start_io_acct(rw, bvec->bv_len >> SECTOR_SHIFT,
-			&zram->disk->part0);
-
-	if (rw == READ) {
-		atomic64_inc(&zram->stats.num_reads);
-		ret = zram_bvec_read(zram, bvec, index, offset);
-	} else {
-		atomic64_inc(&zram->stats.num_writes);
-		ret = zram_bvec_write(zram, bvec, index, offset);
-	}
-
-	generic_end_io_acct(rw, &zram->disk->part0, start_time);
-
-	if (unlikely(ret)) {
-		if (rw == READ)
-			atomic64_inc(&zram->stats.failed_reads);
-		else
-			atomic64_inc(&zram->stats.failed_writes);
-	}
-
-	return ret;
-}
-
 /*
  * zram_bio_discard - handler on discard request
  * @index: physical block index in PAGE_SIZE units
@@ -767,149 +795,32 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 	}
 }
 
-static void zram_reset_device(struct zram *zram)
-{
-	struct zram_meta *meta;
-	struct zcomp *comp;
-	u64 disksize;
-
-	down_write(&zram->init_lock);
-
-	zram->limit_pages = 0;
-
-	if (!init_done(zram)) {
-		up_write(&zram->init_lock);
-		return;
-	}
-
-	meta = zram->meta;
-	comp = zram->comp;
-	disksize = zram->disksize;
-	/*
-	 * Refcount will go down to 0 eventually and r/w handler
-	 * cannot handle further I/O so it will bail out by
-	 * check zram_meta_get.
-	 */
-	zram_meta_put(zram);
-	/*
-	 * We want to free zram_meta in process context to avoid
-	 * deadlock between reclaim path and any other locks.
-	 */
-	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
-
-	/* Reset stats */
-	memset(&zram->stats, 0, sizeof(zram->stats));
-	zram->disksize = 0;
-	zram->max_comp_streams = 1;
-	set_capacity(zram->disk, 0);
-
-	up_write(&zram->init_lock);
-	/* I/O operation under all of CPU are done so let's free */
-	zram_meta_free(meta, disksize);
-	zcomp_destroy(comp);
-}
-
-static ssize_t disksize_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
-{
-	u64 disksize;
-	struct zcomp *comp;
-	struct zram_meta *meta;
-	struct zram *zram = dev_to_zram(dev);
-	int err;
-
-	disksize = memparse(buf, NULL);
-	if (!disksize)
-		return -EINVAL;
-
-	disksize = PAGE_ALIGN(disksize);
-	meta = zram_meta_alloc(zram->disk->first_minor, disksize);
-	if (!meta)
-		return -ENOMEM;
-
-	comp = zcomp_create(zram->compressor, zram->max_comp_streams);
-	if (IS_ERR(comp)) {
-		pr_info("Cannot initialise %s compressing backend\n",
-				zram->compressor);
-		err = PTR_ERR(comp);
-		goto out_free_meta;
-	}
-
-	down_write(&zram->init_lock);
-	if (init_done(zram)) {
-		pr_info("Cannot change disksize for initialized device\n");
-		err = -EBUSY;
-		goto out_destroy_comp;
-	}
-
-	init_waitqueue_head(&zram->io_done);
-	atomic_set(&zram->refcount, 1);
-	zram->meta = meta;
-	zram->comp = comp;
-	zram->disksize = disksize;
-	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
-	up_write(&zram->init_lock);
-
-	/*
-	 * Revalidate disk out of the init_lock to avoid lockdep splat.
-	 * It's okay because disk's capacity is protected by init_lock
-	 * so that revalidate_disk always sees up-to-date capacity.
-	 */
-	revalidate_disk(zram->disk);
-
-	return len;
-
-out_destroy_comp:
-	up_write(&zram->init_lock);
-	zcomp_destroy(comp);
-out_free_meta:
-	zram_meta_free(meta, disksize);
-	return err;
-}
-
-static ssize_t reset_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
+			int offset, int rw)
 {
+	unsigned long start_time = jiffies;
 	int ret;
-	unsigned short do_reset;
-	struct zram *zram;
-	struct block_device *bdev;
-
-	zram = dev_to_zram(dev);
-	bdev = bdget_disk(zram->disk, 0);
 
-	if (!bdev)
-		return -ENOMEM;
+	generic_start_io_acct(rw, bvec->bv_len >> SECTOR_SHIFT,
+			&zram->disk->part0);
 
-	mutex_lock(&bdev->bd_mutex);
-	/* Do not reset an active device! */
-	if (bdev->bd_openers) {
-		ret = -EBUSY;
-		goto out;
+	if (rw == READ) {
+		atomic64_inc(&zram->stats.num_reads);
+		ret = zram_bvec_read(zram, bvec, index, offset);
+	} else {
+		atomic64_inc(&zram->stats.num_writes);
+		ret = zram_bvec_write(zram, bvec, index, offset);
 	}
 
-	ret = kstrtou16(buf, 10, &do_reset);
-	if (ret)
-		goto out;
+	generic_end_io_acct(rw, &zram->disk->part0, start_time);
 
-	if (!do_reset) {
-		ret = -EINVAL;
-		goto out;
+	if (unlikely(ret)) {
+		if (rw == READ)
+			atomic64_inc(&zram->stats.failed_reads);
+		else
+			atomic64_inc(&zram->stats.failed_writes);
 	}
 
-	/* Make sure all pending I/O is finished */
-	fsync_bdev(bdev);
-	zram_reset_device(zram);
-
-	mutex_unlock(&bdev->bd_mutex);
-	revalidate_disk(zram->disk);
-	bdput(bdev);
-
-	return len;
-
-out:
-	mutex_unlock(&bdev->bd_mutex);
-	bdput(bdev);
 	return ret;
 }
 
@@ -1049,80 +960,168 @@ out:
 	return err;
 }
 
-static const struct block_device_operations zram_devops = {
-	.swap_slot_free_notify = zram_slot_free_notify,
-	.rw_page = zram_rw_page,
-	.owner = THIS_MODULE
-};
+static void zram_reset_device(struct zram *zram)
+{
+	struct zram_meta *meta;
+	struct zcomp *comp;
+	u64 disksize;
 
-static DEVICE_ATTR_WO(compact);
-static DEVICE_ATTR_RW(disksize);
-static DEVICE_ATTR_RO(initstate);
-static DEVICE_ATTR_WO(reset);
-static DEVICE_ATTR_RO(orig_data_size);
-static DEVICE_ATTR_RO(mem_used_total);
-static DEVICE_ATTR_RW(mem_limit);
-static DEVICE_ATTR_RW(mem_used_max);
-static DEVICE_ATTR_RW(max_comp_streams);
-static DEVICE_ATTR_RW(comp_algorithm);
+	down_write(&zram->init_lock);
 
-static ssize_t io_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+	zram->limit_pages = 0;
+
+	if (!init_done(zram)) {
+		up_write(&zram->init_lock);
+		return;
+	}
+
+	meta = zram->meta;
+	comp = zram->comp;
+	disksize = zram->disksize;
+	/*
+	 * Refcount will go down to 0 eventually and r/w handler
+	 * cannot handle further I/O so it will bail out by
+	 * check zram_meta_get.
+	 */
+	zram_meta_put(zram);
+	/*
+	 * We want to free zram_meta in process context to avoid
+	 * deadlock between reclaim path and any other locks.
+	 */
+	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
+
+	/* Reset stats */
+	memset(&zram->stats, 0, sizeof(zram->stats));
+	zram->disksize = 0;
+	zram->max_comp_streams = 1;
+	set_capacity(zram->disk, 0);
+
+	up_write(&zram->init_lock);
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(meta, disksize);
+	zcomp_destroy(comp);
+}
+
+static ssize_t disksize_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
+	u64 disksize;
+	struct zcomp *comp;
+	struct zram_meta *meta;
 	struct zram *zram = dev_to_zram(dev);
-	ssize_t ret;
+	int err;
 
-	down_read(&zram->init_lock);
-	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8llu\n",
-			(u64)atomic64_read(&zram->stats.failed_reads),
-			(u64)atomic64_read(&zram->stats.failed_writes),
-			(u64)atomic64_read(&zram->stats.invalid_io),
-			(u64)atomic64_read(&zram->stats.notify_free));
-	up_read(&zram->init_lock);
+	disksize = memparse(buf, NULL);
+	if (!disksize)
+		return -EINVAL;
 
-	return ret;
+	disksize = PAGE_ALIGN(disksize);
+	meta = zram_meta_alloc(zram->disk->first_minor, disksize);
+	if (!meta)
+		return -ENOMEM;
+
+	comp = zcomp_create(zram->compressor, zram->max_comp_streams);
+	if (IS_ERR(comp)) {
+		pr_info("Cannot initialise %s compressing backend\n",
+				zram->compressor);
+		err = PTR_ERR(comp);
+		goto out_free_meta;
+	}
+
+	down_write(&zram->init_lock);
+	if (init_done(zram)) {
+		pr_info("Cannot change disksize for initialized device\n");
+		err = -EBUSY;
+		goto out_destroy_comp;
+	}
+
+	init_waitqueue_head(&zram->io_done);
+	atomic_set(&zram->refcount, 1);
+	zram->meta = meta;
+	zram->comp = comp;
+	zram->disksize = disksize;
+	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+	up_write(&zram->init_lock);
+
+	/*
+	 * Revalidate disk out of the init_lock to avoid lockdep splat.
+	 * It's okay because disk's capacity is protected by init_lock
+	 * so that revalidate_disk always sees up-to-date capacity.
+	 */
+	revalidate_disk(zram->disk);
+
+	return len;
+
+out_destroy_comp:
+	up_write(&zram->init_lock);
+	zcomp_destroy(comp);
+out_free_meta:
+	zram_meta_free(meta, disksize);
+	return err;
 }
 
-static ssize_t mm_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t reset_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
-	struct zram *zram = dev_to_zram(dev);
-	u64 orig_size, mem_used = 0;
-	long max_used;
-	ssize_t ret;
+	int ret;
+	unsigned short do_reset;
+	struct zram *zram;
+	struct block_device *bdev;
 
-	down_read(&zram->init_lock);
-	if (init_done(zram))
-		mem_used = zs_get_total_pages(zram->meta->mem_pool);
+	zram = dev_to_zram(dev);
+	bdev = bdget_disk(zram->disk, 0);
 
-	orig_size = atomic64_read(&zram->stats.pages_stored);
-	max_used = atomic_long_read(&zram->stats.max_used_pages);
+	if (!bdev)
+		return -ENOMEM;
 
-	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n",
-			orig_size << PAGE_SHIFT,
-			(u64)atomic64_read(&zram->stats.compr_data_size),
-			mem_used << PAGE_SHIFT,
-			zram->limit_pages << PAGE_SHIFT,
-			max_used << PAGE_SHIFT,
-			(u64)atomic64_read(&zram->stats.zero_pages),
-			(u64)atomic64_read(&zram->stats.num_migrated));
-	up_read(&zram->init_lock);
+	mutex_lock(&bdev->bd_mutex);
+	/* Do not reset an active device! */
+	if (bdev->bd_openers) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	ret = kstrtou16(buf, 10, &do_reset);
+	if (ret)
+		goto out;
+
+	if (!do_reset) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Make sure all pending I/O is finished */
+	fsync_bdev(bdev);
+	zram_reset_device(zram);
+
+	mutex_unlock(&bdev->bd_mutex);
+	revalidate_disk(zram->disk);
+	bdput(bdev);
+
+	return len;
 
+out:
+	mutex_unlock(&bdev->bd_mutex);
+	bdput(bdev);
 	return ret;
 }
 
-static DEVICE_ATTR_RO(io_stat);
-static DEVICE_ATTR_RO(mm_stat);
-ZRAM_ATTR_RO(num_reads);
-ZRAM_ATTR_RO(num_writes);
-ZRAM_ATTR_RO(failed_reads);
-ZRAM_ATTR_RO(failed_writes);
-ZRAM_ATTR_RO(invalid_io);
-ZRAM_ATTR_RO(notify_free);
-ZRAM_ATTR_RO(zero_pages);
-ZRAM_ATTR_RO(compr_data_size);
+static const struct block_device_operations zram_devops = {
+	.swap_slot_free_notify = zram_slot_free_notify,
+	.rw_page = zram_rw_page,
+	.owner = THIS_MODULE
+};
+
+static DEVICE_ATTR_WO(compact);
+static DEVICE_ATTR_RW(disksize);
+static DEVICE_ATTR_RO(initstate);
+static DEVICE_ATTR_WO(reset);
+static DEVICE_ATTR_RO(orig_data_size);
+static DEVICE_ATTR_RO(mem_used_total);
+static DEVICE_ATTR_RW(mem_limit);
+static DEVICE_ATTR_RW(mem_used_max);
+static DEVICE_ATTR_RW(max_comp_streams);
+static DEVICE_ATTR_RW(comp_algorithm);
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
-- 
2.4.0.rc3.3.g6eb1401


^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCHv3 5/9] zram: remove max_num_devices limitation
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 4/9] zram: reorganize code layout Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 6/9] zram: report every added and removed device Sergey Senozhatsky
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Limiting the number of zram devices to 32 (the default max_num_devices value)
is confusing; let's drop it.  A user with 2TB or 4TB of RAM, for example,
can request as many devices as he/she can handle.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt | 4 +++-
 drivers/block/zram/zram_drv.c   | 8 +-------
 drivers/block/zram/zram_drv.h   | 6 ------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index bef4998..65e9430 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -19,7 +19,9 @@ Following shows a typical sequence of steps for using zram.
 1) Load Module:
 	modprobe zram num_devices=4
 	This creates 4 devices: /dev/zram{0,1,2,3}
-	(num_devices parameter is optional. Default: 1)
+
+num_devices parameter is optional and tells zram how many devices should be
+pre-created. Default: 1.
 
 2) Set max number of compression streams
 	Compression backend may use up to max_comp_streams compression streams,
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3d7ac0c..b875517 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1285,12 +1285,6 @@ static int __init zram_init(void)
 {
 	int ret, dev_id;
 
-	if (num_devices > max_num_devices) {
-		pr_warn("Invalid value for num_devices: %u\n",
-				num_devices);
-		return -EINVAL;
-	}
-
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warn("Unable to get major number\n");
@@ -1320,7 +1314,7 @@ module_init(zram_init);
 module_exit(zram_exit);
 
 module_param(num_devices, uint, 0);
-MODULE_PARM_DESC(num_devices, "Number of zram devices");
+MODULE_PARM_DESC(num_devices, "Number of pre-created zram devices");
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Nitin Gupta <ngupta@vflare.org>");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 570c598..042994e 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -20,12 +20,6 @@
 
 #include "zcomp.h"
 
-/*
- * Some arbitrary value. This is just to catch
- * invalid value for num_devices module parameter.
- */
-static const unsigned max_num_devices = 32;
-
 /*-- Configurable parameters */
 
 /*
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 6/9] zram: report every added and removed device
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 5/9] zram: remove max_num_devices limitation Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 7/9] zram: trivial: correct flag operations comment Sergey Senozhatsky
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

With dynamic device creation/removal (which will be introduced later in
the series), printing num_devices in zram_init() will not make much
sense, nor will printing the number of destroyed devices in
destroy_devices(). Print a per-device action (added/removed) in zram_add()
and zram_remove() instead.

Example:

[ 3645.259652] zram: Added device: zram5
[ 3646.152074] zram: Added device: zram6
[ 3650.585012] zram: Removed device: zram5
[ 3655.845584] zram: Added device: zram8
[ 3660.975223] zram: Removed device: zram6

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b875517..ec6396c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1236,6 +1236,8 @@ static int zram_add(int device_id)
 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 	zram->meta = NULL;
 	zram->max_comp_streams = 1;
+
+	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return 0;
 
 out_free_disk:
@@ -1252,6 +1254,7 @@ out_free_dev:
 
 static void zram_remove(struct zram *zram)
 {
+	pr_info("Removed device: %s\n", zram->disk->disk_name);
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
 	 * store while we destroy the devices
@@ -1278,7 +1281,6 @@ static void destroy_devices(void)
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
-	pr_info("Destroyed device(s)\n");
 }
 
 static int __init zram_init(void)
@@ -1297,7 +1299,6 @@ static int __init zram_init(void)
 			goto out_error;
 	}
 
-	pr_info("Created %u device(s)\n", num_devices);
 	return 0;
 
 out_error:
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 7/9] zram: trivial: correct flag operations comment
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 6/9] zram: report every added and removed device Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 8/9] zram: return zram device_id from zram_add() Sergey Senozhatsky
                   ` (2 subsequent siblings)
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We don't have meta->tb_lock anymore and use the meta table entry
bit_spin_lock instead. Update the corresponding comment.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ec6396c..37a753c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -70,7 +70,7 @@ static inline struct zram *dev_to_zram(struct device *dev)
 	return (struct zram *)dev_to_disk(dev)->private_data;
 }
 
-/* flag operations needs meta->tb_lock */
+/* flag operations require table entry bit_spin_lock() being held */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
 {
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 8/9] zram: return zram device_id from zram_add()
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 7/9] zram: trivial: correct flag operations comment Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-27 13:21 ` [PATCHv3 9/9] zram: add dynamic device add/remove functionality Sergey Senozhatsky
  2015-04-27 13:41 ` [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch prepares zram to enable on-demand device creation.
zram_add() performs automatic device_id assignment and returns
the new device id (>= 0) or an error code (< 0).

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 37a753c..3df4394 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1151,20 +1151,24 @@ static struct attribute_group zram_disk_attr_group = {
 	.attrs = zram_disk_attrs,
 };
 
-static int zram_add(int device_id)
+/*
+ * Allocate and initialize new zram device. the function returns
+ * '>= 0' device_id upon success, and negative value otherwise.
+ */
+static int zram_add(void)
 {
 	struct zram *zram;
 	struct request_queue *queue;
-	int ret;
+	int ret, device_id;
 
 	zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
 	if (!zram)
 		return -ENOMEM;
 
-	ret = idr_alloc(&zram_index_idr, zram, device_id,
-			device_id + 1, GFP_KERNEL);
+	ret = idr_alloc(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
 	if (ret < 0)
 		goto out_free_dev;
+	device_id = ret;
 
 	init_rwsem(&zram->init_lock);
 
@@ -1238,7 +1242,7 @@ static int zram_add(int device_id)
 	zram->max_comp_streams = 1;
 
 	pr_info("Added device: %s\n", zram->disk->disk_name);
-	return 0;
+	return device_id;
 
 out_free_disk:
 	del_gendisk(zram->disk);
@@ -1285,7 +1289,7 @@ static void destroy_devices(void)
 
 static int __init zram_init(void)
 {
-	int ret, dev_id;
+	int ret;
 
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
@@ -1293,10 +1297,11 @@ static int __init zram_init(void)
 		return -EBUSY;
 	}
 
-	for (dev_id = 0; dev_id < num_devices; dev_id++) {
-		ret = zram_add(dev_id);
-		if (ret != 0)
+	while (num_devices != 0) {
+		ret = zram_add();
+		if (ret < 0)
 			goto out_error;
+		num_devices--;
 	}
 
 	return 0;
-- 
2.4.0.rc3.3.g6eb1401



* [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (7 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 8/9] zram: return zram device_id from zram_add() Sergey Senozhatsky
@ 2015-04-27 13:21 ` Sergey Senozhatsky
  2015-04-29  0:16   ` Sergey Senozhatsky
  2015-04-27 13:41 ` [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
  9 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:21 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We currently don't support on-demand device creation. The one and only way
to have N zram devices is to specify the num_devices module parameter (default
value: 1). IOW if, for some reason, at some point, a user wants to have
N + 1 devices he/she must umount all the existing devices, unload the
module, and load the module passing num_devices equal to N + 1. And do this
again, if needed.

This patch introduces zram control sysfs class, which has two sysfs
attrs:
- zram_add      -- add a new zram device
- zram_remove   -- remove a specific (device_id) zram device

The zram_add sysfs attr is read-only and supports only automatic device id
assignment (as requested by Minchan Kim). A read operation performed
on this attr creates a new zram device and returns its device_id or an
error status.

Usage example:
	# add a new zram device
	cat /sys/class/zram-control/zram_add
	2

	# remove a specific zram device
	echo 4 > /sys/class/zram-control/zram_remove

Returning a zram_add() error code back to the user (-ENOMEM in this case):

	cat /sys/class/zram-control/zram_add
	cat: /sys/class/zram-control/zram_add: Cannot allocate memory

NOTE: there might be users who already depend on the fact that at least the
zram0 device always gets created by zram_init(). Preserve this behavior.

Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt |  23 +++++++-
 drivers/block/zram/zram_drv.c   | 120 +++++++++++++++++++++++++++++++++-------
 2 files changed, 120 insertions(+), 23 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 65e9430..fc686d4 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -99,7 +99,24 @@ size of the disk when not in use so a huge zram is wasteful.
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-7) Stats:
+7) Add/remove zram devices
+
+zram provides a control interface, which enables dynamic (on-demand) device
+addition and removal.
+
+In order to add a new /dev/zramX device, perform read operation on zram_add
+attribute. This will return either new device's device id (meaning that you
+can use /dev/zram<id>) or error code.
+
+Example:
+	cat /sys/class/zram-control/zram_add
+	1
+
+To remove the existing /dev/zramX device (where X is a device id)
+execute
+	echo X > /sys/class/zram-control/zram_remove
+
+8) Stats:
 Per-device statistics are exported as various nodes under /sys/block/zram<id>/
 
 A brief description of exported device attritbutes. For more details please
@@ -174,11 +191,11 @@ line of text and contains the following stats separated by whitespace:
 	zero_pages
 	num_migrated
 
-8) Deactivate:
+9) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-9) Reset:
+10) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3df4394..fa530aa 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -29,10 +29,14 @@
 #include <linux/vmalloc.h>
 #include <linux/err.h>
 #include <linux/idr.h>
+#include <linux/sysfs.h>
 
 #include "zram_drv.h"
 
 static DEFINE_IDR(zram_index_idr);
+/* idr index must be protected */
+static DEFINE_MUTEX(zram_index_mutex);
+
 static int zram_major;
 static const char *default_compressor = "lzo";
 
@@ -1068,26 +1072,23 @@ static ssize_t reset_store(struct device *dev,
 	struct zram *zram;
 	struct block_device *bdev;
 
+	ret = kstrtou16(buf, 10, &do_reset);
+	if (ret)
+		return ret;
+
+	if (!do_reset)
+		return -EINVAL;
+
 	zram = dev_to_zram(dev);
 	bdev = bdget_disk(zram->disk, 0);
-
 	if (!bdev)
 		return -ENOMEM;
 
 	mutex_lock(&bdev->bd_mutex);
 	/* Do not reset an active device! */
 	if (bdev->bd_openers) {
-		ret = -EBUSY;
-		goto out;
-	}
-
-	ret = kstrtou16(buf, 10, &do_reset);
-	if (ret)
-		goto out;
-
-	if (!do_reset) {
-		ret = -EINVAL;
-		goto out;
+		mutex_unlock(&bdev->bd_mutex);
+		return -EBUSY;
 	}
 
 	/* Make sure all pending I/O is finished */
@@ -1099,11 +1100,6 @@ static ssize_t reset_store(struct device *dev,
 	bdput(bdev);
 
 	return len;
-
-out:
-	mutex_unlock(&bdev->bd_mutex);
-	bdput(bdev);
-	return ret;
 }
 
 static const struct block_device_operations zram_devops = {
@@ -1256,24 +1252,98 @@ out_free_dev:
 	return ret;
 }
 
-static void zram_remove(struct zram *zram)
+static int zram_remove(struct zram *zram)
 {
-	pr_info("Removed device: %s\n", zram->disk->disk_name);
+	struct block_device *bdev;
+
+	bdev = bdget_disk(zram->disk, 0);
+	if (!bdev)
+		return -ENOMEM;
+
+	mutex_lock(&bdev->bd_mutex);
+	if (bdev->bd_openers) {
+		mutex_unlock(&bdev->bd_mutex);
+		return -EBUSY;
+	}
+
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
-	 * store while we destroy the devices
+	 * store while we destroy the devices. This also helps during
+	 * zram_remove() -- device_reset() is the last holder of
+	 * ->init_lock.
 	 */
 	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
 			&zram_disk_attr_group);
 
+	/* Make sure all pending I/O is finished */
+	fsync_bdev(bdev);
 	zram_reset_device(zram);
+	mutex_unlock(&bdev->bd_mutex);
+
+	pr_info("Removed device: %s\n", zram->disk->disk_name);
+
 	idr_remove(&zram_index_idr, zram->disk->first_minor);
 	blk_cleanup_queue(zram->disk->queue);
 	del_gendisk(zram->disk);
 	put_disk(zram->disk);
 	kfree(zram);
+
+	return 0;
 }
 
+/* zram module control sysfs attributes */
+static ssize_t zram_add_show(struct class *class,
+			struct class_attribute *attr,
+			char *buf)
+{
+	int ret;
+
+	mutex_lock(&zram_index_mutex);
+	ret = zram_add();
+	mutex_unlock(&zram_index_mutex);
+
+	if (ret < 0)
+		return ret;
+	return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t zram_remove_store(struct class *class,
+			struct class_attribute *attr,
+			const char *buf,
+			size_t count)
+{
+	struct zram *zram;
+	int ret, dev_id;
+
+	/* dev_id is gendisk->first_minor, which is `int' */
+	ret = kstrtoint(buf, 10, &dev_id);
+	if (ret || dev_id < 0)
+		return -EINVAL;
+
+	mutex_lock(&zram_index_mutex);
+
+	zram = idr_find(&zram_index_idr, dev_id);
+	if (zram)
+		ret = zram_remove(zram);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&zram_index_mutex);
+	return ret ? ret : count;
+}
+
+static struct class_attribute zram_control_class_attrs[] = {
+	__ATTR_RO(zram_add),
+	__ATTR_WO(zram_remove),
+	__ATTR_NULL,
+};
+
+static struct class zram_control_class = {
+	.name		= "zram-control",
+	.owner		= THIS_MODULE,
+	.class_attrs	= zram_control_class_attrs,
+};
+
 static int zram_remove_cb(int id, void *ptr, void *data)
 {
 	zram_remove(ptr);
@@ -1282,6 +1352,7 @@ static int zram_remove_cb(int id, void *ptr, void *data)
 
 static void destroy_devices(void)
 {
+	class_unregister(&zram_control_class);
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
@@ -1291,14 +1362,23 @@ static int __init zram_init(void)
 {
 	int ret;
 
+	ret = class_register(&zram_control_class);
+	if (ret) {
+		pr_warn("Unable to register zram-control class\n");
+		return ret;
+	}
+
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warn("Unable to get major number\n");
+		class_unregister(&zram_control_class);
 		return -EBUSY;
 	}
 
 	while (num_devices != 0) {
+		mutex_lock(&zram_index_mutex);
 		ret = zram_add();
+		mutex_unlock(&zram_index_mutex);
 		if (ret < 0)
 			goto out_error;
 		num_devices--;
-- 
2.4.0.rc3.3.g6eb1401



* Re: [PATCHv3 0/9] introduce on-demand device creation
  2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
                   ` (8 preceding siblings ...)
  2015-04-27 13:21 ` [PATCHv3 9/9] zram: add dynamic device add/remove functionality Sergey Senozhatsky
@ 2015-04-27 13:41 ` Sergey Senozhatsky
  9 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-27 13:41 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

On (04/27/15 22:21), Sergey Senozhatsky wrote:
> We currently don't support zram on-demand device creation.  The only way
> to have N zram devices is to specify num_devices module parameter (default
> value 1).  That means that if, for some reason, at some point, user wants
> to have N + 1 devies he/she must umount all the existing devices, unload
> the module, load the module passing num_devices equals to N + 1.

argh, please ignore this series. it causes lockdep warning. will resend.

sorry for the noise.

	-ss


> This patchset introduces zram-control sysfs class, which has two sysfs
> attrs:
> 
>  - zram_add     -- add a new zram device
>  - zram_remove  -- remove a specific (device_id) zram device
> 
>     Usage example:
>         # add a new specific zram device
>         cat /sys/class/zram-control/zram_add
>         1
> 
>         # remove a specific zram device
>         echo 4 > /sys/class/zram-control/zram_remove
> 
> 
> V3:
> -- rebase against 4.1
> -- review comments from Minchan were addressed
> -- no sysfs RO tricks anymore
> 
> V2:
> -- quick rebase and cleanup in attempt to catch 4.1 merge window
> 
> Sergey Senozhatsky (9):
>   zram: add `compact` sysfs entry to documentation
>   zram: cosmetic ZRAM_ATTR_RO code formatting tweak
>   zram: use idr instead of `zram_devices' array
>   zram: reorganize code layout
>   zram: remove max_num_devices limitation
>   zram: report every added and removed device
>   zram: trivial: correct flag operations comment
>   zram: return zram device_id from zram_add()
>   zram: add dynamic device add/remove functionality
> 
>  Documentation/blockdev/zram.txt |  29 +-
>  drivers/block/zram/zram_drv.c   | 975 ++++++++++++++++++++++------------------
>  drivers/block/zram/zram_drv.h   |   6 -
>  3 files changed, 558 insertions(+), 452 deletions(-)
> 
> -- 
> 2.4.0.rc3.3.g6eb1401
> 


* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-27 13:21 ` [PATCHv3 9/9] zram: add dynamic device add/remove functionality Sergey Senozhatsky
@ 2015-04-29  0:16   ` Sergey Senozhatsky
  2015-04-29  6:48     ` Minchan Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-29  0:16 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

Hello,

Minchan, a quick question. just to avoid resends and to make sure that I'm
not missing any better solution.


lockdep is unhappy here:

> -static void zram_remove(struct zram *zram)
> +static int zram_remove(struct zram *zram)
>  {
> -	pr_info("Removed device: %s\n", zram->disk->disk_name);
> +	struct block_device *bdev;
> +
> +	bdev = bdget_disk(zram->disk, 0);
> +	if (!bdev)
> +		return -ENOMEM;
> +
> +	mutex_lock(&bdev->bd_mutex);
> +	if (bdev->bd_openers) {
> +		mutex_unlock(&bdev->bd_mutex);
> +		return -EBUSY;
> +	}
> +
>  	/*
>  	 * Remove sysfs first, so no one will perform a disksize
> -	 * store while we destroy the devices
> +	 * store while we destroy the devices. This also helps during
> +	 * zram_remove() -- device_reset() is the last holder of
> +	 * ->init_lock.
>  	 */
>  	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
>  			&zram_disk_attr_group);
>  
> +	/* Make sure all pending I/O is finished */
> +	fsync_bdev(bdev);
>  	zram_reset_device(zram);
> +	mutex_unlock(&bdev->bd_mutex);
> +
> +	pr_info("Removed device: %s\n", zram->disk->disk_name);
> +
>  	idr_remove(&zram_index_idr, zram->disk->first_minor);
>  	blk_cleanup_queue(zram->disk->queue);
>  	del_gendisk(zram->disk);
>  	put_disk(zram->disk);
>  	kfree(zram);
> +
> +	return 0;
>  }


lock ->bd_mutex
	if ->bd_openers {
		unlock ->bd_mutex
		return -EBUSY;
	}

	sysfs_remove_group()
		^^^^^^^^^ lockdep splat
	zram_reset_device()
		lock ->init_lock
		reset device
		unlock ->init_lock
unlock ->bd_mutex
...
kfree zram


why did I do this:

sysfs_remove_group() turns zram_reset_device() into the last possible ->init_lock
owner: there are no sysfs nodes before final zram_reset_device(), so no
concurrent nor later store()/show() sysfs handler can be called. it closes a number
of race conditions, like:

	CPU0			CPU1
	umount
	zram_remove()
	zram_reset_device()	disksize_store()
				mount
	kfree zram

or

	CPU0				CPU1
	umount
	zram_remove()
	zram_reset_device()
					cat /sys/block/zram0/_any_sysfs_node_
	sysfs_remove_group()
	kfree zram			_any_sysfs_node_read()


and so on. so removing sysfs group before zram_reset_device() makes sense.

at the same time we need to prevent `umount-zram_remove vs. mount' race and forbid
zram_remove() on active device. so we check ->bd_openers and perform device reset
under ->bd_mutex.


a quick solution I can think of is to do something like this:

sysfs_remove_group();
lock ->bd_mutex
	if ->bd_openers {
		unlock ->bd_mutex
		create_sysfs_group()
			^^^^^^^^   return attrs back
		return -EBUSY
	}

	zram_reset_device()
		lock ->init_lock
		reset device
		unlock ->init_lock
unlock ->bd_mutex
...
kfree zram


iow, move sysfs_remove_group() out of ->bd_mutex lock, but keep it
before ->bd_openers check & zram_reset_device().

I don't think that we can handle it with only ->init_lock. we need to unlock
it at some point and kfree the zram struct that contains that lock, and we
have no idea whether there are any lock owners or waiters when we kfree zram.
removing the sysfs group and acquiring ->init_lock for write in
zram_reset_device() guarantee that there will be no further ->init_lock
owners, so we can safely kfree after `unlock ->init_lock`. hence,
sysfs_remove_group() must happen before zram_reset_device().


create_sysfs_group() can potentially fail, which is a bit ugly, but the user
will be able to umount the device and remove it anyway.


what do you think?

	-ss


* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-29  0:16   ` Sergey Senozhatsky
@ 2015-04-29  6:48     ` Minchan Kim
  2015-04-29  7:02       ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Minchan Kim @ 2015-04-29  6:48 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Wed, Apr 29, 2015 at 09:16:24AM +0900, Sergey Senozhatsky wrote:
> Hello,
> 
> Minchan, a quick question. just to avoid resends and to make sure that I'm
> not missing any better solution.
> 
> 
> lockdep is unhappy here:
> 
> > -static void zram_remove(struct zram *zram)
> > +static int zram_remove(struct zram *zram)
> >  {
> > -	pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +	struct block_device *bdev;
> > +
> > +	bdev = bdget_disk(zram->disk, 0);
> > +	if (!bdev)
> > +		return -ENOMEM;
> > +
> > +	mutex_lock(&bdev->bd_mutex);
> > +	if (bdev->bd_openers) {
> > +		mutex_unlock(&bdev->bd_mutex);
> > +		return -EBUSY;
> > +	}
> > +
> >  	/*
> >  	 * Remove sysfs first, so no one will perform a disksize
> > -	 * store while we destroy the devices
> > +	 * store while we destroy the devices. This also helps during
> > +	 * zram_remove() -- device_reset() is the last holder of
> > +	 * ->init_lock.
> >  	 */
> >  	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
> >  			&zram_disk_attr_group);
> >  
> > +	/* Make sure all pending I/O is finished */
> > +	fsync_bdev(bdev);
> >  	zram_reset_device(zram);
> > +	mutex_unlock(&bdev->bd_mutex);
> > +
> > +	pr_info("Removed device: %s\n", zram->disk->disk_name);
> > +
> >  	idr_remove(&zram_index_idr, zram->disk->first_minor);
> >  	blk_cleanup_queue(zram->disk->queue);
> >  	del_gendisk(zram->disk);
> >  	put_disk(zram->disk);
> >  	kfree(zram);
> > +
> > +	return 0;
> >  }
> 
> 
> lock ->bd_mutex
> 	if ->bd_openers {
> 		unlock ->bd_mutex
> 		return -EBUSY;
> 	}
> 
> 	sysfs_remove_group()
> 		^^^^^^^^^ lockdep splat
> 	zram_reset_device()
> 		lock ->init_lock
> 		reset device
> 		unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
> 
> 
> why did I do this:
> 
> sysfs_remove_group() turns zram_reset_device() into the last possible ->init_lock
> owner: there are no sysfs nodes before final zram_reset_device(), so no
> concurrent nor later store()/show() sysfs handler can be called. it closes a number
> of race conditions, like:
> 
> 	CPU0			CPU1
> 	umount
> 	zram_remove()
> 	zram_reset_device()	disksize_store()
> 				mount
> 	kfree zram
> 
> or
> 
> 	CPU0				CPU1
> 	umount
> 	zram_remove()
> 	zram_reset_device()
> 					cat /sys/block/zram0/_any_sysfs_node_
> 	sysfs_remove_group()
> 	kfree zram			_any_sysfs_node_read()
> 
> 
> and so on. so removing sysfs group before zram_reset_device() makes sense.
> 
> at the same time we need to prevent `umount-zram_remove vs. mount' race and forbid
> zram_remove() on active device. so we check ->bd_openers and perform device reset
> under ->bd_mutex.
> 

Could you explain in detail about unmount-zram_remove vs. mount race?
I guess it should be done in upper layer(e,g. VFS). Anyway, I want to be
more clear about that.

> 
> a quick solution I can think of is to do something like this:
> 
> sysfs_remove_group();
> lock ->bd_mutex
> 	if ->bd_openers {
> 		unlock ->bd_mutex
> 		create_sysfs_group()
> 			^^^^^^^^   return attrs back
> 		return -EBUSY
> 	}
> 
> 	zram_reset_device()
> 		lock ->init_lock
> 		reset device
> 		unlock ->init_lock
> unlock ->bd_mutex
> ...
> kfree zram
> 
> 
> iow, move sysfs_remove_group() out of ->bd_mutex lock, but keep it
> before ->bd_openers check & zram_reset_device().
> 
> I don't think that we can handle it with only ->init_lock. we need to unlock
> it at some point and kfree the zram that contains that lock, and we have no
> idea whether there are any lock owners or waiters when we kfree zram. removing
> the sysfs group and acquiring ->init_lock for write in zram_reset_device()
> guarantee that there will be no further ->init_lock owners, so we can safely
> kfree after `unlock ->init_lock`. hence, sysfs_remove_group() must happen
> before zram_reset_device().
> 
> 
> create_sysfs_group() can potentially fail, which is a bit ugly, but the user
> will be able to umount the device and remove it anyway.
> 
> 
> what do you think?
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-29  6:48     ` Minchan Kim
@ 2015-04-29  7:02       ` Sergey Senozhatsky
  2015-04-29  7:23         ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-29  7:02 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

Hello Minchan,

On (04/29/15 15:48), Minchan Kim wrote:
[..]
> > 
> > 	CPU0			CPU1
> > 	umount
> > 	zram_remove()
> > 	zram_reset_device()	disksize_store()
> > 				mount
> > 	kfree zram
> > 
> > or
> > 
> > 	CPU0				CPU1
> > 	umount
> > 	zram_remove()
> > 	zram_reset_device()
> > 					cat /sys/block/zram0/_any_sysfs_node_
> > 	sysfs_remove_group()
> > 	kfree zram			_any_sysfs_node_read()
> > 
> > 
> > and so on. so removing sysfs group before zram_reset_device() makes sense.
> > 
> > at the same time we need to prevent `umount-zram_remove vs. mount' race and forbid
> > zram_remove() on active device. so we check ->bd_openers and perform device reset
> > under ->bd_mutex.
> > 
> 
> Could you explain the umount-zram_remove vs. mount race in detail?
> I guess it should be handled in the upper layer (e.g. VFS). Anyway, I want
> to be clearer about that.
> 

sure. I was talking about this one:

	CPU0			CPU1
	umount
	zram_remove()
	lock ->bd_mutex
	zram_reset_device()
	unlock ->bd_mutex
				disksize_store
				mount
				echo 'test' > /mnt/test
	kfree zram
				zram write


w/o ->bd_mutex around zram_reset_device() it's even simpler, I guess.
hm, I don't think VFS can help us here.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-29  7:02       ` Sergey Senozhatsky
@ 2015-04-29  7:23         ` Sergey Senozhatsky
  2015-04-30  5:47           ` Minchan Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-29  7:23 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

On (04/29/15 16:02), Sergey Senozhatsky wrote:
> sure. I was talking about this one:
> 
> 	CPU0			CPU1
> 	umount
> 	zram_remove()
> 	lock ->bd_mutex
> 	zram_reset_device()
> 	unlock ->bd_mutex
> 				disksize_store
> 				mount
> 				echo 'test' > /mnt/test
> 	kfree zram
> 				zram write
> 

I'll take a look later today. currently I think of something like:


	sysfs_remove_group()
	lock ->bd_mutex
		... check ->bd_openers

	zram_reset_device()
	blk_cleanup_queue()
	del_gendisk()
	put_disk()

	unlock ->bd_mutex
	bdput bdev

	idr_remove()
	kfree(zram)


iow, idr_remove() and kfree() are done outside of the ->bd_mutex lock.
but I may be wrong, haven't tested yet. still, it seems reasonable: we
invalidate ->bdev, delete partitions, etc., holding ->bd_mutex, and
then release ->bdev, which does the final put. need to check that in
detail.

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-29  7:23         ` Sergey Senozhatsky
@ 2015-04-30  5:47           ` Minchan Kim
  2015-04-30  6:34             ` Sergey Senozhatsky
  0 siblings, 1 reply; 25+ messages in thread
From: Minchan Kim @ 2015-04-30  5:47 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Wed, Apr 29, 2015 at 04:23:28PM +0900, Sergey Senozhatsky wrote:
> On (04/29/15 16:02), Sergey Senozhatsky wrote:
> > sure. I was talking about this one:
> > 
> > 	CPU0			CPU1
> > 	umount
> > 	zram_remove()
> > 	lock ->bd_mutex
> > 	zram_reset_device()
> > 	unlock ->bd_mutex
> > 				disksize_store
> > 				mount
> > 				echo 'test' > /mnt/test
> > 	kfree zram
> > 				zram write
> > 
> 
> I'll take a look later today. currently I think of something like:
> 
> 
> 	sysfs_remove_group()
> 	lock ->bd_mutex
> 		... check ->bd_openers
> 
> 	zram_reset_device()
> 	blk_cleanup_queue()
> 	del_gendisk()
> 	put_disk()
> 
> 	unlock ->bd_mutex
> 	bdput bdev
> 
> 	idr_remove()
> 	kfree(zram)
> 
> 
> iow, idr_remove() and kfree() are done outside of ->bd_mutex lock.
> but I may be wrong. haven't tested yet. but seems reasonable: we
> invalidate ->bdev, delete partitions, etc., holding ->bd_mutex and
> then release ->bdev, which does final put. need to check that in
> detail.
> 
> 	-ss


Isn't it related to bd_mutex?
I think the deadlock problem is that you are trying to remove a sysfs file
from within a sysfs handler.

#> echo 1 > /sys/xxx/zram_remove

kernfs_fop_write - hold s_active
  -> zram_remove_store
    -> zram_remove
      -> sysfs_remove_group - hold s_active *again*

Right?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-30  5:47           ` Minchan Kim
@ 2015-04-30  6:34             ` Sergey Senozhatsky
  2015-04-30  6:44               ` Minchan Kim
  2015-04-30  6:44               ` Sergey Senozhatsky
  0 siblings, 2 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-30  6:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

Hello Minchan,

On (04/30/15 14:47), Minchan Kim wrote:
[..]
> 
> Isn't it related to bd_mutex?

I think it is:

[  216.713922]  Possible unsafe locking scenario:
[  216.713923]        CPU0                    CPU1
[  216.713924]        ----                    ----
[  216.713925]   lock(&bdev->bd_mutex);
[  216.713927]                                lock(s_active#162);
[  216.713929]                                lock(&bdev->bd_mutex);
[  216.713930]   lock(s_active#162);
[  216.713932] 
                *** DEADLOCK ***

> I think the problem of deadlock is that you are trying to remove sysfs file
> in sysfs handler.
> 
> #> echo 1 > /sys/xxx/zram_remove
> 
> kernfs_fop_write - hold s_active
>   -> zram_remove_store
>     -> zram_remove
>       -> sysfs_remove_group - hold s_active *again*
> 
> Right?
> 

are those same s_active locks?


we hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162)

[  216.713934] 5 locks held by bash/342:
[  216.713935]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
[  216.713938]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
[  216.713942]  #2:  (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
[  216.713946]  #3:  (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
[  216.713950]  #4:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]


full log:

[  216.713826] ======================================================
[  216.713827] [ INFO: possible circular locking dependency detected ]
[  216.713829] 4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121 Tainted: G           O   
[  216.713831] -------------------------------------------------------
[  216.713832] bash/342 is trying to acquire lock:
[  216.713833]  (s_active#162){++++.+}, at: [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[  216.713840] 
               but task is already holding lock:
[  216.713842]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
[  216.713846] 
               which lock already depends on the new lock.

[  216.713848] 
               the existing dependency chain (in reverse order) is:
[  216.713849] 
               -> #1 (&bdev->bd_mutex){+.+.+.}:
[  216.713852]        [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[  216.713856]        [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[  216.713858]        [<ffffffff81528fc6>] mutex_lock_nested+0x5e/0x35f
[  216.713860]        [<ffffffff81184148>] revalidate_disk+0x4b/0x7c
[  216.713863]        [<ffffffffa02224d0>] disksize_store+0x1b1/0x1f4 [zram]
[  216.713866]        [<ffffffff813f8994>] dev_attr_store+0x19/0x23
[  216.713870]        [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[  216.713872]        [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[  216.713874]        [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[  216.713877]        [<ffffffff811508b2>] vfs_write+0xc0/0x145
[  216.713879]        [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[  216.713881]        [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[  216.713884] 
               -> #0 (s_active#162){++++.+}:
[  216.713886]        [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
[  216.713889]        [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[  216.713891]        [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[  216.713892]        [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
[  216.713895]        [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[  216.713897]        [<ffffffff811b0872>] remove_files+0x42/0x67
[  216.713899]        [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
[  216.713901]        [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
[  216.713904]        [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
[  216.713906]        [<ffffffff813fe053>] class_attr_store+0x1c/0x26
[  216.713909]        [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[  216.713911]        [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[  216.713913]        [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[  216.713915]        [<ffffffff811508b2>] vfs_write+0xc0/0x145
[  216.713917]        [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[  216.713918]        [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[  216.713920] 
               other info that might help us debug this:

[  216.713922]  Possible unsafe locking scenario:

[  216.713923]        CPU0                    CPU1
[  216.713924]        ----                    ----
[  216.713925]   lock(&bdev->bd_mutex);
[  216.713927]                                lock(s_active#162);
[  216.713929]                                lock(&bdev->bd_mutex);
[  216.713930]   lock(s_active#162);
[  216.713932] 
                *** DEADLOCK ***

[  216.713934] 5 locks held by bash/342:
[  216.713935]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
[  216.713938]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
[  216.713942]  #2:  (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
[  216.713946]  #3:  (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
[  216.713950]  #4:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
[  216.713954] 
               stack backtrace:
[  216.713957] CPU: 1 PID: 342 Comm: bash Tainted: G           O    4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121
[  216.713958] Hardware name: SAMSUNG ELECTRONICS CO.,LTD Samsung DeskTop System/Samsung DeskTop System, BIOS 05CC                   04/09/2010
[  216.713960]  ffffffff82400210 ffff8800ba367a28 ffffffff815265b1 ffffffff810785f2
[  216.713962]  ffffffff8242f970 ffff8800ba367a78 ffffffff8107aac7 ffffffff817bd85e
[  216.713965]  ffff8800bdeca1a0 ffff8800bdeca9c0 ffff8800bdeca998 ffff8800bdeca9c0
[  216.713967] Call Trace:
[  216.713971]  [<ffffffff815265b1>] dump_stack+0x4c/0x6e
[  216.713973]  [<ffffffff810785f2>] ? up+0x39/0x3e
[  216.713975]  [<ffffffff8107aac7>] print_circular_bug+0x2b1/0x2c2
[  216.713976]  [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
[  216.713979]  [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
[  216.713981]  [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
[  216.713983]  [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
[  216.713985]  [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
[  216.713987]  [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
[  216.713989]  [<ffffffff811adca8>] ? kernfs_find_ns+0xcd/0x10e
[  216.713990]  [<ffffffff81529294>] ? mutex_lock_nested+0x32c/0x35f
[  216.713992]  [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
[  216.713994]  [<ffffffff811b0872>] remove_files+0x42/0x67
[  216.713996]  [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
[  216.713999]  [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
[  216.714001]  [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
[  216.714003]  [<ffffffff813fe053>] class_attr_store+0x1c/0x26
[  216.714005]  [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
[  216.714007]  [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
[  216.714009]  [<ffffffff811502c2>] __vfs_write+0x26/0xbe
[  216.714011]  [<ffffffff8116b29b>] ? __close_fd+0x25/0xdd
[  216.714013]  [<ffffffff81079a27>] ? __lock_is_held+0x3c/0x57
[  216.714015]  [<ffffffff811508b2>] vfs_write+0xc0/0x145
[  216.714017]  [<ffffffff81150fd0>] SyS_write+0x51/0x8f
[  216.714019]  [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
[  216.714063] zram: Removed device: zram0


	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-30  6:34             ` Sergey Senozhatsky
@ 2015-04-30  6:44               ` Minchan Kim
  2015-04-30  6:51                 ` Sergey Senozhatsky
  2015-04-30  6:44               ` Sergey Senozhatsky
  1 sibling, 1 reply; 25+ messages in thread
From: Minchan Kim @ 2015-04-30  6:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

On Thu, Apr 30, 2015 at 03:34:57PM +0900, Sergey Senozhatsky wrote:
> Hello Minchan,
> 
> On (04/30/15 14:47), Minchan Kim wrote:
> [..]
> > 
> > Isn't it related to bd_mutex?
> 
> I think it is:
> 
> [  216.713922]  Possible unsafe locking scenario:
> [  216.713923]        CPU0                    CPU1
> [  216.713924]        ----                    ----
> [  216.713925]   lock(&bdev->bd_mutex);
> [  216.713927]                                lock(s_active#162);
> [  216.713929]                                lock(&bdev->bd_mutex);
> [  216.713930]   lock(s_active#162);
> [  216.713932] 
>                 *** DEADLOCK ***
> 
> > I think the problem of deadlock is that you are trying to remove sysfs file
> > in sysfs handler.
> > 
> > #> echo 1 > /sys/xxx/zram_remove
> > 
> > kernfs_fop_write - hold s_active
> >   -> zram_remove_store
> >     -> zram_remove
> >       -> sysfs_remove_group - hold s_active *again*
> > 
> > Right?
> > 
> 
> are those same s_active locks?
> 
> 
> we hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162)

Thanks for sharing the message.
You're right. It's another lock, so it shouldn't be the reason.
Okay, I will review it. Please give me time.

> [..]

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-30  6:34             ` Sergey Senozhatsky
  2015-04-30  6:44               ` Minchan Kim
@ 2015-04-30  6:44               ` Sergey Senozhatsky
  1 sibling, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-30  6:44 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (04/30/15 15:34), Sergey Senozhatsky wrote:
> > Isn't it related to bd_mutex?
> 
> I think it is:
> 

I meant: I think it's related

> [  216.713922]  Possible unsafe locking scenario:
> [  216.713923]        CPU0                    CPU1
> [  216.713924]        ----                    ----
> [  216.713925]   lock(&bdev->bd_mutex);
> [  216.713927]                                lock(s_active#162);
> [  216.713929]                                lock(&bdev->bd_mutex);
> [  216.713930]   lock(s_active#162);
> [  216.713932] 
>                 *** DEADLOCK ***
> 
> > I think the problem of deadlock is that you are trying to remove sysfs file
> > in sysfs handler.
> > 
> > #> echo 1 > /sys/xxx/zram_remove
> > 
> > kernfs_fop_write - hold s_active
> >   -> zram_remove_store
> >     -> zram_remove
> >       -> sysfs_remove_group - hold s_active *again*
> > 
> > Right?
> > 
> 
> are those same s_active locks?
> 

I meant: in the sysfs handler case the s_active locks are from different kernfs
nodes, so it should be fine: we remove one sysfs group from another sysfs
group's handler (zram_control_class_attrs removes zram_disk_attr_group).

	-ss

> [..]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-30  6:44               ` Minchan Kim
@ 2015-04-30  6:51                 ` Sergey Senozhatsky
  2015-05-04  2:20                   ` Minchan Kim
  0 siblings, 1 reply; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-04-30  6:51 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (04/30/15 15:44), Minchan Kim wrote:
> > > I think the problem of deadlock is that you are trying to remove sysfs file
> > > in sysfs handler.
> > > 
> > > #> echo 1 > /sys/xxx/zram_remove
> > > 
> > > kernfs_fop_write - hold s_active
> > >   -> zram_remove_store
> > >     -> zram_remove
> > >       -> sysfs_remove_group - hold s_active *again*
> > > 
> > > Right?
> > > 
> > 
> > are those same s_active locks?
> > 
> > 
> > we hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162)
> 
> Thanks for sharing the message.
> > You're right. It's another lock, so that shouldn't be the reason.
> Okay, I will review it. Please give me time.
> 

sure, no problem and no rush. thanks!

	-ss

> > 
> > [  216.713934] 5 locks held by bash/342:
> > [  216.713935]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
> > [  216.713938]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
> > [  216.713942]  #2:  (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
> > [  216.713946]  #3:  (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
> > [  216.713950]  #4:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
> > 
> > 
> > full log:
> > 
> > [  216.713826] ======================================================
> > [  216.713827] [ INFO: possible circular locking dependency detected ]
> > [  216.713829] 4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121 Tainted: G           O   
> > [  216.713831] -------------------------------------------------------
> > [  216.713832] bash/342 is trying to acquire lock:
> > [  216.713833]  (s_active#162){++++.+}, at: [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
> > [  216.713840] 
> >                but task is already holding lock:
> > [  216.713842]  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
> > [  216.713846] 
> >                which lock already depends on the new lock.
> > 
> > [  216.713848] 
> >                the existing dependency chain (in reverse order) is:
> > [  216.713849] 
> >                -> #1 (&bdev->bd_mutex){+.+.+.}:
> > [  216.713852]        [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
> > [  216.713856]        [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
> > [  216.713858]        [<ffffffff81528fc6>] mutex_lock_nested+0x5e/0x35f
> > [  216.713860]        [<ffffffff81184148>] revalidate_disk+0x4b/0x7c
> > [  216.713863]        [<ffffffffa02224d0>] disksize_store+0x1b1/0x1f4 [zram]
> > [  216.713866]        [<ffffffff813f8994>] dev_attr_store+0x19/0x23
> > [  216.713870]        [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
> > [  216.713872]        [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
> > [  216.713874]        [<ffffffff811502c2>] __vfs_write+0x26/0xbe
> > [  216.713877]        [<ffffffff811508b2>] vfs_write+0xc0/0x145
> > [  216.713879]        [<ffffffff81150fd0>] SyS_write+0x51/0x8f
> > [  216.713881]        [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
> > [  216.713884] 
> >                -> #0 (s_active#162){++++.+}:
> > [  216.713886]        [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
> > [  216.713889]        [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
> > [  216.713891]        [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
> > [  216.713892]        [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
> > [  216.713895]        [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
> > [  216.713897]        [<ffffffff811b0872>] remove_files+0x42/0x67
> > [  216.713899]        [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
> > [  216.713901]        [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
> > [  216.713904]        [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
> > [  216.713906]        [<ffffffff813fe053>] class_attr_store+0x1c/0x26
> > [  216.713909]        [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
> > [  216.713911]        [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
> > [  216.713913]        [<ffffffff811502c2>] __vfs_write+0x26/0xbe
> > [  216.713915]        [<ffffffff811508b2>] vfs_write+0xc0/0x145
> > [  216.713917]        [<ffffffff81150fd0>] SyS_write+0x51/0x8f
> > [  216.713918]        [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
> > [  216.713920] 
> >                other info that might help us debug this:
> > 
> > [  216.713922]  Possible unsafe locking scenario:
> > 
> > [  216.713923]        CPU0                    CPU1
> > [  216.713924]        ----                    ----
> > [  216.713925]   lock(&bdev->bd_mutex);
> > [  216.713927]                                lock(s_active#162);
> > [  216.713929]                                lock(&bdev->bd_mutex);
> > [  216.713930]   lock(s_active#162);
> > [  216.713932] 
> >                 *** DEADLOCK ***
> > 
> > [  216.713934] 5 locks held by bash/342:
> > [  216.713935]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811508a1>] vfs_write+0xaf/0x145
> > [  216.713938]  #1:  (&of->mutex){+.+.+.}, at: [<ffffffff811af1d3>] kernfs_fop_write+0x9c/0x14c
> > [  216.713942]  #2:  (s_active#163){.+.+.+}, at: [<ffffffff811af1dc>] kernfs_fop_write+0xa5/0x14c
> > [  216.713946]  #3:  (zram_index_mutex){+.+.+.}, at: [<ffffffffa022276f>] zram_remove_store+0x45/0xba [zram]
> > [  216.713950]  #4:  (&bdev->bd_mutex){+.+.+.}, at: [<ffffffffa022267b>] zram_remove+0x41/0xf0 [zram]
> > [  216.713954] 
> >                stack backtrace:
> > [  216.713957] CPU: 1 PID: 342 Comm: bash Tainted: G           O    4.1.0-rc1-next-20150430-dbg-00010-ga86accf-dirty #121
> > [  216.713958] Hardware name: SAMSUNG ELECTRONICS CO.,LTD Samsung DeskTop System/Samsung DeskTop System, BIOS 05CC                   04/09/2010
> > [  216.713960]  ffffffff82400210 ffff8800ba367a28 ffffffff815265b1 ffffffff810785f2
> > [  216.713962]  ffffffff8242f970 ffff8800ba367a78 ffffffff8107aac7 ffffffff817bd85e
> > [  216.713965]  ffff8800bdeca1a0 ffff8800bdeca9c0 ffff8800bdeca998 ffff8800bdeca9c0
> > [  216.713967] Call Trace:
> > [  216.713971]  [<ffffffff815265b1>] dump_stack+0x4c/0x6e
> > [  216.713973]  [<ffffffff810785f2>] ? up+0x39/0x3e
> > [  216.713975]  [<ffffffff8107aac7>] print_circular_bug+0x2b1/0x2c2
> > [  216.713976]  [<ffffffff8107b69e>] check_prevs_add+0x19e/0x747
> > [  216.713979]  [<ffffffff8107d806>] __lock_acquire+0x10c2/0x11cb
> > [  216.713981]  [<ffffffff8107e11c>] lock_acquire+0x13d/0x250
> > [  216.713983]  [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
> > [  216.713985]  [<ffffffff811adac4>] __kernfs_remove+0x1b6/0x2cd
> > [  216.713987]  [<ffffffff811ae88d>] ? kernfs_remove_by_name_ns+0x70/0x8c
> > [  216.713989]  [<ffffffff811adca8>] ? kernfs_find_ns+0xcd/0x10e
> > [  216.713990]  [<ffffffff81529294>] ? mutex_lock_nested+0x32c/0x35f
> > [  216.713992]  [<ffffffff811ae88d>] kernfs_remove_by_name_ns+0x70/0x8c
> > [  216.713994]  [<ffffffff811b0872>] remove_files+0x42/0x67
> > [  216.713996]  [<ffffffff811b0b39>] sysfs_remove_group+0x69/0x88
> > [  216.713999]  [<ffffffffa02226a0>] zram_remove+0x66/0xf0 [zram]
> > [  216.714001]  [<ffffffffa02227bf>] zram_remove_store+0x95/0xba [zram]
> > [  216.714003]  [<ffffffff813fe053>] class_attr_store+0x1c/0x26
> > [  216.714005]  [<ffffffff811afd84>] sysfs_kf_write+0x48/0x54
> > [  216.714007]  [<ffffffff811af238>] kernfs_fop_write+0x101/0x14c
> > [  216.714009]  [<ffffffff811502c2>] __vfs_write+0x26/0xbe
> > [  216.714011]  [<ffffffff8116b29b>] ? __close_fd+0x25/0xdd
> > [  216.714013]  [<ffffffff81079a27>] ? __lock_is_held+0x3c/0x57
> > [  216.714015]  [<ffffffff811508b2>] vfs_write+0xc0/0x145
> > [  216.714017]  [<ffffffff81150fd0>] SyS_write+0x51/0x8f
> > [  216.714019]  [<ffffffff8152d097>] system_call_fastpath+0x12/0x6f
> > [  216.714063] zram: Removed device: zram0
> > 
> > 
> > 	-ss
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-04-30  6:51                 ` Sergey Senozhatsky
@ 2015-05-04  2:20                   ` Minchan Kim
  2015-05-04  2:28                     ` Minchan Kim
                                       ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Minchan Kim @ 2015-05-04  2:20 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Thu, Apr 30, 2015 at 03:51:12PM +0900, Sergey Senozhatsky wrote:
> On (04/30/15 15:44), Minchan Kim wrote:
> > > > I think the problem of deadlock is that you are trying to remove sysfs file
> > > > in sysfs handler.
> > > > 
> > > > #> echo 1 > /sys/xxx/zram_remove
> > > > 
> > > > kernfs_fop_write - hold s_active
> > > >   -> zram_remove_store
> > > >     -> zram_remove
> > > >       -> sysfs_remove_group - hold s_active *again*
> > > > 
> > > > Right?
> > > > 
> > > 
> > > are those same s_active locks?
> > > 
> > > 
> > > we hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162)
> > 
> > Thanks for sharing the message.
> > You're right. It's another lock, so that shouldn't be the reason.
> > Okay, I will review it. Please give me time.
> > 
> 
> sure, no problem and no rush. thanks!

I had some time to think it over.

I think your patch is rather tricky: someone who has already opened
/dev/zram cannot see the sysfs nodes, but after a while he can see
them again. It's weird.

I want to fix it in a more generic way. Otherwise, we might run into
locking problems again sometime. We already experienced that with
init_lock, although we finally fixed it.

I think we can fix it with the patch below, which I hope is a more
general and correct approach. It's based on your [zram: return zram
device_id from zram_add()].

What do you think?

>From e943df5407b880f9262ef959b270226fdc81bc9f Mon Sep 17 00:00:00 2001
From: Minchan Kim <minchan@kernel.org>
Date: Mon, 4 May 2015 08:36:07 +0900
Subject: [PATCH 1/2] zram: close race by open overriding

[1] introduced bdev->bd_mutex to protect against a race between mount
and reset. At that time, we didn't have the dynamic zram add/remove
feature, so it was okay.

However, as we introduce the dynamic device feature, bd_mutex becomes
a problem.

	CPU 0

echo 1 > /sys/block/zram<id>/reset
  -> kernfs->s_active(A)
    -> zram:reset_store->bd_mutex(B)

	CPU 1

echo <id> > /sys/class/zram/zram-remove
  ->zram:zram_remove: bd_mutex(B)
  -> sysfs_remove_group
    -> kernfs->s_active(A)

IOW, an AB -> BA deadlock.

The reason we are holding bd_mutex in zram_remove is to prevent any
incoming open of /dev/zram[0-9]. Otherwise, we could remove a zram
device that others have already opened. But it causes the deadlock
problem above.

To fix the problem, this patch overrides block_device.open to return
-EBUSY while the zram device is claimed for reset, so any incoming
open fails and we no longer need to hold bd_mutex in zram_remove.

This patch is to prepare for the zram-add/remove feature.

[1] ba6b17: zram: fix umount-reset_store-mount race condition
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 42 ++++++++++++++++++++++++++++++++----------
 drivers/block/zram/zram_drv.h |  4 ++++
 2 files changed, 36 insertions(+), 10 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3df4394..7fb72dc 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1074,13 +1074,6 @@ static ssize_t reset_store(struct device *dev,
 	if (!bdev)
 		return -ENOMEM;
 
-	mutex_lock(&bdev->bd_mutex);
-	/* Do not reset an active device! */
-	if (bdev->bd_openers) {
-		ret = -EBUSY;
-		goto out;
-	}
-
 	ret = kstrtou16(buf, 10, &do_reset);
 	if (ret)
 		goto out;
@@ -1090,23 +1083,52 @@ static ssize_t reset_store(struct device *dev,
 		goto out;
 	}
 
+	mutex_lock(&bdev->bd_mutex);
+	/* Do not reset an active device or claimed device */
+	if (bdev->bd_openers || zram->claim) {
+		ret = -EBUSY;
+		mutex_unlock(&bdev->bd_mutex);
+		goto out;
+	}
+
+	/* From now on, anyone can't open /dev/zram[0-9] */
+	zram->claim = true;
+	mutex_unlock(&bdev->bd_mutex);
+
 	/* Make sure all pending I/O is finished */
 	fsync_bdev(bdev);
 	zram_reset_device(zram);
 
-	mutex_unlock(&bdev->bd_mutex);
 	revalidate_disk(zram->disk);
 	bdput(bdev);
 
-	return len;
+	mutex_lock(&bdev->bd_mutex);
+	zram->claim = false;
+	mutex_unlock(&bdev->bd_mutex);
 
+	return len;
 out:
-	mutex_unlock(&bdev->bd_mutex);
 	bdput(bdev);
 	return ret;
 }
 
+static int zram_open(struct block_device *bdev, fmode_t mode)
+{
+	int ret = 0;
+	struct zram *zram;
+
+	WARN_ON(!mutex_is_locked(&bdev->bd_mutex));
+
+	zram = bdev->bd_disk->private_data;
+	/* zram was claimed to reset so open request fails */
+	if (zram->claim)
+		ret = -EBUSY;
+
+	return ret;
+}
+
 static const struct block_device_operations zram_devops = {
+	.open = zram_open,
 	.swap_slot_free_notify = zram_slot_free_notify,
 	.rw_page = zram_rw_page,
 	.owner = THIS_MODULE
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 042994e..6dbe2df 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -115,5 +115,9 @@ struct zram {
 	 */
 	u64 disksize;	/* bytes */
 	char compressor[10];
+	/*
+	 * zram is claimed so open request will be failed
+	 */
+	bool claim; /* Protected by bdev->bd_mutex */
 };
 #endif
-- 
1.9.3
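The claim-flag gating in the patch above can be modeled in a few lines of plain C. This is a hypothetical user-space sketch: a pthread mutex stands in for bdev->bd_mutex, and fake_zram/fake_open/fake_reset are invented names.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space stand-ins for struct zram and its bdev. */
struct fake_zram {
	pthread_mutex_t bd_mutex;	/* plays the role of bdev->bd_mutex */
	int openers;			/* plays the role of bdev->bd_openers */
	bool claim;			/* protected by bd_mutex */
};

/* Mirrors zram_open(): refuse new opens while the device is claimed. */
static int fake_open(struct fake_zram *z)
{
	int ret = 0;

	pthread_mutex_lock(&z->bd_mutex);
	if (z->claim)
		ret = -EBUSY;
	else
		z->openers++;
	pthread_mutex_unlock(&z->bd_mutex);
	return ret;
}

/* Mirrors the reset path: claim under the lock, drop the lock for the
 * slow work, then unclaim -- the lock is never held across the work. */
static int fake_reset(struct fake_zram *z)
{
	pthread_mutex_lock(&z->bd_mutex);
	if (z->openers || z->claim) {
		pthread_mutex_unlock(&z->bd_mutex);
		return -EBUSY;
	}
	z->claim = true;
	pthread_mutex_unlock(&z->bd_mutex);

	/* ... slow reset work runs here without bd_mutex held ... */

	pthread_mutex_lock(&z->bd_mutex);
	z->claim = false;
	pthread_mutex_unlock(&z->bd_mutex);
	return 0;
}
```

The point of the pattern is that the mutex only guards the flag transitions, never the long-running reset itself, so a sysfs-removal path cannot deadlock against it.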

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-05-04  2:20                   ` Minchan Kim
@ 2015-05-04  2:28                     ` Minchan Kim
  2015-05-04  6:32                       ` Sergey Senozhatsky
  2015-05-04  6:29                     ` Sergey Senozhatsky
  2015-05-04 11:34                     ` Sergey Senozhatsky
  2 siblings, 1 reply; 25+ messages in thread
From: Minchan Kim @ 2015-05-04  2:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

On Mon, May 04, 2015 at 11:20:08AM +0900, Minchan Kim wrote:
> Hello Sergey,
> 
> On Thu, Apr 30, 2015 at 03:51:12PM +0900, Sergey Senozhatsky wrote:
> > On (04/30/15 15:44), Minchan Kim wrote:
> > > > > I think the problem of deadlock is that you are trying to remove sysfs file
> > > > > in sysfs handler.
> > > > > 
> > > > > #> echo 1 > /sys/xxx/zram_remove
> > > > > 
> > > > > kernfs_fop_write - hold s_active
> > > > >   -> zram_remove_store
> > > > >     -> zram_remove
> > > > >       -> sysfs_remove_group - hold s_active *again*
> > > > > 
> > > > > Right?
> > > > > 
> > > > 
> > > > are those same s_active locks?
> > > > 
> > > > 
> > > > we hold (s_active#163) and (&bdev->bd_mutex) and want to acquire (s_active#162)
> > > 
> > > Thanks for sharing the message.
> > > You're right. It's another lock, so that shouldn't be the reason.
> > > Okay, I will review it. Please give me time.
> > > 
> > 
> > sure, no problem and no rush. thanks!
> 
> I had some time to think it over.
> 
> I think your patch is rather tricky: someone who has already opened
> /dev/zram cannot see the sysfs nodes, but after a while he can see
> them again. It's weird.
> 
> I want to fix it in a more generic way. Otherwise, we might run into
> locking problems again sometime. We already experienced that with
> init_lock, although we finally fixed it.
> 
> I think we can fix it with the patch below, which I hope is a more
> general and correct approach. It's based on your [zram: return zram
> device_id from zram_add()].
> 
> What do you think?
> 
> From e943df5407b880f9262ef959b270226fdc81bc9f Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 4 May 2015 08:36:07 +0900
> Subject: [PATCH 1/2] zram: close race by open overriding
> 
> [1] introduced bdev->bd_mutex to protect against a race between mount
> and reset. At that time, we didn't have the dynamic zram add/remove
> feature, so it was okay.
> 
> However, as we introduce the dynamic device feature, bd_mutex becomes
> a problem.
> 
> 	CPU 0
> 
> echo 1 > /sys/block/zram<id>/reset
>   -> kernfs->s_active(A)
>     -> zram:reset_store->bd_mutex(B)
> 
> 	CPU 1
> 
> echo <id> > /sys/class/zram/zram-remove
>   ->zram:zram_remove: bd_mutex(B)
>   -> sysfs_remove_group
>     -> kernfs->s_active(A)
> 
> IOW, an AB -> BA deadlock.
> 
> The reason we are holding bd_mutex in zram_remove is to prevent any
> incoming open of /dev/zram[0-9]. Otherwise, we could remove a zram
> device that others have already opened. But it causes the deadlock
> problem above.
> 
> To fix the problem, this patch overrides block_device.open to return
> -EBUSY while the zram device is claimed for reset, so any incoming
> open fails and we no longer need to hold bd_mutex in zram_remove.
> 
> This patch is to prepare for the zram-add/remove feature.
> 
> [1] ba6b17: zram: fix umount-reset_store-mount race condition
> Signed-off-by: Minchan Kim <minchan@kernel.org>

If above has no problem, we could apply your last patch on top of it.

>From 5bfa8a2e312a9c8493f574b1cf513ef4693a465c Mon Sep 17 00:00:00 2001
From: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Date: Mon, 4 May 2015 09:02:23 +0900
Subject: [PATCH 2/2] zram: add dynamic device add/remove functionality

We currently don't support on-demand device creation. The one and only
way to have N zram devices is to specify the num_devices module
parameter (default value: 1). IOW if, for some reason, at some point,
the user wants to have N + 1 devices, he/she must umount all the
existing devices, unload the module, and load the module passing
num_devices equal to N + 1. And do this again, if needed.

This patch introduces the zram-control sysfs class, which has two sysfs
attrs:
- zram_add      -- add a new zram device
- zram_remove   -- remove a specific (device_id) zram device

zram_add sysfs attr is read-only and supports only the automatic device
id assignment mode (as requested by Minchan Kim). A read operation
performed on this attr creates a new zram device and returns its
device_id, or an error status.

Usage example:
	# add a new zram device
	cat /sys/class/zram-control/zram_add
	2

	# remove a specific zram device
	echo 4 > /sys/class/zram-control/zram_remove

Returning a zram_add() error code back to user space (-ENOMEM in this case):

	cat /sys/class/zram-control/zram_add
	cat: /sys/class/zram-control/zram_add: Cannot allocate memory

NOTE: there might be users who already depend on the fact that the
zram0 device always gets created by zram_init(). Preserve this behavior.

[minchan]: use zram->claim to avoid lockdep splat
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt | 23 ++++++++--
 drivers/block/zram/zram_drv.c   | 97 +++++++++++++++++++++++++++++++++++++++--
 2 files changed, 114 insertions(+), 6 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 65e9430..fc686d4 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -99,7 +99,24 @@ size of the disk when not in use so a huge zram is wasteful.
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-7) Stats:
+7) Add/remove zram devices
+
+zram provides a control interface, which enables dynamic (on-demand) device
+addition and removal.
+
+In order to add a new /dev/zramX device, perform read operation on zram_add
+attribute. This will return either new device's device id (meaning that you
+can use /dev/zram<id>) or error code.
+
+Example:
+	cat /sys/class/zram-control/zram_add
+	1
+
+To remove the existing /dev/zramX device (where X is a device id)
+execute
+	echo X > /sys/class/zram-control/zram_remove
+
+8) Stats:
 Per-device statistics are exported as various nodes under /sys/block/zram<id>/
 
 A brief description of exported device attritbutes. For more details please
@@ -174,11 +191,11 @@ line of text and contains the following stats separated by whitespace:
 	zero_pages
 	num_migrated
 
-8) Deactivate:
+9) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-9) Reset:
+10) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7fb72dc..97cd4f3 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -29,10 +29,14 @@
 #include <linux/vmalloc.h>
 #include <linux/err.h>
 #include <linux/idr.h>
+#include <linux/sysfs.h>
 
 #include "zram_drv.h"
 
 static DEFINE_IDR(zram_index_idr);
+/* idr index must be protected */
+static DEFINE_MUTEX(zram_index_mutex);
+
 static int zram_major;
 static const char *default_compressor = "lzo";
 
@@ -1278,24 +1282,101 @@ out_free_dev:
 	return ret;
 }
 
-static void zram_remove(struct zram *zram)
+static int zram_remove(struct zram *zram)
 {
-	pr_info("Removed device: %s\n", zram->disk->disk_name);
+	struct block_device *bdev;
+
+	bdev = bdget_disk(zram->disk, 0);
+	if (!bdev)
+		return -ENOMEM;
+
+	mutex_lock(&bdev->bd_mutex);
+	if (bdev->bd_openers || zram->claim) {
+		mutex_unlock(&bdev->bd_mutex);
+		return -EBUSY;
+	}
+
+	zram->claim = true;
+	mutex_unlock(&bdev->bd_mutex);
+
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
-	 * store while we destroy the devices
+	 * store while we destroy the devices. This also helps during
+	 * zram_remove() -- device_reset() is the last holder of
+	 * ->init_lock.
 	 */
 	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
 			&zram_disk_attr_group);
 
+	/* Make sure all pending I/O is finished */
+	fsync_bdev(bdev);
 	zram_reset_device(zram);
+	mutex_unlock(&bdev->bd_mutex);
+
+	pr_info("Removed device: %s\n", zram->disk->disk_name);
+
 	idr_remove(&zram_index_idr, zram->disk->first_minor);
 	blk_cleanup_queue(zram->disk->queue);
 	del_gendisk(zram->disk);
 	put_disk(zram->disk);
 	kfree(zram);
+
+	return 0;
 }
 
+/* zram module control sysfs attributes */
+static ssize_t zram_add_show(struct class *class,
+			struct class_attribute *attr,
+			char *buf)
+{
+	int ret;
+
+	mutex_lock(&zram_index_mutex);
+	ret = zram_add();
+	mutex_unlock(&zram_index_mutex);
+
+	if (ret < 0)
+		return ret;
+	return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t zram_remove_store(struct class *class,
+			struct class_attribute *attr,
+			const char *buf,
+			size_t count)
+{
+	struct zram *zram;
+	int ret, dev_id;
+
+	/* dev_id is gendisk->first_minor, which is `int' */
+	ret = kstrtoint(buf, 10, &dev_id);
+	if (ret || dev_id < 0)
+		return -EINVAL;
+
+	mutex_lock(&zram_index_mutex);
+
+	zram = idr_find(&zram_index_idr, dev_id);
+	if (zram)
+		ret = zram_remove(zram);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&zram_index_mutex);
+	return ret ? ret : count;
+}
+
+static struct class_attribute zram_control_class_attrs[] = {
+	__ATTR_RO(zram_add),
+	__ATTR_WO(zram_remove),
+	__ATTR_NULL,
+};
+
+static struct class zram_control_class = {
+	.name		= "zram-control",
+	.owner		= THIS_MODULE,
+	.class_attrs	= zram_control_class_attrs,
+};
+
 static int zram_remove_cb(int id, void *ptr, void *data)
 {
 	zram_remove(ptr);
@@ -1304,6 +1385,7 @@ static int zram_remove_cb(int id, void *ptr, void *data)
 
 static void destroy_devices(void)
 {
+	class_unregister(&zram_control_class);
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
@@ -1313,14 +1395,23 @@ static int __init zram_init(void)
 {
 	int ret;
 
+	ret = class_register(&zram_control_class);
+	if (ret) {
+		pr_warn("Unable to register zram-control class\n");
+		return ret;
+	}
+
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warn("Unable to get major number\n");
+		class_unregister(&zram_control_class);
 		return -EBUSY;
 	}
 
 	while (num_devices != 0) {
+		mutex_lock(&zram_index_mutex);
 		ret = zram_add();
+		mutex_unlock(&zram_index_mutex);
 		if (ret < 0)
 			goto out_error;
 		num_devices--;
-- 
1.9.3
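The zram_index_idr plus zram_index_mutex pairing in the patch above is essentially a lock-protected id-to-device map: zram_add() allocates the lowest free id, zram_remove_store() looks an id up and drops it. A minimal user-space sketch of that pattern (hypothetical names; a fixed array stands in for the kernel idr):

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_DEVS 8

static pthread_mutex_t index_mutex = PTHREAD_MUTEX_INITIALIZER;	/* like zram_index_mutex */
static void *index_table[MAX_DEVS];				/* stands in for zram_index_idr */

/* Like zram_add(): allocate the lowest free id, or -ENOSPC if full. */
static int dev_add(void *dev)
{
	int id, ret = -ENOSPC;

	pthread_mutex_lock(&index_mutex);
	for (id = 0; id < MAX_DEVS; id++) {
		if (!index_table[id]) {
			index_table[id] = dev;
			ret = id;
			break;
		}
	}
	pthread_mutex_unlock(&index_mutex);
	return ret;
}

/* Like zram_remove_store(): look the id up under the lock, -ENODEV if absent. */
static int dev_remove(int id)
{
	int ret = -ENODEV;

	pthread_mutex_lock(&index_mutex);
	if (id >= 0 && id < MAX_DEVS && index_table[id]) {
		index_table[id] = NULL;
		ret = 0;
	}
	pthread_mutex_unlock(&index_mutex);
	return ret;
}
```

Because both the lookup and the removal happen under the same mutex, a concurrent add can never race a remove for the same id, which is exactly what zram_index_mutex buys the patch.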

-- 
Kind regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-05-04  2:20                   ` Minchan Kim
  2015-05-04  2:28                     ` Minchan Kim
@ 2015-05-04  6:29                     ` Sergey Senozhatsky
  2015-05-04 11:34                     ` Sergey Senozhatsky
  2 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04  6:29 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (05/04/15 11:20), Minchan Kim wrote:
> I had a time to think over it.
> 
> I think your patch is rather tricky so someone cannot see sysfs
> although he already opened /dev/zram but after a while he can see sysfs.
> It's weired.
> 
> I want to fix it more generic way. Othewise, we might have trouble with
> locking problem sometime. We already have experieced it with init_lock
> although we finally fixed it.
> 
> I think we can fix it with below patch I hope it's more general and right
> approach. It's based on your [zram: return zram device_id from zram_add()]
> 
> What do you think about?
> 

thanks for looking! didn't have much time over the weekend to
investigate. will take a look later today. at a glance I think that
may do the trick.

	-ss

> From e943df5407b880f9262ef959b270226fdc81bc9f Mon Sep 17 00:00:00 2001
> From: Minchan Kim <minchan@kernel.org>
> Date: Mon, 4 May 2015 08:36:07 +0900
> Subject: [PATCH 1/2] zram: close race by open overriding
> 
> [1] introduced bdev->bd_mutex to protect against a race between mount
> and reset. At that time, we didn't have the dynamic zram add/remove
> feature, so it was okay.
> 
> However, as we introduce the dynamic device feature, bd_mutex becomes
> a problem.
> 
> 	CPU 0
> 
> echo 1 > /sys/block/zram<id>/reset
>   -> kernfs->s_active(A)
>     -> zram:reset_store->bd_mutex(B)
> 
> 	CPU 1
> 
> echo <id> > /sys/class/zram/zram-remove
>   ->zram:zram_remove: bd_mutex(B)
>   -> sysfs_remove_group
>     -> kernfs->s_active(A)
> 
> IOW, an AB -> BA deadlock.
> 
> The reason we are holding bd_mutex in zram_remove is to prevent any
> incoming open of /dev/zram[0-9]. Otherwise, we could remove a zram
> device that others have already opened. But it causes the deadlock
> problem above.
> 
> To fix the problem, this patch overrides block_device.open to return
> -EBUSY while the zram device is claimed for reset, so any incoming
> open fails and we no longer need to hold bd_mutex in zram_remove.
> 
> This patch is to prepare for the zram-add/remove feature.
> 
> [1] ba6b17: zram: fix umount-reset_store-mount race condition
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  drivers/block/zram/zram_drv.c | 42 ++++++++++++++++++++++++++++++++----------
>  drivers/block/zram/zram_drv.h |  4 ++++
>  2 files changed, 36 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 3df4394..7fb72dc 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1074,13 +1074,6 @@ static ssize_t reset_store(struct device *dev,
>  	if (!bdev)
>  		return -ENOMEM;
>  
> -	mutex_lock(&bdev->bd_mutex);
> -	/* Do not reset an active device! */
> -	if (bdev->bd_openers) {
> -		ret = -EBUSY;
> -		goto out;
> -	}
> -
>  	ret = kstrtou16(buf, 10, &do_reset);
>  	if (ret)
>  		goto out;
> @@ -1090,23 +1083,52 @@ static ssize_t reset_store(struct device *dev,
>  		goto out;
>  	}
>  
> +	mutex_lock(&bdev->bd_mutex);
> +	/* Do not reset an active device or claimed device */
> +	if (bdev->bd_openers || zram->claim) {
> +		ret = -EBUSY;
> +		mutex_unlock(&bdev->bd_mutex);
> +		goto out;
> +	}
> +
> +	/* From now on, anyone can't open /dev/zram[0-9] */
> +	zram->claim = true;
> +	mutex_unlock(&bdev->bd_mutex);
> +
>  	/* Make sure all pending I/O is finished */
>  	fsync_bdev(bdev);
>  	zram_reset_device(zram);
>  
> -	mutex_unlock(&bdev->bd_mutex);
>  	revalidate_disk(zram->disk);
>  	bdput(bdev);
>  
> -	return len;
> +	mutex_lock(&bdev->bd_mutex);
> +	zram->claim = false;
> +	mutex_unlock(&bdev->bd_mutex);
>  
> +	return len;
>  out:
> -	mutex_unlock(&bdev->bd_mutex);
>  	bdput(bdev);
>  	return ret;
>  }
>  
> +static int zram_open(struct block_device *bdev, fmode_t mode)
> +{
> +	int ret = 0;
> +	struct zram *zram;
> +
> +	WARN_ON(!mutex_is_locked(&bdev->bd_mutex));
> +
> +	zram = bdev->bd_disk->private_data;
> +	/* zram was claimed to reset so open request fails */
> +	if (zram->claim)
> +		ret = -EBUSY;
> +
> +	return ret;
> +}
> +
>  static const struct block_device_operations zram_devops = {
> +	.open = zram_open,
>  	.swap_slot_free_notify = zram_slot_free_notify,
>  	.rw_page = zram_rw_page,
>  	.owner = THIS_MODULE
> diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
> index 042994e..6dbe2df 100644
> --- a/drivers/block/zram/zram_drv.h
> +++ b/drivers/block/zram/zram_drv.h
> @@ -115,5 +115,9 @@ struct zram {
>  	 */
>  	u64 disksize;	/* bytes */
>  	char compressor[10];
> +	/*
> +	 * zram is claimed so open request will be failed
> +	 */
> +	bool claim; /* Protected by bdev->bd_mutex */
>  };
>  #endif
> -- 
> 1.9.3
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-05-04  2:28                     ` Minchan Kim
@ 2015-05-04  6:32                       ` Sergey Senozhatsky
  0 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04  6:32 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (05/04/15 11:28), Minchan Kim wrote:
> [minchan]: use zram->claim to avoid lockdep splat
> Reported-by: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
> ---

will take a look today, clean up and re-submit. thanks!


>  	/*
>  	 * Remove sysfs first, so no one will perform a disksize
> -	 * store while we destroy the devices
> +	 * store while we destroy the devices. This also helps during
> +	 * zram_remove() -- device_reset() is the last holder of
> +	 * ->init_lock.
>  	 */
>  	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
>  			&zram_disk_attr_group);
>  
> +	/* Make sure all pending I/O is finished */
> +	fsync_bdev(bdev);
>  	zram_reset_device(zram);
> +	mutex_unlock(&bdev->bd_mutex);
 -	mutex_unlock(&bdev->bd_mutex);

> +	pr_info("Removed device: %s\n", zram->disk->disk_name);
> +
>  	idr_remove(&zram_index_idr, zram->disk->first_minor);
>  	blk_cleanup_queue(zram->disk->queue);
>  	del_gendisk(zram->disk);
>  	put_disk(zram->disk);
>  	kfree(zram);
> +
> +	return 0;
>  }
[..]

	-ss

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCHv3 9/9] zram: add dynamic device add/remove functionality
  2015-05-04  2:20                   ` Minchan Kim
  2015-05-04  2:28                     ` Minchan Kim
  2015-05-04  6:29                     ` Sergey Senozhatsky
@ 2015-05-04 11:34                     ` Sergey Senozhatsky
  2 siblings, 0 replies; 25+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 11:34 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index 3df4394..7fb72dc 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1074,13 +1074,6 @@ static ssize_t reset_store(struct device *dev,
>  	if (!bdev)
>  		return -ENOMEM;
>  
> -	mutex_lock(&bdev->bd_mutex);
> -	/* Do not reset an active device! */
> -	if (bdev->bd_openers) {
> -		ret = -EBUSY;
> -		goto out;
> -	}
> -
>  	ret = kstrtou16(buf, 10, &do_reset);
>  	if (ret)
>  		goto out;
> @@ -1090,23 +1083,52 @@ static ssize_t reset_store(struct device *dev,
>  		goto out;
>  	}
>  
> +	mutex_lock(&bdev->bd_mutex);
> +	/* Do not reset an active device or claimed device */
> +	if (bdev->bd_openers || zram->claim) {
> +		ret = -EBUSY;
> +		mutex_unlock(&bdev->bd_mutex);
> +		goto out;
> +	}
> +
> +	/* From now on, anyone can't open /dev/zram[0-9] */
> +	zram->claim = true;
> +	mutex_unlock(&bdev->bd_mutex);
> +
>  	/* Make sure all pending I/O is finished */
>  	fsync_bdev(bdev);
>  	zram_reset_device(zram);
>  
> -	mutex_unlock(&bdev->bd_mutex);
>  	revalidate_disk(zram->disk);
>  	bdput(bdev);
>  
> -	return len;
> +	mutex_lock(&bdev->bd_mutex);
> +	zram->claim = false;
> +	mutex_unlock(&bdev->bd_mutex);
>  
> +	return len;
>  out:
> -	mutex_unlock(&bdev->bd_mutex);
>  	bdput(bdev);
>  	return ret;

just backported the reset_store() simplification from another patch of mine.

do the input validation outside of ->bd_mutex, and before we take the
bdev reference in bdget_disk(). this also makes it possible to get rid
of the goto label.

the rest looks ok to me. will fold into your patch and submit.

---

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 7fb72dc..f50bd66 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1068,27 +1068,24 @@ static ssize_t reset_store(struct device *dev,
        struct zram *zram;
        struct block_device *bdev;
 
+       ret = kstrtou16(buf, 10, &do_reset);
+       if (ret)
+               return ret;
+
+       if (!do_reset)
+               return -EINVAL;
+
        zram = dev_to_zram(dev);
        bdev = bdget_disk(zram->disk, 0);
-
        if (!bdev)
                return -ENOMEM;
 
-       ret = kstrtou16(buf, 10, &do_reset);
-       if (ret)
-               goto out;
-
-       if (!do_reset) {
-               ret = -EINVAL;
-               goto out;
-       }
-
        mutex_lock(&bdev->bd_mutex);
        /* Do not reset an active device or claimed device */
        if (bdev->bd_openers || zram->claim) {
-               ret = -EBUSY;
                mutex_unlock(&bdev->bd_mutex);
-               goto out;
+               bdput(bdev);
+               return -EBUSY;
        }
 
        /* From now on, anyone can't open /dev/zram[0-9] */
@@ -1107,9 +1104,6 @@ static ssize_t reset_store(struct device *dev,
        mutex_unlock(&bdev->bd_mutex);
 
        return len;
-out:
-       bdput(bdev);
-       return ret;
 }
 
 static int zram_open(struct block_device *bdev, fmode_t mode)



Thread overview: 25+ messages
2015-04-27 13:21 [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 1/9] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 2/9] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 3/9] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 4/9] zram: reorganize code layout Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 5/9] zram: remove max_num_devices limitation Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 6/9] zram: report every added and removed device Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 7/9] zram: trivial: correct flag operations comment Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 8/9] zram: return zram device_id from zram_add() Sergey Senozhatsky
2015-04-27 13:21 ` [PATCHv3 9/9] zram: add dynamic device add/remove functionality Sergey Senozhatsky
2015-04-29  0:16   ` Sergey Senozhatsky
2015-04-29  6:48     ` Minchan Kim
2015-04-29  7:02       ` Sergey Senozhatsky
2015-04-29  7:23         ` Sergey Senozhatsky
2015-04-30  5:47           ` Minchan Kim
2015-04-30  6:34             ` Sergey Senozhatsky
2015-04-30  6:44               ` Minchan Kim
2015-04-30  6:51                 ` Sergey Senozhatsky
2015-05-04  2:20                   ` Minchan Kim
2015-05-04  2:28                     ` Minchan Kim
2015-05-04  6:32                       ` Sergey Senozhatsky
2015-05-04  6:29                     ` Sergey Senozhatsky
2015-05-04 11:34                     ` Sergey Senozhatsky
2015-04-30  6:44               ` Sergey Senozhatsky
2015-04-27 13:41 ` [PATCHv3 0/9] introduce on-demand device creation Sergey Senozhatsky
