linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCHv4 00/10] add on-demand device creation
@ 2015-05-04 12:38 Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 01/10] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
                   ` (11 more replies)
  0 siblings, 12 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We currently don't support zram on-demand device creation.  The only way
to have N zram devices is to specify num_devices module parameter (default
value 1).  That means that if, for some reason, at some point, user wants
to have N + 1 devies he/she must umount all the existing devices, unload
the module, load the module passing num_devices equals to N + 1.

This patchset introduces zram-control sysfs class, which has two sysfs
attrs:

 - zram_add     -- add a new zram device
 - zram_remove  -- remove a specific (device_id) zram device

    Usage example:
        # add a new specific zram device
        cat /sys/class/zram-control/zram_add
        1

        # remove a specific zram device
        echo 4 > /sys/class/zram-control/zram_remove

V4:
-- add patch from Minchan to handle differently a deadlock suspected by lockdep
-- use zram->claim in zram_remove()

V3:
-- rebase against 4.1
-- review comments from Minchan were addressed
-- no sysfs RO tricks anymore

V2:
-- quick rebase and cleanup in attempt to catch 4.1 merge window


Minchan Kim (1):
  zram: close race by open overriding

Sergey Senozhatsky (9):
  zram: add `compact` sysfs entry to documentation
  zram: cosmetic ZRAM_ATTR_RO code formatting tweak
  zram: use idr instead of `zram_devices' array
  zram: reorganize code layout
  zram: remove max_num_devices limitation
  zram: report every added and removed device
  zram: trivial: correct flag operations comment
  zram: return zram device_id from zram_add()
  zram: add dynamic device add/remove functionality

 Documentation/ABI/testing/sysfs-class-zram |   24 +
 Documentation/blockdev/zram.txt            |   29 +-
 drivers/block/zram/zram_drv.c              | 1002 ++++++++++++++++------------
 drivers/block/zram/zram_drv.h              |   10 +-
 4 files changed, 612 insertions(+), 453 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-zram

-- 
2.4.0


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCHv4 01/10] zram: add `compact` sysfs entry to documentation
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 02/10] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Briefly describe missing `compact` sysfs entry.

Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 48a183e..bef4998 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -126,7 +126,7 @@ mem_used_max      RW    the maximum amount memory zram have consumed to
 mem_limit         RW    the maximum amount of memory ZRAM can use to store
                         the compressed data
 num_migrated      RO    the number of objects migrated migrated by compaction
-
+compact           WO    trigger memory compaction
 
 WARNING
 =======
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 02/10] zram: cosmetic ZRAM_ATTR_RO code formatting tweak
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 01/10] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 03/10] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Fix a misplaced backslash.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 01ec694..41816d0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -49,7 +49,7 @@ static inline void deprecated_attr_warn(const char *name)
 }
 
 #define ZRAM_ATTR_RO(name)						\
-static ssize_t name##_show(struct device *d,		\
+static ssize_t name##_show(struct device *d,				\
 				struct device_attribute *attr, char *b)	\
 {									\
 	struct zram *zram = dev_to_zram(d);				\
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 03/10] zram: use idr instead of `zram_devices' array
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 01/10] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 02/10] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 04/10] zram: reorganize code layout Sergey Senozhatsky
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch makes some preparations for on-demand device add/remove
functionality.

Remove `zram_devices' array and switch to id-to-pointer translation (idr).
idr doesn't bloat zram struct with additional members, f.e. list_head,
yet still provides ability to match the device_id with the device pointer.

No user-space visible changes.

[do not lose -ENOMEM return code when `queue' allocation has failed]
Reported-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 86 ++++++++++++++++++++++++-------------------
 1 file changed, 49 insertions(+), 37 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 41816d0..9ee0d7f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -28,12 +28,12 @@
 #include <linux/string.h>
 #include <linux/vmalloc.h>
 #include <linux/err.h>
+#include <linux/idr.h>
 
 #include "zram_drv.h"
 
-/* Globals */
+static DEFINE_IDR(zram_index_idr);
 static int zram_major;
-static struct zram *zram_devices;
 static const char *default_compressor = "lzo";
 
 /* Module params (documentation at end) */
@@ -1152,10 +1152,20 @@ static struct attribute_group zram_disk_attr_group = {
 	.attrs = zram_disk_attrs,
 };
 
-static int create_device(struct zram *zram, int device_id)
+static int zram_add(int device_id)
 {
+	struct zram *zram;
 	struct request_queue *queue;
-	int ret = -ENOMEM;
+	int ret;
+
+	zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
+	if (!zram)
+		return -ENOMEM;
+
+	ret = idr_alloc(&zram_index_idr, zram, device_id,
+			device_id + 1, GFP_KERNEL);
+	if (ret < 0)
+		goto out_free_dev;
 
 	init_rwsem(&zram->init_lock);
 
@@ -1163,12 +1173,13 @@ static int create_device(struct zram *zram, int device_id)
 	if (!queue) {
 		pr_err("Error allocating disk queue for device %d\n",
 			device_id);
-		goto out;
+		ret = -ENOMEM;
+		goto out_free_idr;
 	}
 
 	blk_queue_make_request(queue, zram_make_request);
 
-	 /* gendisk structure */
+	/* gendisk structure */
 	zram->disk = alloc_disk(1);
 	if (!zram->disk) {
 		pr_warn("Error allocating disk structure for device %d\n",
@@ -1233,34 +1244,42 @@ out_free_disk:
 	put_disk(zram->disk);
 out_free_queue:
 	blk_cleanup_queue(queue);
-out:
+out_free_idr:
+	idr_remove(&zram_index_idr, device_id);
+out_free_dev:
+	kfree(zram);
 	return ret;
 }
 
-static void destroy_devices(unsigned int nr)
+static void zram_remove(struct zram *zram)
 {
-	struct zram *zram;
-	unsigned int i;
-
-	for (i = 0; i < nr; i++) {
-		zram = &zram_devices[i];
-		/*
-		 * Remove sysfs first, so no one will perform a disksize
-		 * store while we destroy the devices
-		 */
-		sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
-				&zram_disk_attr_group);
+	/*
+	 * Remove sysfs first, so no one will perform a disksize
+	 * store while we destroy the devices
+	 */
+	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
+			&zram_disk_attr_group);
 
-		zram_reset_device(zram);
+	zram_reset_device(zram);
+	idr_remove(&zram_index_idr, zram->disk->first_minor);
+	blk_cleanup_queue(zram->disk->queue);
+	del_gendisk(zram->disk);
+	put_disk(zram->disk);
+	kfree(zram);
+}
 
-		blk_cleanup_queue(zram->disk->queue);
-		del_gendisk(zram->disk);
-		put_disk(zram->disk);
-	}
+static int zram_remove_cb(int id, void *ptr, void *data)
+{
+	zram_remove(ptr);
+	return 0;
+}
 
-	kfree(zram_devices);
+static void destroy_devices(void)
+{
+	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
+	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
-	pr_info("Destroyed %u device(s)\n", nr);
+	pr_info("Destroyed device(s)\n");
 }
 
 static int __init zram_init(void)
@@ -1279,16 +1298,9 @@ static int __init zram_init(void)
 		return -EBUSY;
 	}
 
-	/* Allocate the device array and initialize each one */
-	zram_devices = kzalloc(num_devices * sizeof(struct zram), GFP_KERNEL);
-	if (!zram_devices) {
-		unregister_blkdev(zram_major, "zram");
-		return -ENOMEM;
-	}
-
 	for (dev_id = 0; dev_id < num_devices; dev_id++) {
-		ret = create_device(&zram_devices[dev_id], dev_id);
-		if (ret)
+		ret = zram_add(dev_id);
+		if (ret != 0)
 			goto out_error;
 	}
 
@@ -1296,13 +1308,13 @@ static int __init zram_init(void)
 	return 0;
 
 out_error:
-	destroy_devices(dev_id);
+	destroy_devices();
 	return ret;
 }
 
 static void __exit zram_exit(void)
 {
-	destroy_devices(num_devices);
+	destroy_devices();
 }
 
 module_init(zram_init);
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 04/10] zram: reorganize code layout
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (2 preceding siblings ...)
  2015-05-04 12:38 ` [PATCHv4 03/10] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 05/10] zram: remove max_num_devices limitation Sergey Senozhatsky
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch looks big, but basically it just moves code blocks.
No functional changes.

Our current code layout looks like a sandwitch.

For example,
a) between read/write handlers, we have update_used_max() helper function:

static int zram_decompress_page
static int zram_bvec_read
static inline void update_used_max
static int zram_bvec_write
static int zram_bvec_rw

b) RW request handlers __zram_make_request/zram_bio_discard are divided by
sysfs attr reset_store() function and corresponding zram_reset_device()
handler:

static void zram_bio_discard
static void zram_reset_device
static ssize_t disksize_store
static ssize_t reset_store
static void __zram_make_request

c) we first a bunch of sysfs read/store functions. then a number of
one-liners, then helper functions, RW functions, sysfs functions, helper
functions again, and so on.

Reorganize layout to be more logically grouped (a brief description,
`cat zram_drv.c | grep static` gives a bigger picture):

-- one-liners: zram_test_flag/etc.

-- helpers: is_partial_io/update_position/etc

-- sysfs attr show/store functions + ZRAM_ATTR_RO() generated stats
show() functions
exception: reset and disksize store functions are required to be after
meta() functions. because we do device create/destroy actions in these
sysfs handlers.

-- "mm" functions: meta get/put, meta alloc/free, page free
static inline bool zram_meta_get
static inline void zram_meta_put
static void zram_meta_free
static struct zram_meta *zram_meta_alloc
static void zram_free_page

-- a block of I/O functions
static int zram_decompress_page
static int zram_bvec_read
static int zram_bvec_write
static void zram_bio_discard
static int zram_bvec_rw
static void __zram_make_request
static void zram_make_request
static void zram_slot_free_notify
static int zram_rw_page

-- device contol: add/remove/init/reset functions (+zram-control class
will sit here)
static int zram_reset_device
static ssize_t reset_store
static ssize_t disksize_store
static int zram_add
static void zram_remove
static int __init zram_init
static void __exit zram_exit

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 733 +++++++++++++++++++++---------------------
 1 file changed, 366 insertions(+), 367 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 9ee0d7f..3d7ac0c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -70,33 +70,117 @@ static inline struct zram *dev_to_zram(struct device *dev)
 	return (struct zram *)dev_to_disk(dev)->private_data;
 }
 
-static ssize_t compact_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+/* flag operations needs meta->tb_lock */
+static int zram_test_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
 {
-	unsigned long nr_migrated;
-	struct zram *zram = dev_to_zram(dev);
-	struct zram_meta *meta;
+	return meta->table[index].value & BIT(flag);
+}
 
-	down_read(&zram->init_lock);
-	if (!init_done(zram)) {
-		up_read(&zram->init_lock);
-		return -EINVAL;
-	}
+static void zram_set_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
+{
+	meta->table[index].value |= BIT(flag);
+}
 
-	meta = zram->meta;
-	nr_migrated = zs_compact(meta->mem_pool);
-	atomic64_add(nr_migrated, &zram->stats.num_migrated);
-	up_read(&zram->init_lock);
+static void zram_clear_flag(struct zram_meta *meta, u32 index,
+			enum zram_pageflags flag)
+{
+	meta->table[index].value &= ~BIT(flag);
+}
 
-	return len;
+static size_t zram_get_obj_size(struct zram_meta *meta, u32 index)
+{
+	return meta->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1);
 }
 
-static ssize_t disksize_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static void zram_set_obj_size(struct zram_meta *meta,
+					u32 index, size_t size)
 {
-	struct zram *zram = dev_to_zram(dev);
+	unsigned long flags = meta->table[index].value >> ZRAM_FLAG_SHIFT;
 
-	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
+	meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size;
+}
+
+static inline int is_partial_io(struct bio_vec *bvec)
+{
+	return bvec->bv_len != PAGE_SIZE;
+}
+
+/*
+ * Check if request is within bounds and aligned on zram logical blocks.
+ */
+static inline int valid_io_request(struct zram *zram,
+		sector_t start, unsigned int size)
+{
+	u64 end, bound;
+
+	/* unaligned request */
+	if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
+		return 0;
+	if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
+		return 0;
+
+	end = start + (size >> SECTOR_SHIFT);
+	bound = zram->disksize >> SECTOR_SHIFT;
+	/* out of range range */
+	if (unlikely(start >= bound || end > bound || start > end))
+		return 0;
+
+	/* I/O request is valid */
+	return 1;
+}
+
+static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
+{
+	if (*offset + bvec->bv_len >= PAGE_SIZE)
+		(*index)++;
+	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
+}
+
+static inline void update_used_max(struct zram *zram,
+					const unsigned long pages)
+{
+	unsigned long old_max, cur_max;
+
+	old_max = atomic_long_read(&zram->stats.max_used_pages);
+
+	do {
+		cur_max = old_max;
+		if (pages > cur_max)
+			old_max = atomic_long_cmpxchg(
+				&zram->stats.max_used_pages, cur_max, pages);
+	} while (old_max != cur_max);
+}
+
+static int page_zero_filled(void *ptr)
+{
+	unsigned int pos;
+	unsigned long *page;
+
+	page = (unsigned long *)ptr;
+
+	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+		if (page[pos])
+			return 0;
+	}
+
+	return 1;
+}
+
+static void handle_zero_page(struct bio_vec *bvec)
+{
+	struct page *page = bvec->bv_page;
+	void *user_mem;
+
+	user_mem = kmap_atomic(page);
+	if (is_partial_io(bvec))
+		memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
+	else
+		clear_page(user_mem);
+	kunmap_atomic(user_mem);
+
+	flush_dcache_page(page);
 }
 
 static ssize_t initstate_show(struct device *dev,
@@ -112,6 +196,14 @@ static ssize_t initstate_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%u\n", val);
 }
 
+static ssize_t disksize_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return scnprintf(buf, PAGE_SIZE, "%llu\n", zram->disksize);
+}
+
 static ssize_t orig_data_size_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -139,19 +231,6 @@ static ssize_t mem_used_total_show(struct device *dev,
 	return scnprintf(buf, PAGE_SIZE, "%llu\n", val << PAGE_SHIFT);
 }
 
-static ssize_t max_comp_streams_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
-{
-	int val;
-	struct zram *zram = dev_to_zram(dev);
-
-	down_read(&zram->init_lock);
-	val = zram->max_comp_streams;
-	up_read(&zram->init_lock);
-
-	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
-}
-
 static ssize_t mem_limit_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -221,6 +300,19 @@ static ssize_t mem_used_max_store(struct device *dev,
 	return len;
 }
 
+static ssize_t max_comp_streams_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	int val;
+	struct zram *zram = dev_to_zram(dev);
+
+	down_read(&zram->init_lock);
+	val = zram->max_comp_streams;
+	up_read(&zram->init_lock);
+
+	return scnprintf(buf, PAGE_SIZE, "%d\n", val);
+}
+
 static ssize_t max_comp_streams_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -278,71 +370,101 @@ static ssize_t comp_algorithm_store(struct device *dev,
 	return len;
 }
 
-/* flag operations needs meta->tb_lock */
-static int zram_test_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
+static ssize_t compact_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
-	return meta->table[index].value & BIT(flag);
-}
+	unsigned long nr_migrated;
+	struct zram *zram = dev_to_zram(dev);
+	struct zram_meta *meta;
 
-static void zram_set_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
-{
-	meta->table[index].value |= BIT(flag);
-}
+	down_read(&zram->init_lock);
+	if (!init_done(zram)) {
+		up_read(&zram->init_lock);
+		return -EINVAL;
+	}
 
-static void zram_clear_flag(struct zram_meta *meta, u32 index,
-			enum zram_pageflags flag)
-{
-	meta->table[index].value &= ~BIT(flag);
-}
+	meta = zram->meta;
+	nr_migrated = zs_compact(meta->mem_pool);
+	atomic64_add(nr_migrated, &zram->stats.num_migrated);
+	up_read(&zram->init_lock);
 
-static size_t zram_get_obj_size(struct zram_meta *meta, u32 index)
-{
-	return meta->table[index].value & (BIT(ZRAM_FLAG_SHIFT) - 1);
+	return len;
 }
 
-static void zram_set_obj_size(struct zram_meta *meta,
-					u32 index, size_t size)
+static ssize_t io_stat_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
 {
-	unsigned long flags = meta->table[index].value >> ZRAM_FLAG_SHIFT;
+	struct zram *zram = dev_to_zram(dev);
+	ssize_t ret;
 
-	meta->table[index].value = (flags << ZRAM_FLAG_SHIFT) | size;
-}
+	down_read(&zram->init_lock);
+	ret = scnprintf(buf, PAGE_SIZE,
+			"%8llu %8llu %8llu %8llu\n",
+			(u64)atomic64_read(&zram->stats.failed_reads),
+			(u64)atomic64_read(&zram->stats.failed_writes),
+			(u64)atomic64_read(&zram->stats.invalid_io),
+			(u64)atomic64_read(&zram->stats.notify_free));
+	up_read(&zram->init_lock);
 
-static inline int is_partial_io(struct bio_vec *bvec)
-{
-	return bvec->bv_len != PAGE_SIZE;
+	return ret;
 }
 
-/*
- * Check if request is within bounds and aligned on zram logical blocks.
- */
-static inline int valid_io_request(struct zram *zram,
-		sector_t start, unsigned int size)
+static ssize_t mm_stat_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
 {
-	u64 end, bound;
+	struct zram *zram = dev_to_zram(dev);
+	u64 orig_size, mem_used = 0;
+	long max_used;
+	ssize_t ret;
 
-	/* unaligned request */
-	if (unlikely(start & (ZRAM_SECTOR_PER_LOGICAL_BLOCK - 1)))
-		return 0;
-	if (unlikely(size & (ZRAM_LOGICAL_BLOCK_SIZE - 1)))
-		return 0;
+	down_read(&zram->init_lock);
+	if (init_done(zram))
+		mem_used = zs_get_total_pages(zram->meta->mem_pool);
 
-	end = start + (size >> SECTOR_SHIFT);
-	bound = zram->disksize >> SECTOR_SHIFT;
-	/* out of range range */
-	if (unlikely(start >= bound || end > bound || start > end))
-		return 0;
+	orig_size = atomic64_read(&zram->stats.pages_stored);
+	max_used = atomic_long_read(&zram->stats.max_used_pages);
 
-	/* I/O request is valid */
-	return 1;
-}
+	ret = scnprintf(buf, PAGE_SIZE,
+			"%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n",
+			orig_size << PAGE_SHIFT,
+			(u64)atomic64_read(&zram->stats.compr_data_size),
+			mem_used << PAGE_SHIFT,
+			zram->limit_pages << PAGE_SHIFT,
+			max_used << PAGE_SHIFT,
+			(u64)atomic64_read(&zram->stats.zero_pages),
+			(u64)atomic64_read(&zram->stats.num_migrated));
+	up_read(&zram->init_lock);
 
-static void zram_meta_free(struct zram_meta *meta, u64 disksize)
-{
-	size_t num_pages = disksize >> PAGE_SHIFT;
-	size_t index;
+	return ret;
+}
+
+static DEVICE_ATTR_RO(io_stat);
+static DEVICE_ATTR_RO(mm_stat);
+ZRAM_ATTR_RO(num_reads);
+ZRAM_ATTR_RO(num_writes);
+ZRAM_ATTR_RO(failed_reads);
+ZRAM_ATTR_RO(failed_writes);
+ZRAM_ATTR_RO(invalid_io);
+ZRAM_ATTR_RO(notify_free);
+ZRAM_ATTR_RO(zero_pages);
+ZRAM_ATTR_RO(compr_data_size);
+
+static inline bool zram_meta_get(struct zram *zram)
+{
+	if (atomic_inc_not_zero(&zram->refcount))
+		return true;
+	return false;
+}
+
+static inline void zram_meta_put(struct zram *zram)
+{
+	atomic_dec(&zram->refcount);
+}
+
+static void zram_meta_free(struct zram_meta *meta, u64 disksize)
+{
+	size_t num_pages = disksize >> PAGE_SHIFT;
+	size_t index;
 
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < num_pages; index++) {
@@ -390,56 +512,6 @@ out_error:
 	return NULL;
 }
 
-static inline bool zram_meta_get(struct zram *zram)
-{
-	if (atomic_inc_not_zero(&zram->refcount))
-		return true;
-	return false;
-}
-
-static inline void zram_meta_put(struct zram *zram)
-{
-	atomic_dec(&zram->refcount);
-}
-
-static void update_position(u32 *index, int *offset, struct bio_vec *bvec)
-{
-	if (*offset + bvec->bv_len >= PAGE_SIZE)
-		(*index)++;
-	*offset = (*offset + bvec->bv_len) % PAGE_SIZE;
-}
-
-static int page_zero_filled(void *ptr)
-{
-	unsigned int pos;
-	unsigned long *page;
-
-	page = (unsigned long *)ptr;
-
-	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
-		if (page[pos])
-			return 0;
-	}
-
-	return 1;
-}
-
-static void handle_zero_page(struct bio_vec *bvec)
-{
-	struct page *page = bvec->bv_page;
-	void *user_mem;
-
-	user_mem = kmap_atomic(page);
-	if (is_partial_io(bvec))
-		memset(user_mem + bvec->bv_offset, 0, bvec->bv_len);
-	else
-		clear_page(user_mem);
-	kunmap_atomic(user_mem);
-
-	flush_dcache_page(page);
-}
-
-
 /*
  * To protect concurrent access to the same index entry,
  * caller should hold this table index entry's bit_spinlock to
@@ -557,21 +629,6 @@ out_cleanup:
 	return ret;
 }
 
-static inline void update_used_max(struct zram *zram,
-					const unsigned long pages)
-{
-	unsigned long old_max, cur_max;
-
-	old_max = atomic_long_read(&zram->stats.max_used_pages);
-
-	do {
-		cur_max = old_max;
-		if (pages > cur_max)
-			old_max = atomic_long_cmpxchg(
-				&zram->stats.max_used_pages, cur_max, pages);
-	} while (old_max != cur_max);
-}
-
 static int zram_bvec_write(struct zram *zram, struct bio_vec *bvec, u32 index,
 			   int offset)
 {
@@ -699,35 +756,6 @@ out:
 	return ret;
 }
 
-static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
-			int offset, int rw)
-{
-	unsigned long start_time = jiffies;
-	int ret;
-
-	generic_start_io_acct(rw, bvec->bv_len >> SECTOR_SHIFT,
-			&zram->disk->part0);
-
-	if (rw == READ) {
-		atomic64_inc(&zram->stats.num_reads);
-		ret = zram_bvec_read(zram, bvec, index, offset);
-	} else {
-		atomic64_inc(&zram->stats.num_writes);
-		ret = zram_bvec_write(zram, bvec, index, offset);
-	}
-
-	generic_end_io_acct(rw, &zram->disk->part0, start_time);
-
-	if (unlikely(ret)) {
-		if (rw == READ)
-			atomic64_inc(&zram->stats.failed_reads);
-		else
-			atomic64_inc(&zram->stats.failed_writes);
-	}
-
-	return ret;
-}
-
 /*
  * zram_bio_discard - handler on discard request
  * @index: physical block index in PAGE_SIZE units
@@ -767,149 +795,32 @@ static void zram_bio_discard(struct zram *zram, u32 index,
 	}
 }
 
-static void zram_reset_device(struct zram *zram)
-{
-	struct zram_meta *meta;
-	struct zcomp *comp;
-	u64 disksize;
-
-	down_write(&zram->init_lock);
-
-	zram->limit_pages = 0;
-
-	if (!init_done(zram)) {
-		up_write(&zram->init_lock);
-		return;
-	}
-
-	meta = zram->meta;
-	comp = zram->comp;
-	disksize = zram->disksize;
-	/*
-	 * Refcount will go down to 0 eventually and r/w handler
-	 * cannot handle further I/O so it will bail out by
-	 * check zram_meta_get.
-	 */
-	zram_meta_put(zram);
-	/*
-	 * We want to free zram_meta in process context to avoid
-	 * deadlock between reclaim path and any other locks.
-	 */
-	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
-
-	/* Reset stats */
-	memset(&zram->stats, 0, sizeof(zram->stats));
-	zram->disksize = 0;
-	zram->max_comp_streams = 1;
-	set_capacity(zram->disk, 0);
-
-	up_write(&zram->init_lock);
-	/* I/O operation under all of CPU are done so let's free */
-	zram_meta_free(meta, disksize);
-	zcomp_destroy(comp);
-}
-
-static ssize_t disksize_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
-{
-	u64 disksize;
-	struct zcomp *comp;
-	struct zram_meta *meta;
-	struct zram *zram = dev_to_zram(dev);
-	int err;
-
-	disksize = memparse(buf, NULL);
-	if (!disksize)
-		return -EINVAL;
-
-	disksize = PAGE_ALIGN(disksize);
-	meta = zram_meta_alloc(zram->disk->first_minor, disksize);
-	if (!meta)
-		return -ENOMEM;
-
-	comp = zcomp_create(zram->compressor, zram->max_comp_streams);
-	if (IS_ERR(comp)) {
-		pr_info("Cannot initialise %s compressing backend\n",
-				zram->compressor);
-		err = PTR_ERR(comp);
-		goto out_free_meta;
-	}
-
-	down_write(&zram->init_lock);
-	if (init_done(zram)) {
-		pr_info("Cannot change disksize for initialized device\n");
-		err = -EBUSY;
-		goto out_destroy_comp;
-	}
-
-	init_waitqueue_head(&zram->io_done);
-	atomic_set(&zram->refcount, 1);
-	zram->meta = meta;
-	zram->comp = comp;
-	zram->disksize = disksize;
-	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
-	up_write(&zram->init_lock);
-
-	/*
-	 * Revalidate disk out of the init_lock to avoid lockdep splat.
-	 * It's okay because disk's capacity is protected by init_lock
-	 * so that revalidate_disk always sees up-to-date capacity.
-	 */
-	revalidate_disk(zram->disk);
-
-	return len;
-
-out_destroy_comp:
-	up_write(&zram->init_lock);
-	zcomp_destroy(comp);
-out_free_meta:
-	zram_meta_free(meta, disksize);
-	return err;
-}
-
-static ssize_t reset_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
+static int zram_bvec_rw(struct zram *zram, struct bio_vec *bvec, u32 index,
+			int offset, int rw)
 {
+	unsigned long start_time = jiffies;
 	int ret;
-	unsigned short do_reset;
-	struct zram *zram;
-	struct block_device *bdev;
-
-	zram = dev_to_zram(dev);
-	bdev = bdget_disk(zram->disk, 0);
 
-	if (!bdev)
-		return -ENOMEM;
+	generic_start_io_acct(rw, bvec->bv_len >> SECTOR_SHIFT,
+			&zram->disk->part0);
 
-	mutex_lock(&bdev->bd_mutex);
-	/* Do not reset an active device! */
-	if (bdev->bd_openers) {
-		ret = -EBUSY;
-		goto out;
+	if (rw == READ) {
+		atomic64_inc(&zram->stats.num_reads);
+		ret = zram_bvec_read(zram, bvec, index, offset);
+	} else {
+		atomic64_inc(&zram->stats.num_writes);
+		ret = zram_bvec_write(zram, bvec, index, offset);
 	}
 
-	ret = kstrtou16(buf, 10, &do_reset);
-	if (ret)
-		goto out;
+	generic_end_io_acct(rw, &zram->disk->part0, start_time);
 
-	if (!do_reset) {
-		ret = -EINVAL;
-		goto out;
+	if (unlikely(ret)) {
+		if (rw == READ)
+			atomic64_inc(&zram->stats.failed_reads);
+		else
+			atomic64_inc(&zram->stats.failed_writes);
 	}
 
-	/* Make sure all pending I/O is finished */
-	fsync_bdev(bdev);
-	zram_reset_device(zram);
-
-	mutex_unlock(&bdev->bd_mutex);
-	revalidate_disk(zram->disk);
-	bdput(bdev);
-
-	return len;
-
-out:
-	mutex_unlock(&bdev->bd_mutex);
-	bdput(bdev);
 	return ret;
 }
 
@@ -1049,80 +960,168 @@ out:
 	return err;
 }
 
-static const struct block_device_operations zram_devops = {
-	.swap_slot_free_notify = zram_slot_free_notify,
-	.rw_page = zram_rw_page,
-	.owner = THIS_MODULE
-};
+static void zram_reset_device(struct zram *zram)
+{
+	struct zram_meta *meta;
+	struct zcomp *comp;
+	u64 disksize;
 
-static DEVICE_ATTR_WO(compact);
-static DEVICE_ATTR_RW(disksize);
-static DEVICE_ATTR_RO(initstate);
-static DEVICE_ATTR_WO(reset);
-static DEVICE_ATTR_RO(orig_data_size);
-static DEVICE_ATTR_RO(mem_used_total);
-static DEVICE_ATTR_RW(mem_limit);
-static DEVICE_ATTR_RW(mem_used_max);
-static DEVICE_ATTR_RW(max_comp_streams);
-static DEVICE_ATTR_RW(comp_algorithm);
+	down_write(&zram->init_lock);
 
-static ssize_t io_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+	zram->limit_pages = 0;
+
+	if (!init_done(zram)) {
+		up_write(&zram->init_lock);
+		return;
+	}
+
+	meta = zram->meta;
+	comp = zram->comp;
+	disksize = zram->disksize;
+	/*
+	 * Refcount will go down to 0 eventually and r/w handler
+	 * cannot handle further I/O so it will bail out by
+	 * check zram_meta_get.
+	 */
+	zram_meta_put(zram);
+	/*
+	 * We want to free zram_meta in process context to avoid
+	 * deadlock between reclaim path and any other locks.
+	 */
+	wait_event(zram->io_done, atomic_read(&zram->refcount) == 0);
+
+	/* Reset stats */
+	memset(&zram->stats, 0, sizeof(zram->stats));
+	zram->disksize = 0;
+	zram->max_comp_streams = 1;
+	set_capacity(zram->disk, 0);
+
+	up_write(&zram->init_lock);
+	/* I/O operation under all of CPU are done so let's free */
+	zram_meta_free(meta, disksize);
+	zcomp_destroy(comp);
+}
+
+static ssize_t disksize_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
+	u64 disksize;
+	struct zcomp *comp;
+	struct zram_meta *meta;
 	struct zram *zram = dev_to_zram(dev);
-	ssize_t ret;
+	int err;
 
-	down_read(&zram->init_lock);
-	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8llu\n",
-			(u64)atomic64_read(&zram->stats.failed_reads),
-			(u64)atomic64_read(&zram->stats.failed_writes),
-			(u64)atomic64_read(&zram->stats.invalid_io),
-			(u64)atomic64_read(&zram->stats.notify_free));
-	up_read(&zram->init_lock);
+	disksize = memparse(buf, NULL);
+	if (!disksize)
+		return -EINVAL;
 
-	return ret;
+	disksize = PAGE_ALIGN(disksize);
+	meta = zram_meta_alloc(zram->disk->first_minor, disksize);
+	if (!meta)
+		return -ENOMEM;
+
+	comp = zcomp_create(zram->compressor, zram->max_comp_streams);
+	if (IS_ERR(comp)) {
+		pr_info("Cannot initialise %s compressing backend\n",
+				zram->compressor);
+		err = PTR_ERR(comp);
+		goto out_free_meta;
+	}
+
+	down_write(&zram->init_lock);
+	if (init_done(zram)) {
+		pr_info("Cannot change disksize for initialized device\n");
+		err = -EBUSY;
+		goto out_destroy_comp;
+	}
+
+	init_waitqueue_head(&zram->io_done);
+	atomic_set(&zram->refcount, 1);
+	zram->meta = meta;
+	zram->comp = comp;
+	zram->disksize = disksize;
+	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+	up_write(&zram->init_lock);
+
+	/*
+	 * Revalidate disk out of the init_lock to avoid lockdep splat.
+	 * It's okay because disk's capacity is protected by init_lock
+	 * so that revalidate_disk always sees up-to-date capacity.
+	 */
+	revalidate_disk(zram->disk);
+
+	return len;
+
+out_destroy_comp:
+	up_write(&zram->init_lock);
+	zcomp_destroy(comp);
+out_free_meta:
+	zram_meta_free(meta, disksize);
+	return err;
 }
 
-static ssize_t mm_stat_show(struct device *dev,
-		struct device_attribute *attr, char *buf)
+static ssize_t reset_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
 {
-	struct zram *zram = dev_to_zram(dev);
-	u64 orig_size, mem_used = 0;
-	long max_used;
-	ssize_t ret;
+	int ret;
+	unsigned short do_reset;
+	struct zram *zram;
+	struct block_device *bdev;
 
-	down_read(&zram->init_lock);
-	if (init_done(zram))
-		mem_used = zs_get_total_pages(zram->meta->mem_pool);
+	zram = dev_to_zram(dev);
+	bdev = bdget_disk(zram->disk, 0);
 
-	orig_size = atomic64_read(&zram->stats.pages_stored);
-	max_used = atomic_long_read(&zram->stats.max_used_pages);
+	if (!bdev)
+		return -ENOMEM;
 
-	ret = scnprintf(buf, PAGE_SIZE,
-			"%8llu %8llu %8llu %8lu %8ld %8llu %8llu\n",
-			orig_size << PAGE_SHIFT,
-			(u64)atomic64_read(&zram->stats.compr_data_size),
-			mem_used << PAGE_SHIFT,
-			zram->limit_pages << PAGE_SHIFT,
-			max_used << PAGE_SHIFT,
-			(u64)atomic64_read(&zram->stats.zero_pages),
-			(u64)atomic64_read(&zram->stats.num_migrated));
-	up_read(&zram->init_lock);
+	mutex_lock(&bdev->bd_mutex);
+	/* Do not reset an active device! */
+	if (bdev->bd_openers) {
+		ret = -EBUSY;
+		goto out;
+	}
+
+	ret = kstrtou16(buf, 10, &do_reset);
+	if (ret)
+		goto out;
+
+	if (!do_reset) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	/* Make sure all pending I/O is finished */
+	fsync_bdev(bdev);
+	zram_reset_device(zram);
+
+	mutex_unlock(&bdev->bd_mutex);
+	revalidate_disk(zram->disk);
+	bdput(bdev);
+
+	return len;
 
+out:
+	mutex_unlock(&bdev->bd_mutex);
+	bdput(bdev);
 	return ret;
 }
 
-static DEVICE_ATTR_RO(io_stat);
-static DEVICE_ATTR_RO(mm_stat);
-ZRAM_ATTR_RO(num_reads);
-ZRAM_ATTR_RO(num_writes);
-ZRAM_ATTR_RO(failed_reads);
-ZRAM_ATTR_RO(failed_writes);
-ZRAM_ATTR_RO(invalid_io);
-ZRAM_ATTR_RO(notify_free);
-ZRAM_ATTR_RO(zero_pages);
-ZRAM_ATTR_RO(compr_data_size);
+static const struct block_device_operations zram_devops = {
+	.swap_slot_free_notify = zram_slot_free_notify,
+	.rw_page = zram_rw_page,
+	.owner = THIS_MODULE
+};
+
+static DEVICE_ATTR_WO(compact);
+static DEVICE_ATTR_RW(disksize);
+static DEVICE_ATTR_RO(initstate);
+static DEVICE_ATTR_WO(reset);
+static DEVICE_ATTR_RO(orig_data_size);
+static DEVICE_ATTR_RO(mem_used_total);
+static DEVICE_ATTR_RW(mem_limit);
+static DEVICE_ATTR_RW(mem_used_max);
+static DEVICE_ATTR_RW(max_comp_streams);
+static DEVICE_ATTR_RW(comp_algorithm);
 
 static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_disksize.attr,
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 05/10] zram: remove max_num_devices limitation
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (3 preceding siblings ...)
  2015-05-04 12:38 ` [PATCHv4 04/10] zram: reorganize code layout Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 06/10] zram: report every added and removed device Sergey Senozhatsky
                   ` (6 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

Limiting the number of zram devices to 32 (default max_num_devices value)
is confusing, let's drop it.  A user with 2TB or 4TB of RAM, for example,
can request as many devices as he can handle.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/blockdev/zram.txt | 4 +++-
 drivers/block/zram/zram_drv.c   | 8 +-------
 drivers/block/zram/zram_drv.h   | 6 ------
 3 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index bef4998..65e9430 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -19,7 +19,9 @@ Following shows a typical sequence of steps for using zram.
 1) Load Module:
 	modprobe zram num_devices=4
 	This creates 4 devices: /dev/zram{0,1,2,3}
-	(num_devices parameter is optional. Default: 1)
+
+num_devices parameter is optional and tells zram how many devices should be
+pre-created. Default: 1.
 
 2) Set max number of compression streams
 	Compression backend may use up to max_comp_streams compression streams,
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3d7ac0c..b875517 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1285,12 +1285,6 @@ static int __init zram_init(void)
 {
 	int ret, dev_id;
 
-	if (num_devices > max_num_devices) {
-		pr_warn("Invalid value for num_devices: %u\n",
-				num_devices);
-		return -EINVAL;
-	}
-
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warn("Unable to get major number\n");
@@ -1320,7 +1314,7 @@ module_init(zram_init);
 module_exit(zram_exit);
 
 module_param(num_devices, uint, 0);
-MODULE_PARM_DESC(num_devices, "Number of zram devices");
+MODULE_PARM_DESC(num_devices, "Number of pre-created zram devices");
 
 MODULE_LICENSE("Dual BSD/GPL");
 MODULE_AUTHOR("Nitin Gupta <ngupta@vflare.org>");
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 570c598..042994e 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -20,12 +20,6 @@
 
 #include "zcomp.h"
 
-/*
- * Some arbitrary value. This is just to catch
- * invalid value for num_devices module parameter.
- */
-static const unsigned max_num_devices = 32;
-
 /*-- Configurable parameters */
 
 /*
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 06/10] zram: report every added and removed device
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (4 preceding siblings ...)
  2015-05-04 12:38 ` [PATCHv4 05/10] zram: remove max_num_devices limitation Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:38 ` [PATCHv4 07/10] zram: trivial: correct flag operations comment Sergey Senozhatsky
                   ` (5 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

With dynamic device creation/removal (which will be introduced later in
the series) printing num_devices in zram_init() will not make a lot of
sense, as well as printing the number of destroyed devices in
destroy_devices(). Print per-device action (added/removed) in zram_add()
and zram_remove() instead.

Example:

[ 3645.259652] zram: Added device: zram5
[ 3646.152074] zram: Added device: zram6
[ 3650.585012] zram: Removed device: zram5
[ 3655.845584] zram: Added device: zram8
[ 3660.975223] zram: Removed device: zram6

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b875517..ec6396c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1236,6 +1236,8 @@ static int zram_add(int device_id)
 	strlcpy(zram->compressor, default_compressor, sizeof(zram->compressor));
 	zram->meta = NULL;
 	zram->max_comp_streams = 1;
+
+	pr_info("Added device: %s\n", zram->disk->disk_name);
 	return 0;
 
 out_free_disk:
@@ -1252,6 +1254,7 @@ out_free_dev:
 
 static void zram_remove(struct zram *zram)
 {
+	pr_info("Removed device: %s\n", zram->disk->disk_name);
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
 	 * store while we destroy the devices
@@ -1278,7 +1281,6 @@ static void destroy_devices(void)
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
-	pr_info("Destroyed device(s)\n");
 }
 
 static int __init zram_init(void)
@@ -1297,7 +1299,6 @@ static int __init zram_init(void)
 			goto out_error;
 	}
 
-	pr_info("Created %u device(s)\n", num_devices);
 	return 0;
 
 out_error:
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 07/10] zram: trivial: correct flag operations comment
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (5 preceding siblings ...)
  2015-05-04 12:38 ` [PATCHv4 06/10] zram: report every added and removed device Sergey Senozhatsky
@ 2015-05-04 12:38 ` Sergey Senozhatsky
  2015-05-04 12:39 ` [PATCHv4 08/10] zram: return zram device_id from zram_add() Sergey Senozhatsky
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:38 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We don't have meta->tb_lock anymore and use meta table entry bit_spin_lock
instead. update corresponding comment.

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index ec6396c..37a753c 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -70,7 +70,7 @@ static inline struct zram *dev_to_zram(struct device *dev)
 	return (struct zram *)dev_to_disk(dev)->private_data;
 }
 
-/* flag operations needs meta->tb_lock */
+/* flag operations require table entry bit_spin_lock() being held */
 static int zram_test_flag(struct zram_meta *meta, u32 index,
 			enum zram_pageflags flag)
 {
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 08/10] zram: return zram device_id from zram_add()
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (6 preceding siblings ...)
  2015-05-04 12:38 ` [PATCHv4 07/10] zram: trivial: correct flag operations comment Sergey Senozhatsky
@ 2015-05-04 12:39 ` Sergey Senozhatsky
  2015-05-04 12:39 ` [PATCHv4 09/10] zram: close race by open overriding Sergey Senozhatsky
                   ` (3 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:39 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

This patch prepares zram to enable on-demand device creation.
zram_add() performs automatic device_id assignment and returns
new device id (>= 0) or error code (< 0).

Acked-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 drivers/block/zram/zram_drv.c | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 37a753c..3df4394 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1151,20 +1151,24 @@ static struct attribute_group zram_disk_attr_group = {
 	.attrs = zram_disk_attrs,
 };
 
-static int zram_add(int device_id)
+/*
+ * Allocate and initialize new zram device. the function returns
+ * '>= 0' device_id upon success, and negative value otherwise.
+ */
+static int zram_add(void)
 {
 	struct zram *zram;
 	struct request_queue *queue;
-	int ret;
+	int ret, device_id;
 
 	zram = kzalloc(sizeof(struct zram), GFP_KERNEL);
 	if (!zram)
 		return -ENOMEM;
 
-	ret = idr_alloc(&zram_index_idr, zram, device_id,
-			device_id + 1, GFP_KERNEL);
+	ret = idr_alloc(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
 	if (ret < 0)
 		goto out_free_dev;
+	device_id = ret;
 
 	init_rwsem(&zram->init_lock);
 
@@ -1238,7 +1242,7 @@ static int zram_add(int device_id)
 	zram->max_comp_streams = 1;
 
 	pr_info("Added device: %s\n", zram->disk->disk_name);
-	return 0;
+	return device_id;
 
 out_free_disk:
 	del_gendisk(zram->disk);
@@ -1285,7 +1289,7 @@ static void destroy_devices(void)
 
 static int __init zram_init(void)
 {
-	int ret, dev_id;
+	int ret;
 
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
@@ -1293,10 +1297,11 @@ static int __init zram_init(void)
 		return -EBUSY;
 	}
 
-	for (dev_id = 0; dev_id < num_devices; dev_id++) {
-		ret = zram_add(dev_id);
-		if (ret != 0)
+	while (num_devices != 0) {
+		ret = zram_add();
+		if (ret < 0)
 			goto out_error;
+		num_devices--;
 	}
 
 	return 0;
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 09/10] zram: close race by open overriding
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (7 preceding siblings ...)
  2015-05-04 12:39 ` [PATCHv4 08/10] zram: return zram device_id from zram_add() Sergey Senozhatsky
@ 2015-05-04 12:39 ` Sergey Senozhatsky
  2015-05-04 12:39 ` [PATCHv4 10/10] zram: add dynamic device add/remove functionality Sergey Senozhatsky
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:39 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim; +Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky

From: Minchan Kim <minchan@kernel.org>

[1] introduced bdev->bd_mutex to protect a race between mount
and reset. At that time, we don't have dynamic zram-add/remove
feature so it was okay.

However, as we introduce dynamic device feature, bd_mutex became
trouble.

	CPU 0

echo 1 > /sys/block/zram<id>/reset
  -> kernfs->s_active(A)
    -> zram:reset_store->bd_mutex(B)

	CPU 1

echo <id> > /sys/class/zram/zram-remove
  ->zram:zram_remove: bd_mutex(B)
  -> sysfs_remove_group
    -> kernfs->s_active(A)

IOW, AB -> BA deadlock

The reason we are holding bd_mutex for zram_remove is to prevent
any incoming open /dev/zram[0-9]. Otherwise, we could remove zram
others already have opened. But it causes above deadlock problem.

To fix the problem, this patch overrides block_device.open and
it returns -EBUSY if zram asserts he claims zram to reset so any
incoming open will be failed so we don't need to hold bd_mutex
for zram_remove ayn more.

This patch is to prepare for zram-add/remove feature.

[1] ba6b17: zram: fix umount-reset_store-mount race condition

[Sergey: simplify reset_store()]
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 53 +++++++++++++++++++++++++++----------------
 drivers/block/zram/zram_drv.h |  4 ++++
 2 files changed, 38 insertions(+), 19 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 3df4394..b3541df 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1068,45 +1068,60 @@ static ssize_t reset_store(struct device *dev,
 	struct zram *zram;
 	struct block_device *bdev;
 
+	ret = kstrtou16(buf, 10, &do_reset);
+	if (ret)
+		return ret;
+
+	if (!do_reset)
+		return -EINVAL;
+
 	zram = dev_to_zram(dev);
 	bdev = bdget_disk(zram->disk, 0);
-
 	if (!bdev)
 		return -ENOMEM;
 
 	mutex_lock(&bdev->bd_mutex);
-	/* Do not reset an active device! */
-	if (bdev->bd_openers) {
-		ret = -EBUSY;
-		goto out;
+	/* Do not reset an active device or claimed device */
+	if (bdev->bd_openers || zram->claim) {
+		mutex_unlock(&bdev->bd_mutex);
+		bdput(bdev);
+		return -EBUSY;
 	}
 
-	ret = kstrtou16(buf, 10, &do_reset);
-	if (ret)
-		goto out;
-
-	if (!do_reset) {
-		ret = -EINVAL;
-		goto out;
-	}
+	/* From now on, anyone can't open /dev/zram[0-9] */
+	zram->claim = true;
+	mutex_unlock(&bdev->bd_mutex);
 
-	/* Make sure all pending I/O is finished */
+	/* Make sure all the pending I/O are finished */
 	fsync_bdev(bdev);
 	zram_reset_device(zram);
-
-	mutex_unlock(&bdev->bd_mutex);
 	revalidate_disk(zram->disk);
 	bdput(bdev);
 
+	mutex_lock(&bdev->bd_mutex);
+	zram->claim = false;
+	mutex_unlock(&bdev->bd_mutex);
+
 	return len;
+}
+
+static int zram_open(struct block_device *bdev, fmode_t mode)
+{
+	int ret = 0;
+	struct zram *zram;
+
+	WARN_ON(!mutex_is_locked(&bdev->bd_mutex));
+
+	zram = bdev->bd_disk->private_data;
+	/* zram was claimed to reset so open request fails */
+	if (zram->claim)
+		ret = -EBUSY;
 
-out:
-	mutex_unlock(&bdev->bd_mutex);
-	bdput(bdev);
 	return ret;
 }
 
 static const struct block_device_operations zram_devops = {
+	.open = zram_open,
 	.swap_slot_free_notify = zram_slot_free_notify,
 	.rw_page = zram_rw_page,
 	.owner = THIS_MODULE
diff --git a/drivers/block/zram/zram_drv.h b/drivers/block/zram/zram_drv.h
index 042994e..6dbe2df 100644
--- a/drivers/block/zram/zram_drv.h
+++ b/drivers/block/zram/zram_drv.h
@@ -115,5 +115,9 @@ struct zram {
 	 */
 	u64 disksize;	/* bytes */
 	char compressor[10];
+	/*
+	 * zram is claimed so open request will be failed
+	 */
+	bool claim; /* Protected by bdev->bd_mutex */
 };
 #endif
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCHv4 10/10] zram: add dynamic device add/remove functionality
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (8 preceding siblings ...)
  2015-05-04 12:39 ` [PATCHv4 09/10] zram: close race by open overriding Sergey Senozhatsky
@ 2015-05-04 12:39 ` Sergey Senozhatsky
  2015-05-06  5:01 ` [PATCHv4 00/10] add on-demand device creation Minchan Kim
  2015-05-06  5:31 ` Sergey Senozhatsky
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-04 12:39 UTC (permalink / raw)
  To: Andrew Morton, Minchan Kim
  Cc: Nitin Gupta, linux-kernel, Sergey Senozhatsky, Sergey Senozhatsky

We currently don't support on-demand device creation. The one and only way
to have N zram devices is to specify num_devices module parameter (default
value: 1). IOW if, for some reason, at some point, user wants to have
N + 1 devies he/she must umount all the existing devices, unload the
module, load the module passing num_devices equals to N + 1. And do this
again, if needed.

This patch introduces zram control sysfs class, which has two sysfs
attrs:
- zram_add      -- add a new zram device
- zram_remove   -- remove a specific (device_id) zram device

zram_add sysfs attr is read-only and has only automatic device id
assignment mode (as requested by Minchan Kim). read operation performed
on this attr creates a new zram device and returns back its device_id or
error status.

Usage example:
	# add a new specific zram device
	cat /sys/class/zram-control/zram_add
	2

	# remove a specific zram device
	echo 4 > /sys/class/zram-control/zram_remove

Returning zram_add() error code back to user (-ENOMEM in this case)

	cat /sys/class/zram-control/zram_add
	cat: /sys/class/zram-control/zram_add: Cannot allocate memory

NOTE, there might be users who already depend on the fact that at least
zram0 device gets always created by zram_init(). Preserve this behavior.

[Minchan: use zram->claim to avoid lockdep splat]
Reported-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
---
 Documentation/ABI/testing/sysfs-class-zram |  24 +++++++
 Documentation/blockdev/zram.txt            |  23 ++++++-
 drivers/block/zram/zram_drv.c              | 100 ++++++++++++++++++++++++++++-
 3 files changed, 141 insertions(+), 6 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-class-zram

diff --git a/Documentation/ABI/testing/sysfs-class-zram b/Documentation/ABI/testing/sysfs-class-zram
new file mode 100644
index 0000000..74c8850
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-class-zram
@@ -0,0 +1,24 @@
+What:		/sys/class/zram-control/
+Date:		August 2015
+KernelVersion:	4.2
+Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+Description:
+		The zram-control/ class sub-directory belongs to zram
+		device class
+
+What:		/sys/class/zram-control/zram_add
+Date:		August 2015
+KernelVersion:	4.2
+Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+Description:
+		RO attribute. Read operation will cause zram to add a new
+		device and return its device id back to user (so one can
+		use /dev/zram<id>), or error code.
+
+What:		/sys/class/zram-control/zram_add
+Date:		August 2015
+KernelVersion:	4.2
+Contact:	Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
+Description:
+		Remove a specific /dev/zramX device, where X is a device_id
+		provided by user
diff --git a/Documentation/blockdev/zram.txt b/Documentation/blockdev/zram.txt
index 65e9430..fc686d4 100644
--- a/Documentation/blockdev/zram.txt
+++ b/Documentation/blockdev/zram.txt
@@ -99,7 +99,24 @@ size of the disk when not in use so a huge zram is wasteful.
 	mkfs.ext4 /dev/zram1
 	mount /dev/zram1 /tmp
 
-7) Stats:
+7) Add/remove zram devices
+
+zram provides a control interface, which enables dynamic (on-demand) device
+addition and removal.
+
+In order to add a new /dev/zramX device, perform read operation on zram_add
+attribute. This will return either new device's device id (meaning that you
+can use /dev/zram<id>) or error code.
+
+Example:
+	cat /sys/class/zram-control/zram_add
+	1
+
+To remove the existing /dev/zramX device (where X is a device id)
+execute
+	echo X > /sys/class/zram-control/zram_remove
+
+8) Stats:
 Per-device statistics are exported as various nodes under /sys/block/zram<id>/
 
 A brief description of exported device attritbutes. For more details please
@@ -174,11 +191,11 @@ line of text and contains the following stats separated by whitespace:
 	zero_pages
 	num_migrated
 
-8) Deactivate:
+9) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
-9) Reset:
+10) Reset:
 	Write any positive value to 'reset' sysfs node
 	echo 1 > /sys/block/zram0/reset
 	echo 1 > /sys/block/zram1/reset
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index b3541df..e6c4316 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -29,10 +29,14 @@
 #include <linux/vmalloc.h>
 #include <linux/err.h>
 #include <linux/idr.h>
+#include <linux/sysfs.h>
 
 #include "zram_drv.h"
 
 static DEFINE_IDR(zram_index_idr);
+/* idr index must be protected */
+static DEFINE_MUTEX(zram_index_mutex);
+
 static int zram_major;
 static const char *default_compressor = "lzo";
 
@@ -1271,24 +1275,104 @@ out_free_dev:
 	return ret;
 }
 
-static void zram_remove(struct zram *zram)
+static int zram_remove(struct zram *zram)
 {
-	pr_info("Removed device: %s\n", zram->disk->disk_name);
+	struct block_device *bdev;
+
+	bdev = bdget_disk(zram->disk, 0);
+	if (!bdev)
+		return -ENOMEM;
+
+	mutex_lock(&bdev->bd_mutex);
+	if (bdev->bd_openers || zram->claim) {
+		mutex_unlock(&bdev->bd_mutex);
+		bdput(bdev);
+		return -EBUSY;
+	}
+
+	zram->claim = true;
+	mutex_unlock(&bdev->bd_mutex);
+
 	/*
 	 * Remove sysfs first, so no one will perform a disksize
-	 * store while we destroy the devices
+	 * store while we destroy the devices. This also helps during
+	 * zram_remove() -- zram_reset_device() is the last holder of
+	 * ->init_lock, no later/concurrent disksize_store() or any
+	 * other sysfs handlers are possible.
 	 */
 	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
 			&zram_disk_attr_group);
 
+	/* Make sure all the pending I/O are finished */
+	fsync_bdev(bdev);
 	zram_reset_device(zram);
+	bdput(bdev);
+
+	pr_info("Removed device: %s\n", zram->disk->disk_name);
+
 	idr_remove(&zram_index_idr, zram->disk->first_minor);
 	blk_cleanup_queue(zram->disk->queue);
 	del_gendisk(zram->disk);
 	put_disk(zram->disk);
 	kfree(zram);
+	return 0;
+}
+
+/* zram module control sysfs attributes */
+static ssize_t zram_add_show(struct class *class,
+			struct class_attribute *attr,
+			char *buf)
+{
+	int ret;
+
+	mutex_lock(&zram_index_mutex);
+	ret = zram_add();
+	mutex_unlock(&zram_index_mutex);
+
+	if (ret < 0)
+		return ret;
+	return scnprintf(buf, PAGE_SIZE, "%d\n", ret);
+}
+
+static ssize_t zram_remove_store(struct class *class,
+			struct class_attribute *attr,
+			const char *buf,
+			size_t count)
+{
+	struct zram *zram;
+	int ret, dev_id;
+
+	/* dev_id is gendisk->first_minor, which is `int' */
+	ret = kstrtoint(buf, 10, &dev_id);
+	if (ret)
+		return ret;
+	if (dev_id < 0)
+		return -EINVAL;
+
+	mutex_lock(&zram_index_mutex);
+
+	zram = idr_find(&zram_index_idr, dev_id);
+	if (zram)
+		ret = zram_remove(zram);
+	else
+		ret = -ENODEV;
+
+	mutex_unlock(&zram_index_mutex);
+	return ret ? ret : count;
 }
 
+static struct class_attribute zram_control_class_attrs[] = {
+	__ATTR_RO(zram_add),
+	__ATTR_WO(zram_remove),
+	__ATTR_NULL,
+};
+
+static struct class zram_control_class = {
+	.name		= "zram-control",
+	.owner		= THIS_MODULE,
+	.class_attrs	= zram_control_class_attrs,
+};
+
 static int zram_remove_cb(int id, void *ptr, void *data)
 {
 	zram_remove(ptr);
@@ -1297,6 +1381,7 @@ static int zram_remove_cb(int id, void *ptr, void *data)
 
 static void destroy_devices(void)
 {
+	class_unregister(&zram_control_class);
 	idr_for_each(&zram_index_idr, &zram_remove_cb, NULL);
 	idr_destroy(&zram_index_idr);
 	unregister_blkdev(zram_major, "zram");
@@ -1306,14 +1391,23 @@ static int __init zram_init(void)
 {
 	int ret;
 
+	ret = class_register(&zram_control_class);
+	if (ret) {
+		pr_warn("Unable to register zram-control class\n");
+		return ret;
+	}
+
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warn("Unable to get major number\n");
+		class_unregister(&zram_control_class);
 		return -EBUSY;
 	}
 
 	while (num_devices != 0) {
+		mutex_lock(&zram_index_mutex);
 		ret = zram_add();
+		mutex_unlock(&zram_index_mutex);
 		if (ret < 0)
 			goto out_error;
 		num_devices--;
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (9 preceding siblings ...)
  2015-05-04 12:39 ` [PATCHv4 10/10] zram: add dynamic device add/remove functionality Sergey Senozhatsky
@ 2015-05-06  5:01 ` Minchan Kim
  2015-05-06  5:25   ` Sergey Senozhatsky
  2015-05-06  5:31 ` Sergey Senozhatsky
  11 siblings, 1 reply; 26+ messages in thread
From: Minchan Kim @ 2015-05-06  5:01 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Hello Sergey,

On Mon, May 04, 2015 at 09:38:52PM +0900, Sergey Senozhatsky wrote:
> We currently don't support zram on-demand device creation.  The only way
> to have N zram devices is to specify num_devices module parameter (default
> value 1).  That means that if, for some reason, at some point, user wants
> to have N + 1 devies he/she must umount all the existing devices, unload
> the module, load the module passing num_devices equals to N + 1.
> 
> This patchset introduces zram-control sysfs class, which has two sysfs
> attrs:
> 
>  - zram_add     -- add a new zram device
>  - zram_remove  -- remove a specific (device_id) zram device
> 
>     Usage example:
>         # add a new specific zram device
>         cat /sys/class/zram-control/zram_add
>         1
> 
>         # remove a specific zram device
>         echo 4 > /sys/class/zram-control/zram_remove

I just reported bug. Please handle it.

Other nits:

1) How about inserting a step to reset before zram_remove?
IOW, user should echo "1" > /sys/block/zram[0-9]*/reset before
echo zram_id > /sys/class/zram-control/zram_remove.

Actually, I can't think any benefit other than consistency of
zram interface but you might have.

2) How about using hot_add/hot_remove?

/class/zram-control includes prefix zram meaning so I think
we don't need zram prefix of the knobs. Instead, let's add
*hot* which is more straightforward for representing *dynamic*.

What do you think about it?

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  5:01 ` [PATCHv4 00/10] add on-demand device creation Minchan Kim
@ 2015-05-06  5:25   ` Sergey Senozhatsky
  2015-05-06  6:52     ` Minchan Kim
  2015-05-06  7:07     ` Sergey Senozhatsky
  0 siblings, 2 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-06  5:25 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky


Hi,

On (05/06/15 14:01), Minchan Kim wrote:
> Hello Sergey,
> 
> On Mon, May 04, 2015 at 09:38:52PM +0900, Sergey Senozhatsky wrote:
> > We currently don't support zram on-demand device creation.  The only way
> > to have N zram devices is to specify num_devices module parameter (default
> > value 1).  That means that if, for some reason, at some point, user wants
> > to have N + 1 devies he/she must umount all the existing devices, unload
> > the module, load the module passing num_devices equals to N + 1.
> > 
> > This patchset introduces zram-control sysfs class, which has two sysfs
> > attrs:
> > 
> >  - zram_add     -- add a new zram device
> >  - zram_remove  -- remove a specific (device_id) zram device
> > 
> >     Usage example:
> >         # add a new specific zram device
> >         cat /sys/class/zram-control/zram_add
> >         1
> > 
> >         # remove a specific zram device
> >         echo 4 > /sys/class/zram-control/zram_remove
> 
> I just reported bug. Please handle it.

a-ha... found it:
http://lkml.iu.edu/hypermail/linux/kernel/1505.0/04389.html

will take a look. thanks!

> Other nits:
> 
> 1) How about inserting a step to reset before zram_remove?
> IOW, user should echo "1" > /sys/block/zram[0-9]*/reset before
> echo zram_id > /sys/class/zram-control/zram_remove.
> 
> Actually, I can't think any benefit other than consistency of
> zram interface but you might have.

well, I did this the way it is because there is no requirement to reset any
devices before `rmmod zram' (which eventually removes all zram devices from the
system), a set of umount-s is enough. so requiring both umount and reset before
hot_remove seems to be a bit different.

zram_remove() is called from both hot_remove and zram_exit()->destroy_devices()
(which requires reset step anyway). so I'm not sure about this change. do you
have any strong objections?


> 2) How about using hot_add/hot_remove?
> 
> /class/zram-control includes prefix zram meaning so I think
> we don't need zram prefix of the knobs. Instead, let's add
> *hot* which is more straightforward for representing *dynamic*.
> 
> What do you think about it?

ok. I can change it. I'll ask Andrew to drop the entire patch series and
will resubmit once we settle it down.

	-ss

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
                   ` (10 preceding siblings ...)
  2015-05-06  5:01 ` [PATCHv4 00/10] add on-demand device creation Minchan Kim
@ 2015-05-06  5:31 ` Sergey Senozhatsky
  11 siblings, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-06  5:31 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Sergey Senozhatsky, Minchan Kim, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (05/04/15 21:38), Sergey Senozhatsky wrote:
> We currently don't support zram on-demand device creation.  The only way
> to have N zram devices is to specify num_devices module parameter (default
> value 1).  That means that if, for some reason, at some point, user wants
> to have N + 1 devies he/she must umount all the existing devices, unload
> the module, load the module passing num_devices equals to N + 1.
> 
> This patchset introduces zram-control sysfs class, which has two sysfs
> attrs:
> 
>  - zram_add     -- add a new zram device
>  - zram_remove  -- remove a specific (device_id) zram device
> 
>     Usage example:
>         # add a new specific zram device
>         cat /sys/class/zram-control/zram_add
>         1
> 
>         # remove a specific zram device
>         echo 4 > /sys/class/zram-control/zram_remove
> 

Andrew,

could you please drop the entire series from -mm? Minchan has reported a
problem (requires some investigation) and proposed several changes, f.e.
to rename zram confrol's sysfs nodes to zram_control/{hot_add, hot_remove}.

to drop:

 zram-add-dynamic-device-add-remove-functionality.patch added to -mm tree
 zram-close-race-by-open-overriding.patch added to -mm tree
 zram-return-zram-device_id-from-zram_add.patch added to -mm tree
 zram-trivial-correct-flag-operations-comment.patch added to -mm tree
 zram-report-every-added-and-removed-device.patch added to -mm tree
 zram-reorganize-code-layout.patch added to -mm tree
 zram-remove-max_num_devices-limitation.patch added to -mm tree
 zram-use-idr-instead-of-zram_devices-array.patch added to -mm tree
 zram-add-compact-sysfs-entry-to-documentation.patch added to -mm tree
 zram-cosmetic-zram_attr_ro-code-formatting-tweak.patch added to -mm tree

thank you.

	-ss

> V4:
> -- add patch from Minchan to handle differently a deadlock suspected by lockdep
> -- use zram->claim in zram_remove()
> 
> V3:
> -- rebase against 4.1
> -- review comments from Minchan were addressed
> -- no sysfs RO tricks anymore
> 
> V2:
> -- quick rebase and cleanup in attempt to catch 4.1 merge window
> 
> 
> Minchan Kim (1):
>   zram: close race by open overriding
> 
> Sergey Senozhatsky (9):
>   zram: add `compact` sysfs entry to documentation
>   zram: cosmetic ZRAM_ATTR_RO code formatting tweak
>   zram: use idr instead of `zram_devices' array
>   zram: reorganize code layout
>   zram: remove max_num_devices limitation
>   zram: report every added and removed device
>   zram: trivial: correct flag operations comment
>   zram: return zram device_id from zram_add()
>   zram: add dynamic device add/remove functionality
> 
>  Documentation/ABI/testing/sysfs-class-zram |   24 +
>  Documentation/blockdev/zram.txt            |   29 +-
>  drivers/block/zram/zram_drv.c              | 1002 ++++++++++++++++------------
>  drivers/block/zram/zram_drv.h              |   10 +-
>  4 files changed, 612 insertions(+), 453 deletions(-)
>  create mode 100644 Documentation/ABI/testing/sysfs-class-zram
> 
> -- 
> 2.4.0
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  5:25   ` Sergey Senozhatsky
@ 2015-05-06  6:52     ` Minchan Kim
  2015-05-06  7:07     ` Sergey Senozhatsky
  1 sibling, 0 replies; 26+ messages in thread
From: Minchan Kim @ 2015-05-06  6:52 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel

On Wed, May 06, 2015 at 02:25:57PM +0900, Sergey Senozhatsky wrote:
> 
> Hi,
> 
> On (05/06/15 14:01), Minchan Kim wrote:
> > Hello Sergey,
> > 
> > On Mon, May 04, 2015 at 09:38:52PM +0900, Sergey Senozhatsky wrote:
> > > We currently don't support zram on-demand device creation.  The only way
> > > to have N zram devices is to specify num_devices module parameter (default
> > > value 1).  That means that if, for some reason, at some point, user wants
> > > to have N + 1 devies he/she must umount all the existing devices, unload
> > > the module, load the module passing num_devices equals to N + 1.
> > > 
> > > This patchset introduces zram-control sysfs class, which has two sysfs
> > > attrs:
> > > 
> > >  - zram_add     -- add a new zram device
> > >  - zram_remove  -- remove a specific (device_id) zram device
> > > 
> > >     Usage example:
> > >         # add a new specific zram device
> > >         cat /sys/class/zram-control/zram_add
> > >         1
> > > 
> > >         # remove a specific zram device
> > >         echo 4 > /sys/class/zram-control/zram_remove
> > 
> > I just reported bug. Please handle it.
> 
> a-ha... found it:
> http://lkml.iu.edu/hypermail/linux/kernel/1505.0/04389.html
> 
> will take a look. thanks!
> 
> > Other nits:
> > 
> > 1) How about inserting a step to reset before zram_remove?
> > IOW, user should echo "1" > /sys/block/zram[0-9]*/reset before
> > echo zram_id > /sys/class/zram-control/zram_remove.
> > 
> > Actually, I can't think any benefit other than consistency of
> > zram interface but you might have.
> 
> well, I did this the way it is because there is no requirement to reset any
> devices before `rmmod zram' (which eventually removes all zram devices from the
> system), a set of umount-s is enough. so requiring both umount and reset before
> hot_remove seems to be a bit different.

Okay.

> 
> zram_remove() is called from both hot_remove and zram_exit()->destroy_devices()
> (which requires reset step anyway). so I'm not sure about this change. do you
> have any strong objections?

No. I just wanted to know you have any idea.

> 
> 
> > 2) How about using hot_add/hot_remove?
> > 
> > /class/zram-control includes prefix zram meaning so I think
> > we don't need zram prefix of the knobs. Instead, let's add
> > *hot* which is more straightforward for representing *dynamic*.
> > 
> > What do you think about it?
> 
> ok. I can change it. I'll ask Andrew to drop the entire patch series and
> will resubmit once we settle it down.
> 

Thanks!

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  5:25   ` Sergey Senozhatsky
  2015-05-06  6:52     ` Minchan Kim
@ 2015-05-06  7:07     ` Sergey Senozhatsky
  2015-05-06  7:28       ` Minchan Kim
  1 sibling, 1 reply; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-06  7:07 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Minchan Kim, Sergey Senozhatsky, Andrew Morton, Nitin Gupta,
	linux-kernel

On (05/06/15 14:25), Sergey Senozhatsky wrote:
> a-ha... found it:
> http://lkml.iu.edu/hypermail/linux/kernel/1505.0/04389.html
> 

well... that's weird. can you reproduce this easily?


we do

zram_add():		zram_remove():
-------------------------------------------
mutex_lock(idr);
idr_alloc()
...
->major = 252;
->first_minor = idr index;
add_disk();
mumutex_unlock(idr);
			mutex_lock(idr);
			...
			idr_remove(idr index);
			del_gendisk();
			put_disk();
			mutex_unlock(idr);


script attempted to create a new device with minor (idr index) 2, but there
was a kernfs node from already removed zram2: '/devices/virtual/bdi/252:2'

from your logs:
...
[ 98.756017] zram: Removed device: zram2
[ 98.757087] ------------[ cut here ]------------
...

locked zram_index_mutex, removed zram2, unlocked zram_index_mutex,
locked zram_index_mutex, attempted to create zram2: zram2 sysfs already exist.


hm... need to think. zram hot/remove is serialized.




a separate issue:


/*
 * FIXME: error handling
 */
void add_disk(struct gendisk *disk)    does not handle any errors at all nor it
reports any errors back to caller:

 612         /* Register BDI before referencing it from bdev */
 613         bdi = &disk->queue->backing_dev_info;
 614         bdi_register_dev(bdi, disk_devt(disk));
 615 
 616         blk_register_region(disk_devt(disk), disk->minors, NULL,
 617                             exact_match, exact_lock, disk);
 618         register_disk(disk);
 619         blk_register_queue(disk);
 620 
 621         /*
 622          * Take an extra ref on queue which will be put on disk_release()
 623          * so that it sticks around as long as @disk is there.
 624          */
 625         WARN_ON_ONCE(!blk_get_queue(disk->queue));
 626 
 627         retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
 628                                    "bdi");
 629         WARN_ON(retval);


if bdi_register_dev()->device_add() fails (for example), then sysfs_create_link()
is just "expected" to cause:


[ 98.819810] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
[ 98.820591] IP: [<ffffffff8126da50>] sysfs_do_create_link_sd.isra.2+0x40/0xd0
[ 98.821290] PGD 61669067 PUD 61553067 PMD 0
[ 98.821709] Oops: 0000 [#1] SMP
[..]
[ 98.823663] Call Trace:
[ 98.823663] [<ffffffff8135931b>] ? disk_part_iter_next+0x2b/0x280
[ 98.823663] [<ffffffff8126db05>] sysfs_create_link+0x25/0x50
[ 98.823663] [<ffffffff81359fc2>] add_disk+0x252/0x4e0


or:

[ 98.823663] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:21
[ 98.823663] in_atomic(): 1, irqs_disabled(): 1, pid: 2952, name: cat
[ 98.823663] INFO: lockdep is turned off.
[ 98.823663] irq event stamp: 3392
[ 98.823663] hardirqs last enabled at (3391): [<ffffffff81644133>] __mutex_unlock_slowpath+0xb3/0x180
[ 98.823663] hardirqs last disabled at (3392): [<ffffffff81648d33>] error_sti+0x5/0x6
[ 98.823663] softirqs last enabled at (0): [<ffffffff8105c6a9>] copy_process.part.35+0x4d9/0x1ce0
[ 98.823663] softirqs last disabled at (0): [< (null)>] (null)
[ 98.823663] CPU: 5 PID: 2952 Comm: cat Tainted: G D W 4.1.0-rc2-next-20150505+ #1260
[ 98.823663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 98.823663] ffff88005d40f8c8 ffff88005d40f8c8 ffffffff8163d8d8 0000000000000000
[ 98.823663] 0000000000000000 ffff88005d40f8f8 ffffffff8108c18e ffffffff8108c005
[ 98.823663] ffffffff819ec8b3 0000000000000015 0000000000000000 ffff88005d40f928
[ 98.823663] Call Trace:
[ 98.823663] [<ffffffff8163d8d8>] dump_stack+0x4c/0x65
[ 98.823663] [<ffffffff8108c18e>] ___might_sleep+0x18e/0x250
[ 98.823663] [<ffffffff8108c005>] ? ___might_sleep+0x5/0x250
[ 98.823663] [<ffffffff8108c29d>] __might_sleep+0x4d/0x90
[ 98.823663] [<ffffffff8108c255>] ? __might_sleep+0x5/0x90
[ 98.823663] [<ffffffff81644494>] down_read+0x24/0x70
[ 98.823663] [<ffffffff81644475>] ? down_read+0x5/0x70
[ 98.823663] [<ffffffff810722a4>] ? exit_signals+0x24/0x130
[ 98.823663] [<ffffffff810722a4>] exit_signals+0x24/0x130
[ 98.823663] [<ffffffff81072285>] ? exit_signals+0x5/0x130
[ 98.823663] [<ffffffff810606f8>] ? do_exit+0xb8/0xbc0
[ 98.823663] [<ffffffff810606f8>] do_exit+0xb8/0xbc0
[ 98.823663] [<ffffffff81060645>] ? do_exit+0x5/0xbc0
[ 98.823663] [<ffffffff810c4a79>] ? kmsg_dump+0x119/0x1a0
[ 98.823663] [<ffffffff810c4985>] ? kmsg_dump+0x25/0x1a0
[ 98.823663] [<ffffffff8100665e>] oops_end+0x8e/0xd0
[ 98.823663] [<ffffffff810065d5>] ? oops_end+0x5/0xd0
[ 98.823663] [<ffffffff81637f65>] no_context+0x2d9/0x33e
[ 98.823663] [<ffffffff81637c91>] ? no_context+0x5/0x33e
[ 98.823663] [<ffffffff81638042>] __bad_area_nosemaphore+0x78/0x1d1
[ 98.823663] [<ffffffff816381a0>] ? bad_area_nosemaphore+0x5/0x15
[ 98.823663] [<ffffffff816381ae>] bad_area_nosemaphore+0x13/0x15
[ 98.823663] [<ffffffff8104c4de>] __do_page_fault+0x9e/0x490
[ 98.823663] [<ffffffff8104c445>] ? __do_page_fault+0x5/0x490
[ 98.823663] [<ffffffff8104c8dc>] do_page_fault+0xc/0x10
[ 98.823663] [<ffffffff81648b62>] page_fault+0x22/0x30
[ 98.823663] [<ffffffff8126da50>] ? sysfs_do_create_link_sd.isra.2+0x40/0xd0
[ 98.823663] [<ffffffff8126da50>] ? sysfs_do_create_link_sd.isra.2+0x40/0xd0
[ 98.823663] [<ffffffff8135931b>] ? disk_part_iter_next+0x2b/0x280
[ 98.823663] [<ffffffff8126db05>] sysfs_create_link+0x25/0x50
[ 98.823663] [<ffffffff81359fc2>] add_disk+0x252/0x4e0


there are lots of places in add_disk() that can potentially fail. it's probably time
to "fix" it. including numerous add_disk() callers, that don't check for errors.

	-ss

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  7:07     ` Sergey Senozhatsky
@ 2015-05-06  7:28       ` Minchan Kim
  2015-05-06  8:10         ` Sergey Senozhatsky
  0 siblings, 1 reply; 26+ messages in thread
From: Minchan Kim @ 2015-05-06  7:28 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel

On Wed, May 06, 2015 at 04:07:56PM +0900, Sergey Senozhatsky wrote:
> On (05/06/15 14:25), Sergey Senozhatsky wrote:
> > a-ha... found it:
> > http://lkml.iu.edu/hypermail/linux/kernel/1505.0/04389.html
> > 
> 
> well... that's weird. can you reproduce this easily?

Easy in a second.

> 
> 
> we do
> 
> zram_add():		zram_remove():
> -------------------------------------------
> mutex_lock(idr);
> idr_alloc()
> ...
> ->major = 252;
> ->first_minor = idr index;
> add_disk();
> mumutex_unlock(idr);
> 			mutex_lock(idr);
> 			...
> 			idr_remove(idr index);
> 			del_gendisk();
> 			put_disk();
> 			mutex_unlock(idr);
> 
> 
> script attempted to create a new device with minor (idr index) 2, but there
> was a kernfs node from already removed zram2: '/devices/virtual/bdi/252:2'
> 
> from your logs:
> ...
> [ 98.756017] zram: Removed device: zram2
> [ 98.757087] ------------[ cut here ]------------
> ...
> 
> locked zram_index_mutex, removed zram2, unlocked zram_index_mutex,
> locked zram_index_mutex, attempted to create zram2: zram2 sysfs already exist.
> 
> 
> hm... need to think. zram hot/remove is serialized.

I never look at the code but I doubt others(ex, some admin process on my machine)
holds a ref so kobj doesn't disapper yet but zram_add try to create it?

> 
> 
> 
> 
> a separate issue:
> 
> 
> /*
>  * FIXME: error handling
>  */
> void add_disk(struct gendisk *disk)    does not handle any errors at all nor it
> reports any errors back to caller:
> 
>  612         /* Register BDI before referencing it from bdev */
>  613         bdi = &disk->queue->backing_dev_info;
>  614         bdi_register_dev(bdi, disk_devt(disk));
>  615 
>  616         blk_register_region(disk_devt(disk), disk->minors, NULL,
>  617                             exact_match, exact_lock, disk);
>  618         register_disk(disk);
>  619         blk_register_queue(disk);
>  620 
>  621         /*
>  622          * Take an extra ref on queue which will be put on disk_release()
>  623          * so that it sticks around as long as @disk is there.
>  624          */
>  625         WARN_ON_ONCE(!blk_get_queue(disk->queue));
>  626 
>  627         retval = sysfs_create_link(&disk_to_dev(disk)->kobj, &bdi->dev->kobj,
>  628                                    "bdi");
>  629         WARN_ON(retval);
> 
> 
> if bdi_register_dev()->device_add() fails (for example), then sysfs_create_link()
> is just "expected" to cause:
> 
> 
> [ 98.819810] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
> [ 98.820591] IP: [<ffffffff8126da50>] sysfs_do_create_link_sd.isra.2+0x40/0xd0
> [ 98.821290] PGD 61669067 PUD 61553067 PMD 0
> [ 98.821709] Oops: 0000 [#1] SMP
> [..]
> [ 98.823663] Call Trace:
> [ 98.823663] [<ffffffff8135931b>] ? disk_part_iter_next+0x2b/0x280
> [ 98.823663] [<ffffffff8126db05>] sysfs_create_link+0x25/0x50
> [ 98.823663] [<ffffffff81359fc2>] add_disk+0x252/0x4e0
> 
> 
> or:
> 
> [ 98.823663] BUG: sleeping function called from invalid context at kernel/locking/rwsem.c:21
> [ 98.823663] in_atomic(): 1, irqs_disabled(): 1, pid: 2952, name: cat
> [ 98.823663] INFO: lockdep is turned off.
> [ 98.823663] irq event stamp: 3392
> [ 98.823663] hardirqs last enabled at (3391): [<ffffffff81644133>] __mutex_unlock_slowpath+0xb3/0x180
> [ 98.823663] hardirqs last disabled at (3392): [<ffffffff81648d33>] error_sti+0x5/0x6
> [ 98.823663] softirqs last enabled at (0): [<ffffffff8105c6a9>] copy_process.part.35+0x4d9/0x1ce0
> [ 98.823663] softirqs last disabled at (0): [< (null)>] (null)
> [ 98.823663] CPU: 5 PID: 2952 Comm: cat Tainted: G D W 4.1.0-rc2-next-20150505+ #1260
> [ 98.823663] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
> [ 98.823663] ffff88005d40f8c8 ffff88005d40f8c8 ffffffff8163d8d8 0000000000000000
> [ 98.823663] 0000000000000000 ffff88005d40f8f8 ffffffff8108c18e ffffffff8108c005
> [ 98.823663] ffffffff819ec8b3 0000000000000015 0000000000000000 ffff88005d40f928
> [ 98.823663] Call Trace:
> [ 98.823663] [<ffffffff8163d8d8>] dump_stack+0x4c/0x65
> [ 98.823663] [<ffffffff8108c18e>] ___might_sleep+0x18e/0x250
> [ 98.823663] [<ffffffff8108c005>] ? ___might_sleep+0x5/0x250
> [ 98.823663] [<ffffffff8108c29d>] __might_sleep+0x4d/0x90
> [ 98.823663] [<ffffffff8108c255>] ? __might_sleep+0x5/0x90
> [ 98.823663] [<ffffffff81644494>] down_read+0x24/0x70
> [ 98.823663] [<ffffffff81644475>] ? down_read+0x5/0x70
> [ 98.823663] [<ffffffff810722a4>] ? exit_signals+0x24/0x130
> [ 98.823663] [<ffffffff810722a4>] exit_signals+0x24/0x130
> [ 98.823663] [<ffffffff81072285>] ? exit_signals+0x5/0x130
> [ 98.823663] [<ffffffff810606f8>] ? do_exit+0xb8/0xbc0
> [ 98.823663] [<ffffffff810606f8>] do_exit+0xb8/0xbc0
> [ 98.823663] [<ffffffff81060645>] ? do_exit+0x5/0xbc0
> [ 98.823663] [<ffffffff810c4a79>] ? kmsg_dump+0x119/0x1a0
> [ 98.823663] [<ffffffff810c4985>] ? kmsg_dump+0x25/0x1a0
> [ 98.823663] [<ffffffff8100665e>] oops_end+0x8e/0xd0
> [ 98.823663] [<ffffffff810065d5>] ? oops_end+0x5/0xd0
> [ 98.823663] [<ffffffff81637f65>] no_context+0x2d9/0x33e
> [ 98.823663] [<ffffffff81637c91>] ? no_context+0x5/0x33e
> [ 98.823663] [<ffffffff81638042>] __bad_area_nosemaphore+0x78/0x1d1
> [ 98.823663] [<ffffffff816381a0>] ? bad_area_nosemaphore+0x5/0x15
> [ 98.823663] [<ffffffff816381ae>] bad_area_nosemaphore+0x13/0x15
> [ 98.823663] [<ffffffff8104c4de>] __do_page_fault+0x9e/0x490
> [ 98.823663] [<ffffffff8104c445>] ? __do_page_fault+0x5/0x490
> [ 98.823663] [<ffffffff8104c8dc>] do_page_fault+0xc/0x10
> [ 98.823663] [<ffffffff81648b62>] page_fault+0x22/0x30
> [ 98.823663] [<ffffffff8126da50>] ? sysfs_do_create_link_sd.isra.2+0x40/0xd0
> [ 98.823663] [<ffffffff8126da50>] ? sysfs_do_create_link_sd.isra.2+0x40/0xd0
> [ 98.823663] [<ffffffff8135931b>] ? disk_part_iter_next+0x2b/0x280
> [ 98.823663] [<ffffffff8126db05>] sysfs_create_link+0x25/0x50
> [ 98.823663] [<ffffffff81359fc2>] add_disk+0x252/0x4e0
> 
> 
> there are lots of places in add_disk() that can potentially fail. it's probably time
> to "fix" it. including numerous add_disk() callers, that don't check for errors.
> 
> 	-ss

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  7:28       ` Minchan Kim
@ 2015-05-06  8:10         ` Sergey Senozhatsky
  2015-05-06  8:20           ` Minchan Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-06  8:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Nitin Gupta, linux-kernel

On (05/06/15 16:28), Minchan Kim wrote:
> > 
> > from your logs:
> > ...
> > [ 98.756017] zram: Removed device: zram2
> > [ 98.757087] ------------[ cut here ]------------
> > ...
> > 
> > locked zram_index_mutex, removed zram2, unlocked zram_index_mutex,
> > locked zram_index_mutex, attempted to create zram2: zram2 sysfs already exist.
> > 
> > 
> > hm... need to think. zram hot/remove is serialized.
> 
> I never look at the code but I doubt others(ex, some admin process on my machine)
> holds a ref so kobj doesn't disapper yet but zram_add try to create it?
> 

well, something like this is certainly happening. hm, if this is the
case, then a very quick thing I can think of is to stop re-using previously
used zram ids. add_disk() doesn't give us a chance to handle any errors and
testing for sysfs leftovers in zram_add() looks weird.


btw, do you also want me to rename zram-control sysfs handlers?
zram_add -> zram_hot_add() ? i think zram_add()/zram_remove() is just
ok.



can you please test this patch?

---

 drivers/block/zram/zram_drv.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e6c4316..b31f0c20 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1184,7 +1184,7 @@ static int zram_add(void)
 	if (!zram)
 		return -ENOMEM;
 
-	ret = idr_alloc(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
+	ret = idr_alloc_cyclic(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
 	if (ret < 0)
 		goto out_free_dev;
 	device_id = ret;

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  8:10         ` Sergey Senozhatsky
@ 2015-05-06  8:20           ` Minchan Kim
  2015-05-07  0:33             ` Sergey Senozhatsky
  0 siblings, 1 reply; 26+ messages in thread
From: Minchan Kim @ 2015-05-06  8:20 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel

On Wed, May 06, 2015 at 05:10:29PM +0900, Sergey Senozhatsky wrote:
> On (05/06/15 16:28), Minchan Kim wrote:
> > > 
> > > from your logs:
> > > ...
> > > [ 98.756017] zram: Removed device: zram2
> > > [ 98.757087] ------------[ cut here ]------------
> > > ...
> > > 
> > > locked zram_index_mutex, removed zram2, unlocked zram_index_mutex,
> > > locked zram_index_mutex, attempted to create zram2: zram2 sysfs already exist.
> > > 
> > > 
> > > hm... need to think. zram hot/remove is serialized.
> > 
> > I never look at the code but I doubt others(ex, some admin process on my machine)
> > holds a ref so kobj doesn't disapper yet but zram_add try to create it?
> > 
> 
> well, something like this is certainly happening. hm, if this is the
> case, then a very quick thing I can think of is to stop re-using previously
> used zram ids. add_disk() doesn't give us a chance to handle any errors and
> testing for sysfs leftovers in zram_add() looks weird.
> 
> 
> btw, do you also want me to rename zram-control sysfs handlers?
> zram_add -> zram_hot_add() ? i think zram_add()/zram_remove() is just
> ok.

I'm fine, too.

> 
> 
> 
> can you please test this patch?

I tested it and couldn't reproduce it during 5 minutes so it seems
we are correct about culpit.
However, it's trick and we should find a right way. :)

> 
> ---
> 
>  drivers/block/zram/zram_drv.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index e6c4316..b31f0c20 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1184,7 +1184,7 @@ static int zram_add(void)
>  	if (!zram)
>  		return -ENOMEM;
>  
> -	ret = idr_alloc(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
> +	ret = idr_alloc_cyclic(&zram_index_idr, zram, 0, 0, GFP_KERNEL);
>  	if (ret < 0)
>  		goto out_free_dev;
>  	device_id = ret;

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-06  8:20           ` Minchan Kim
@ 2015-05-07  0:33             ` Sergey Senozhatsky
  2015-05-07  7:41               ` Minchan Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-07  0:33 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Nitin Gupta, linux-kernel

On (05/06/15 17:20), Minchan Kim wrote:
> I'm fine, too.
> 
> > 
> > 
> > 
> > can you please test this patch?
> 
> I tested it and couldn't reproduce it during 5 minutes so it seems
> we are correct about culpit.
> However, it's trick and we should find a right way. :)
> 

well, I can't reproduce this race.

btw, just for info. your script does

zram_add
dd &
zram_remove

by the time dd starts writing zramX device may be gone, so along with
block devices we can see regular files in /dev/

[..]
brw-rw---- 1 root disk   254, 0 May  7 09:16 /dev/zram0
-rw-r--r-- 1 root root 10485760 May  7 09:17 /dev/zram1
-rw-r--r-- 1 root root 10485760 May  7 09:20 /dev/zram10
[..]


can you please test thit patch?

---

 drivers/block/zram/zram_drv.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index e6c4316..1e99d9f 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1278,6 +1278,8 @@ out_free_dev:
 static int zram_remove(struct zram *zram)
 {
 	struct block_device *bdev;
+	struct request_queue *q;
+	int device_id;
 
 	bdev = bdget_disk(zram->disk, 0);
 	if (!bdev)
@@ -1306,14 +1308,18 @@ static int zram_remove(struct zram *zram)
 	/* Make sure all the pending I/O are finished */
 	fsync_bdev(bdev);
 	zram_reset_device(zram);
-	bdput(bdev);
+
+	q = zram->disk->queue;
+	device_id = zram->disk->first_minor;
 
 	pr_info("Removed device: %s\n", zram->disk->disk_name);
 
-	idr_remove(&zram_index_idr, zram->disk->first_minor);
-	blk_cleanup_queue(zram->disk->queue);
 	del_gendisk(zram->disk);
 	put_disk(zram->disk);
+	blk_cleanup_queue(q);
+	bdput(bdev);
+
+	idr_remove(&zram_index_idr, device_id);
 	kfree(zram);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07  0:33             ` Sergey Senozhatsky
@ 2015-05-07  7:41               ` Minchan Kim
  2015-05-07 12:09                 ` Sergey Senozhatsky
  0 siblings, 1 reply; 26+ messages in thread
From: Minchan Kim @ 2015-05-07  7:41 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel

On Thu, May 07, 2015 at 09:33:22AM +0900, Sergey Senozhatsky wrote:
> On (05/06/15 17:20), Minchan Kim wrote:
> > I'm fine, too.
> > 
> > > 
> > > 
> > > 
> > > can you please test this patch?
> > 
> > I tested it and couldn't reproduce it during 5 minutes so it seems
> > we are correct about culpit.
> > However, it's trick and we should find a right way. :)
> > 
> 
> well, I can't reproduce this race.
> 
> btw, just for info. your script does
> 
> zram_add
> dd &
> zram_remove
> 
> by the time dd starts writing zramX device may be gone, so along with
> block devices we can see regular files in /dev/
> 
> [..]
> brw-rw---- 1 root disk   254, 0 May  7 09:16 /dev/zram0
> -rw-r--r-- 1 root root 10485760 May  7 09:17 /dev/zram1
> -rw-r--r-- 1 root root 10485760 May  7 09:20 /dev/zram10
> [..]
> 
> 
> can you please test thit patch?

Hello Sergey,

Unfortunately, I can reproduce in a few second with this patch.

> 
> ---
> 
>  drivers/block/zram/zram_drv.c | 12 +++++++++---
>  1 file changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index e6c4316..1e99d9f 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -1278,6 +1278,8 @@ out_free_dev:
>  static int zram_remove(struct zram *zram)
>  {
>  	struct block_device *bdev;
> +	struct request_queue *q;
> +	int device_id;
>  
>  	bdev = bdget_disk(zram->disk, 0);
>  	if (!bdev)
> @@ -1306,14 +1308,18 @@ static int zram_remove(struct zram *zram)
>  	/* Make sure all the pending I/O are finished */
>  	fsync_bdev(bdev);
>  	zram_reset_device(zram);
> -	bdput(bdev);
> +
> +	q = zram->disk->queue;
> +	device_id = zram->disk->first_minor;
>  
>  	pr_info("Removed device: %s\n", zram->disk->disk_name);
>  
> -	idr_remove(&zram_index_idr, zram->disk->first_minor);
> -	blk_cleanup_queue(zram->disk->queue);
>  	del_gendisk(zram->disk);
>  	put_disk(zram->disk);
> +	blk_cleanup_queue(q);
> +	bdput(bdev);
> +
> +	idr_remove(&zram_index_idr, device_id);
>  	kfree(zram);
>  	return 0;
>  }

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07  7:41               ` Minchan Kim
@ 2015-05-07 12:09                 ` Sergey Senozhatsky
  2015-05-07 13:04                   ` Sergey Senozhatsky
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-07 12:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Sergey Senozhatsky, Andrew Morton,
	Nitin Gupta, linux-kernel

On (05/07/15 16:41), Minchan Kim wrote:
> 
> Unfortunately, I can reproduce in a few second with this patch.
> 

so I googled a bit, and it seems that zram is not the only one who suffered
from add_disk() behaviour.

http://www.spinics.net/lists/dm-devel/msg23465.html

and there is
http://git.kernel.dk/?p=linux-block.git;a=commitdiff;h=6cd18e71;hp=393a33970540ac6a2c894b0d6ef3f5d485860884

that looks interesting:

|
| Because of the peculiar way that md devices are created (automatically
| when the device node is opened), a new device can be created and
| registered immediately after the
| blk_unregister_region(disk_devt(disk), disk->minors);
| call in del_gendisk().
|
| Therefore it is important that all visible artifacts of the previous
| device are removed before this call.  In particular, the 'bdi'.
|

basically, it destroys bdi during queue cleanup.


> moved the
>     device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
>
> The new device finds that the bdi name is already registered in sysfs
> and complains
>
>> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
>> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
>
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.


that does look like something that can happen in our case.


and here:
http://www.spinics.net/lists/dm-devel/msg23415.html


is there any chance to ask you to test with these patches (no rush, take your time)?
as I'm still unable to reproduce it locally.


	-ss

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07 12:09                 ` Sergey Senozhatsky
@ 2015-05-07 13:04                   ` Sergey Senozhatsky
  2015-05-07 15:11                     ` Minchan Kim
  0 siblings, 1 reply; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-07 13:04 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky,
	Sergey Senozhatsky

[-- Attachment #1: Type: text/plain, Size: 418 bytes --]

On (05/07/15 21:09), Sergey Senozhatsky wrote:
[..]
> 
> and there is
> http://git.kernel.dk/?p=linux-block.git;a=commitdiff;h=6cd18e71;hp=393a33970540ac6a2c894b0d6ef3f5d485860884
> 
[..]
> and here:
> http://www.spinics.net/lists/dm-devel/msg23415.html
> 
> 
> is there any chance to ask you to test with these patches (no rush, take your time)?
> as I'm still unable to reproduce it locally.

pathes attached.

	-ss

[-- Attachment #2: 0002-block-destroy-bdi-before-blockdev-is-unregistered.patch --]
[-- Type: text/x-diff, Size: 3812 bytes --]

>From 7056096317825dbe582d005385bc4077ecb62c76 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Mon, 27 Apr 2015 04:12:22 +1000
Subject: [PATCH 2/2] block: destroy bdi before blockdev is unregistered.

Because of the peculiar way that md devices are created (automatically
when the device node is opened), a new device can be created and
registered immediately after the
	blk_unregister_region(disk_devt(disk), disk->minors);
call in del_gendisk().

Therefore it is important that all visible artifacts of the previous
device are removed before this call.  In particular, the 'bdi'.

Since:
commit c4db59d31e39ea067c32163ac961e9c80198fd37
Author: Christoph Hellwig <hch@lst.de>
    fs: don't reassign dirty inodes to default_backing_dev_info

moved the
   device_unregister(bdi->dev);
call from bdi_unregister() to bdi_destroy() it has been quite easy to
lose a race and have a new (e.g.) "md127" be created after the
blk_unregister_region() call and before bdi_destroy() is ultimately
called by the final 'put_disk', which must come after del_gendisk().

The new device finds that the bdi name is already registered in sysfs
and complains

> [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'

We can fix this by moving the bdi_destroy() call out of
blk_release_queue() (which can happen very late when a refcount
reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
device driver calls it.

Then it is only necessary for md to call blk_cleanup_queue() before
del_gendisk().  As loop.c devices are also created on demand by
opening the device node, we make the same change there.

Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: stable@vger.kernel.org (v4.0)
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
---
 block/blk-core.c     | 2 ++
 block/blk-sysfs.c    | 2 --
 drivers/block/loop.c | 2 +-
 drivers/md/md.c      | 4 ++--
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index fd154b9..7871603 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
 		q->queue_lock = &q->__queue_lock;
 	spin_unlock_irq(lock);
 
+	bdi_destroy(&q->backing_dev_info);
+
 	/* @q is and will stay empty, shutdown and put */
 	blk_put_queue(q);
 }
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index faaf36a..2b8fd30 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
 
 	blk_trace_shutdown(q);
 
-	bdi_destroy(&q->backing_dev_info);
-
 	ida_simple_remove(&blk_queue_ida, q->id);
 	call_rcu(&q->rcu_head, blk_free_queue_rcu);
 }
diff --git a/drivers/block/loop.c b/drivers/block/loop.c
index ae3fcb4..d7173cb 100644
--- a/drivers/block/loop.c
+++ b/drivers/block/loop.c
@@ -1620,8 +1620,8 @@ out:
 
 static void loop_remove(struct loop_device *lo)
 {
-	del_gendisk(lo->lo_disk);
 	blk_cleanup_queue(lo->lo_queue);
+	del_gendisk(lo->lo_disk);
 	blk_mq_free_tag_set(&lo->tag_set);
 	put_disk(lo->lo_disk);
 	kfree(lo);
diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e1..593a024 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
 	if (mddev->sysfs_state)
 		sysfs_put(mddev->sysfs_state);
 
+	if (mddev->queue)
+		blk_cleanup_queue(mddev->queue);
 	if (mddev->gendisk) {
 		del_gendisk(mddev->gendisk);
 		put_disk(mddev->gendisk);
 	}
-	if (mddev->queue)
-		blk_cleanup_queue(mddev->queue);
 
 	kfree(mddev);
 }
-- 
2.4.0


[-- Attachment #3: 0001-block-bdi_unregister-now-contains-very-little-functi.patch --]
[-- Type: text/x-diff, Size: 4438 bytes --]

>From ef8878259ce360ec352b591746761249393cf952 Mon Sep 17 00:00:00 2001
From: NeilBrown <neilb@suse.de>
Date: Thu, 7 May 2015 21:49:41 +0900
Subject: [PATCH 1/2] block: bdi_unregister() now contains very little
 functionality.

It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
real consequence as bdi->dev isn't needed by anything else in the function,
and it triggers if
   blk_cleanup_queue() -> bdi_destroy()
is called before bdi_unregister, which a subsequent patch will make happen.
So this isn't wanted.

It also calls bdi_set_min_ratio().  This needs to be called after
writes through the bdi have all been flushed, and before the bdi is destroyed.
Calling it early is better than calling it late as it frees up a global
resource.

Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
perfectly fits these requirements.

So bdi_unregister can be discarded with the important content moved to
bdi_destroy, as can the
  writeback_bdi_unregister
event which is already not used.

This is tagged for 'stable' as it is a pre-requisite for a subsequent
patch which moves calls to blk_cleanup_queue() before calls to
del_gendisk().  The commit identified as 'Fixes' removed a lot of
other functionality from bdi_unregister(), and made a change which
necessitated moving the blk_cleanup_queue() calls.

Reported-by: Mike Snitzer <snitzer@redhat.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org (v4.0)
Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
Signed-off-by: NeilBrown <neilb@suse.de>
---
 block/genhd.c                    |  1 -
 include/linux/backing-dev.h      |  1 -
 include/trace/events/writeback.h |  1 -
 mm/backing-dev.c                 | 18 +-----------------
 4 files changed, 1 insertion(+), 20 deletions(-)

diff --git a/block/genhd.c b/block/genhd.c
index 64600e9..2fba82c 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -653,7 +653,6 @@ void del_gendisk(struct gendisk *disk)
 	disk->flags &= ~GENHD_FL_UP;
 
 	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
-	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_queue(disk);
 	blk_unregister_region(disk_devt(disk), disk->minors);
 
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index aff923a..d87d8ec 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -116,7 +116,6 @@ __printf(3, 4)
 int bdi_register(struct backing_dev_info *bdi, struct device *parent,
 		const char *fmt, ...);
 int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
-void bdi_unregister(struct backing_dev_info *bdi);
 int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
 void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
 			enum wb_reason reason);
diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
index 880dd74..c178d13 100644
--- a/include/trace/events/writeback.h
+++ b/include/trace/events/writeback.h
@@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
 DEFINE_WRITEBACK_EVENT(writeback_nowork);
 DEFINE_WRITEBACK_EVENT(writeback_wake_background);
 DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
-DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
 
 DECLARE_EVENT_CLASS(wbc_class,
 	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 6dc4580..000e7b3 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
 	flush_delayed_work(&bdi->wb.dwork);
 }
 
-/*
- * Called when the device behind @bdi has been removed or ejected.
- *
- * We can't really do much here except for reducing the dirty ratio at
- * the moment.  In the future we should be able to set a flag so that
- * the filesystem can handle errors at mark_inode_dirty time instead
- * of only at writeback time.
- */
-void bdi_unregister(struct backing_dev_info *bdi)
-{
-	if (WARN_ON_ONCE(!bdi->dev))
-		return;
-
-	bdi_set_min_ratio(bdi, 0);
-}
-EXPORT_SYMBOL(bdi_unregister);
-
 static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
 {
 	memset(wb, 0, sizeof(*wb));
@@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
 	int i;
 
 	bdi_wb_shutdown(bdi);
+	bdi_set_min_ratio(bdi, 0);
 
 	WARN_ON(!list_empty(&bdi->work_list));
 	WARN_ON(delayed_work_pending(&bdi->wb.dwork));
-- 
2.4.0


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07 13:04                   ` Sergey Senozhatsky
@ 2015-05-07 15:11                     ` Minchan Kim
  2015-05-08  0:15                       ` Sergey Senozhatsky
  2015-05-10  8:47                       ` Sergey Senozhatsky
  0 siblings, 2 replies; 26+ messages in thread
From: Minchan Kim @ 2015-05-07 15:11 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, Nitin Gupta, linux-kernel, Sergey Senozhatsky

Hey,

On Thu, May 07, 2015 at 10:04:54PM +0900, Sergey Senozhatsky wrote:
> On (05/07/15 21:09), Sergey Senozhatsky wrote:
> [..]
> > 
> > and there is
> > http://git.kernel.dk/?p=linux-block.git;a=commitdiff;h=6cd18e71;hp=393a33970540ac6a2c894b0d6ef3f5d485860884
> > 
> [..]
> > and here:
> > http://www.spinics.net/lists/dm-devel/msg23415.html
> > 
> > 
> > is there any chance to ask you to test with these patches (no rush, take your time)?
> > as I'm still unable to reproduce it locally.
> 
> pathes attached.

It's good timing. When I'm about to dig in, I got your patches
and tested it.

Passed my test script!

BTW, patches you sent landed on mainline?

Anyway, Thanks!

> 
> 	-ss

> From 7056096317825dbe582d005385bc4077ecb62c76 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Mon, 27 Apr 2015 04:12:22 +1000
> Subject: [PATCH 2/2] block: destroy bdi before blockdev is unregistered.
> 
> Because of the peculiar way that md devices are created (automatically
> when the device node is opened), a new device can be created and
> registered immediately after the
> 	blk_unregister_region(disk_devt(disk), disk->minors);
> call in del_gendisk().
> 
> Therefore it is important that all visible artifacts of the previous
> device are removed before this call.  In particular, the 'bdi'.
> 
> Since:
> commit c4db59d31e39ea067c32163ac961e9c80198fd37
> Author: Christoph Hellwig <hch@lst.de>
>     fs: don't reassign dirty inodes to default_backing_dev_info
> 
> moved the
>    device_unregister(bdi->dev);
> call from bdi_unregister() to bdi_destroy() it has been quite easy to
> lose a race and have a new (e.g.) "md127" be created after the
> blk_unregister_region() call and before bdi_destroy() is ultimately
> called by the final 'put_disk', which must come after del_gendisk().
> 
> The new device finds that the bdi name is already registered in sysfs
> and complains
> 
> > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> 
> We can fix this by moving the bdi_destroy() call out of
> blk_release_queue() (which can happen very late when a refcount
> reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> device driver calls it.
> 
> Then it is only necessary for md to call blk_cleanup_queue() before
> del_gendisk().  As loop.c devices are also created on demand by
> opening the device node, we make the same change there.
> 
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: stable@vger.kernel.org (v4.0)
> Signed-off-by: NeilBrown <neilb@suse.de>
> Reviewed-by: Christoph Hellwig <hch@lst.de>
> Signed-off-by: Jens Axboe <axboe@fb.com>
> ---
>  block/blk-core.c     | 2 ++
>  block/blk-sysfs.c    | 2 --
>  drivers/block/loop.c | 2 +-
>  drivers/md/md.c      | 4 ++--
>  4 files changed, 5 insertions(+), 5 deletions(-)
> 
> diff --git a/block/blk-core.c b/block/blk-core.c
> index fd154b9..7871603 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
>  		q->queue_lock = &q->__queue_lock;
>  	spin_unlock_irq(lock);
>  
> +	bdi_destroy(&q->backing_dev_info);
> +
>  	/* @q is and will stay empty, shutdown and put */
>  	blk_put_queue(q);
>  }
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> index faaf36a..2b8fd30 100644
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
>  
>  	blk_trace_shutdown(q);
>  
> -	bdi_destroy(&q->backing_dev_info);
> -
>  	ida_simple_remove(&blk_queue_ida, q->id);
>  	call_rcu(&q->rcu_head, blk_free_queue_rcu);
>  }
> diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> index ae3fcb4..d7173cb 100644
> --- a/drivers/block/loop.c
> +++ b/drivers/block/loop.c
> @@ -1620,8 +1620,8 @@ out:
>  
>  static void loop_remove(struct loop_device *lo)
>  {
> -	del_gendisk(lo->lo_disk);
>  	blk_cleanup_queue(lo->lo_queue);
> +	del_gendisk(lo->lo_disk);
>  	blk_mq_free_tag_set(&lo->tag_set);
>  	put_disk(lo->lo_disk);
>  	kfree(lo);
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index d4f31e1..593a024 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
>  	if (mddev->sysfs_state)
>  		sysfs_put(mddev->sysfs_state);
>  
> +	if (mddev->queue)
> +		blk_cleanup_queue(mddev->queue);
>  	if (mddev->gendisk) {
>  		del_gendisk(mddev->gendisk);
>  		put_disk(mddev->gendisk);
>  	}
> -	if (mddev->queue)
> -		blk_cleanup_queue(mddev->queue);
>  
>  	kfree(mddev);
>  }
> -- 
> 2.4.0
> 

> From ef8878259ce360ec352b591746761249393cf952 Mon Sep 17 00:00:00 2001
> From: NeilBrown <neilb@suse.de>
> Date: Thu, 7 May 2015 21:49:41 +0900
> Subject: [PATCH 1/2] block: bdi_unregister() now contains very little
>  functionality.
> 
> It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> real consequence as bdi->dev isn't needed by anything else in the function,
> and it triggers if
>    blk_cleanup_queue() -> bdi_destroy()
> is called before bdi_unregister, which a subsequent patch will make happen.
> So this isn't wanted.
> 
> It also calls bdi_set_min_ratio().  This needs to be called after
> writes through the bdi have all been flushed, and before the bdi is destroyed.
> Calling it early is better than calling it late as it frees up a global
> resource.
> 
> Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> perfectly fits these requirements.
> 
> So bdi_unregister can be discarded with the important content moved to
> bdi_destroy, as can the
>   writeback_bdi_unregister
> event which is already not used.
> 
> This is tagged for 'stable' as it is a pre-requisite for a subsequent
> patch which moves calls to blk_cleanup_queue() before calls to
> del_gendisk().  The commit identified as 'Fixes' removed a lot of
> other functionality from bdi_unregister(), and made a change which
> necessitated moving the blk_cleanup_queue() calls.
> 
> Reported-by: Mike Snitzer <snitzer@redhat.com>
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: stable@vger.kernel.org (v4.0)
> Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
>  block/genhd.c                    |  1 -
>  include/linux/backing-dev.h      |  1 -
>  include/trace/events/writeback.h |  1 -
>  mm/backing-dev.c                 | 18 +-----------------
>  4 files changed, 1 insertion(+), 20 deletions(-)
> 
> diff --git a/block/genhd.c b/block/genhd.c
> index 64600e9..2fba82c 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -653,7 +653,6 @@ void del_gendisk(struct gendisk *disk)
>  	disk->flags &= ~GENHD_FL_UP;
>  
>  	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> -	bdi_unregister(&disk->queue->backing_dev_info);
>  	blk_unregister_queue(disk);
>  	blk_unregister_region(disk_devt(disk), disk->minors);
>  
> diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> index aff923a..d87d8ec 100644
> --- a/include/linux/backing-dev.h
> +++ b/include/linux/backing-dev.h
> @@ -116,7 +116,6 @@ __printf(3, 4)
>  int bdi_register(struct backing_dev_info *bdi, struct device *parent,
>  		const char *fmt, ...);
>  int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
> -void bdi_unregister(struct backing_dev_info *bdi);
>  int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
>  void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
>  			enum wb_reason reason);
> diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> index 880dd74..c178d13 100644
> --- a/include/trace/events/writeback.h
> +++ b/include/trace/events/writeback.h
> @@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
>  DEFINE_WRITEBACK_EVENT(writeback_nowork);
>  DEFINE_WRITEBACK_EVENT(writeback_wake_background);
>  DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
> -DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
>  
>  DECLARE_EVENT_CLASS(wbc_class,
>  	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 6dc4580..000e7b3 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
>  	flush_delayed_work(&bdi->wb.dwork);
>  }
>  
> -/*
> - * Called when the device behind @bdi has been removed or ejected.
> - *
> - * We can't really do much here except for reducing the dirty ratio at
> - * the moment.  In the future we should be able to set a flag so that
> - * the filesystem can handle errors at mark_inode_dirty time instead
> - * of only at writeback time.
> - */
> -void bdi_unregister(struct backing_dev_info *bdi)
> -{
> -	if (WARN_ON_ONCE(!bdi->dev))
> -		return;
> -
> -	bdi_set_min_ratio(bdi, 0);
> -}
> -EXPORT_SYMBOL(bdi_unregister);
> -
>  static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
>  {
>  	memset(wb, 0, sizeof(*wb));
> @@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
>  	int i;
>  
>  	bdi_wb_shutdown(bdi);
> +	bdi_set_min_ratio(bdi, 0);
>  
>  	WARN_ON(!list_empty(&bdi->work_list));
>  	WARN_ON(delayed_work_pending(&bdi->wb.dwork));
> -- 
> 2.4.0
> 


-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07 15:11                     ` Minchan Kim
@ 2015-05-08  0:15                       ` Sergey Senozhatsky
  2015-05-10  8:47                       ` Sergey Senozhatsky
  1 sibling, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-08  0:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (05/08/15 00:11), Minchan Kim wrote:
> It's good timing. When I'm about to dig in, I got your patches
> and tested it.
> 
> Passed my test script!
> 
> BTW, patches you sent landed on mainline?
> 
> Anyway, Thanks!
> 

Hello Minchan,

thanks a ton!

not mainlined yet, to the best of my knowledge. will keep an eye on these patches.


I'll rename zram-contol interface and resubmit zram hot add/remove
patchset today.
thanks.

	-ss

> > From 7056096317825dbe582d005385bc4077ecb62c76 Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@suse.de>
> > Date: Mon, 27 Apr 2015 04:12:22 +1000
> > Subject: [PATCH 2/2] block: destroy bdi before blockdev is unregistered.
> > 
> > Because of the peculiar way that md devices are created (automatically
> > when the device node is opened), a new device can be created and
> > registered immediately after the
> > 	blk_unregister_region(disk_devt(disk), disk->minors);
> > call in del_gendisk().
> > 
> > Therefore it is important that all visible artifacts of the previous
> > device are removed before this call.  In particular, the 'bdi'.
> > 
> > Since:
> > commit c4db59d31e39ea067c32163ac961e9c80198fd37
> > Author: Christoph Hellwig <hch@lst.de>
> >     fs: don't reassign dirty inodes to default_backing_dev_info
> > 
> > moved the
> >    device_unregister(bdi->dev);
> > call from bdi_unregister() to bdi_destroy() it has been quite easy to
> > lose a race and have a new (e.g.) "md127" be created after the
> > blk_unregister_region() call and before bdi_destroy() is ultimately
> > called by the final 'put_disk', which must come after del_gendisk().
> > 
> > The new device finds that the bdi name is already registered in sysfs
> > and complains
> > 
> > > [ 9627.630029] WARNING: CPU: 18 PID: 3330 at fs/sysfs/dir.c:31 sysfs_warn_dup+0x5a/0x70()
> > > [ 9627.630032] sysfs: cannot create duplicate filename '/devices/virtual/bdi/9:127'
> > 
> > We can fix this by moving the bdi_destroy() call out of
> > blk_release_queue() (which can happen very late when a refcount
> > reaches zero) and into blk_cleanup_queue() - which happens exactly when the md
> > device driver calls it.
> > 
> > Then it is only necessary for md to call blk_cleanup_queue() before
> > del_gendisk().  As loop.c devices are also created on demand by
> > opening the device node, we make the same change there.
> > 
> > Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> > Reported-by: Azat Khuzhin <a3at.mail@gmail.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: stable@vger.kernel.org (v4.0)
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > Signed-off-by: Jens Axboe <axboe@fb.com>
> > ---
> >  block/blk-core.c     | 2 ++
> >  block/blk-sysfs.c    | 2 --
> >  drivers/block/loop.c | 2 +-
> >  drivers/md/md.c      | 4 ++--
> >  4 files changed, 5 insertions(+), 5 deletions(-)
> > 
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index fd154b9..7871603 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -552,6 +552,8 @@ void blk_cleanup_queue(struct request_queue *q)
> >  		q->queue_lock = &q->__queue_lock;
> >  	spin_unlock_irq(lock);
> >  
> > +	bdi_destroy(&q->backing_dev_info);
> > +
> >  	/* @q is and will stay empty, shutdown and put */
> >  	blk_put_queue(q);
> >  }
> > diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> > index faaf36a..2b8fd30 100644
> > --- a/block/blk-sysfs.c
> > +++ b/block/blk-sysfs.c
> > @@ -522,8 +522,6 @@ static void blk_release_queue(struct kobject *kobj)
> >  
> >  	blk_trace_shutdown(q);
> >  
> > -	bdi_destroy(&q->backing_dev_info);
> > -
> >  	ida_simple_remove(&blk_queue_ida, q->id);
> >  	call_rcu(&q->rcu_head, blk_free_queue_rcu);
> >  }
> > diff --git a/drivers/block/loop.c b/drivers/block/loop.c
> > index ae3fcb4..d7173cb 100644
> > --- a/drivers/block/loop.c
> > +++ b/drivers/block/loop.c
> > @@ -1620,8 +1620,8 @@ out:
> >  
> >  static void loop_remove(struct loop_device *lo)
> >  {
> > -	del_gendisk(lo->lo_disk);
> >  	blk_cleanup_queue(lo->lo_queue);
> > +	del_gendisk(lo->lo_disk);
> >  	blk_mq_free_tag_set(&lo->tag_set);
> >  	put_disk(lo->lo_disk);
> >  	kfree(lo);
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index d4f31e1..593a024 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -4818,12 +4818,12 @@ static void md_free(struct kobject *ko)
> >  	if (mddev->sysfs_state)
> >  		sysfs_put(mddev->sysfs_state);
> >  
> > +	if (mddev->queue)
> > +		blk_cleanup_queue(mddev->queue);
> >  	if (mddev->gendisk) {
> >  		del_gendisk(mddev->gendisk);
> >  		put_disk(mddev->gendisk);
> >  	}
> > -	if (mddev->queue)
> > -		blk_cleanup_queue(mddev->queue);
> >  
> >  	kfree(mddev);
> >  }
> > -- 
> > 2.4.0
> > 
> 
> > From ef8878259ce360ec352b591746761249393cf952 Mon Sep 17 00:00:00 2001
> > From: NeilBrown <neilb@suse.de>
> > Date: Thu, 7 May 2015 21:49:41 +0900
> > Subject: [PATCH 1/2] block: bdi_unregister() now contains very little
> >  functionality.
> > 
> > It contains a "WARN_ON" if bdi->dev is NULL.  This warning is of no
> > real consequence as bdi->dev isn't needed by anything else in the function,
> > and it triggers if
> >    blk_cleanup_queue() -> bdi_destroy()
> > is called before bdi_unregister, which a subsequent patch will make happen.
> > So this isn't wanted.
> > 
> > It also calls bdi_set_min_ratio().  This needs to be called after
> > writes through the bdi have all been flushed, and before the bdi is destroyed.
> > Calling it early is better than calling it late as it frees up a global
> > resource.
> > 
> > Calling it immediately after bdi_wb_shutdown() in bdi_destroy()
> > perfectly fits these requirements.
> > 
> > So bdi_unregister can be discarded with the important content moved to
> > bdi_destroy, as can the
> >   writeback_bdi_unregister
> > event which is already not used.
> > 
> > This is tagged for 'stable' as it is a pre-requisite for a subsequent
> > patch which moves calls to blk_cleanup_queue() before calls to
> > del_gendisk().  The commit identified as 'Fixes' removed a lot of
> > other functionality from bdi_unregister(), and made a change which
> > necessitated moving the blk_cleanup_queue() calls.
> > 
> > Reported-by: Mike Snitzer <snitzer@redhat.com>
> > Cc: Christoph Hellwig <hch@lst.de>
> > Cc: Peter Zijlstra <peterz@infradead.org>
> > Cc: stable@vger.kernel.org (v4.0)
> > Fixes: c4db59d31e39ea067c32163ac961e9c80198fd37
> > Signed-off-by: NeilBrown <neilb@suse.de>
> > ---
> >  block/genhd.c                    |  1 -
> >  include/linux/backing-dev.h      |  1 -
> >  include/trace/events/writeback.h |  1 -
> >  mm/backing-dev.c                 | 18 +-----------------
> >  4 files changed, 1 insertion(+), 20 deletions(-)
> > 
> > diff --git a/block/genhd.c b/block/genhd.c
> > index 64600e9..2fba82c 100644
> > --- a/block/genhd.c
> > +++ b/block/genhd.c
> > @@ -653,7 +653,6 @@ void del_gendisk(struct gendisk *disk)
> >  	disk->flags &= ~GENHD_FL_UP;
> >  
> >  	sysfs_remove_link(&disk_to_dev(disk)->kobj, "bdi");
> > -	bdi_unregister(&disk->queue->backing_dev_info);
> >  	blk_unregister_queue(disk);
> >  	blk_unregister_region(disk_devt(disk), disk->minors);
> >  
> > diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
> > index aff923a..d87d8ec 100644
> > --- a/include/linux/backing-dev.h
> > +++ b/include/linux/backing-dev.h
> > @@ -116,7 +116,6 @@ __printf(3, 4)
> >  int bdi_register(struct backing_dev_info *bdi, struct device *parent,
> >  		const char *fmt, ...);
> >  int bdi_register_dev(struct backing_dev_info *bdi, dev_t dev);
> > -void bdi_unregister(struct backing_dev_info *bdi);
> >  int __must_check bdi_setup_and_register(struct backing_dev_info *, char *);
> >  void bdi_start_writeback(struct backing_dev_info *bdi, long nr_pages,
> >  			enum wb_reason reason);
> > diff --git a/include/trace/events/writeback.h b/include/trace/events/writeback.h
> > index 880dd74..c178d13 100644
> > --- a/include/trace/events/writeback.h
> > +++ b/include/trace/events/writeback.h
> > @@ -250,7 +250,6 @@ DEFINE_EVENT(writeback_class, name, \
> >  DEFINE_WRITEBACK_EVENT(writeback_nowork);
> >  DEFINE_WRITEBACK_EVENT(writeback_wake_background);
> >  DEFINE_WRITEBACK_EVENT(writeback_bdi_register);
> > -DEFINE_WRITEBACK_EVENT(writeback_bdi_unregister);
> >  
> >  DECLARE_EVENT_CLASS(wbc_class,
> >  	TP_PROTO(struct writeback_control *wbc, struct backing_dev_info *bdi),
> > diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> > index 6dc4580..000e7b3 100644
> > --- a/mm/backing-dev.c
> > +++ b/mm/backing-dev.c
> > @@ -359,23 +359,6 @@ static void bdi_wb_shutdown(struct backing_dev_info *bdi)
> >  	flush_delayed_work(&bdi->wb.dwork);
> >  }
> >  
> > -/*
> > - * Called when the device behind @bdi has been removed or ejected.
> > - *
> > - * We can't really do much here except for reducing the dirty ratio at
> > - * the moment.  In the future we should be able to set a flag so that
> > - * the filesystem can handle errors at mark_inode_dirty time instead
> > - * of only at writeback time.
> > - */
> > -void bdi_unregister(struct backing_dev_info *bdi)
> > -{
> > -	if (WARN_ON_ONCE(!bdi->dev))
> > -		return;
> > -
> > -	bdi_set_min_ratio(bdi, 0);
> > -}
> > -EXPORT_SYMBOL(bdi_unregister);
> > -
> >  static void bdi_wb_init(struct bdi_writeback *wb, struct backing_dev_info *bdi)
> >  {
> >  	memset(wb, 0, sizeof(*wb));
> > @@ -443,6 +426,7 @@ void bdi_destroy(struct backing_dev_info *bdi)
> >  	int i;
> >  
> >  	bdi_wb_shutdown(bdi);
> > +	bdi_set_min_ratio(bdi, 0);
> >  
> >  	WARN_ON(!list_empty(&bdi->work_list));
> >  	WARN_ON(delayed_work_pending(&bdi->wb.dwork));
> > -- 
> > 2.4.0
> > 
> 
> 
> -- 
> Kind regards,
> Minchan Kim
> 

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCHv4 00/10] add on-demand device creation
  2015-05-07 15:11                     ` Minchan Kim
  2015-05-08  0:15                       ` Sergey Senozhatsky
@ 2015-05-10  8:47                       ` Sergey Senozhatsky
  1 sibling, 0 replies; 26+ messages in thread
From: Sergey Senozhatsky @ 2015-05-10  8:47 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Sergey Senozhatsky, Andrew Morton, Nitin Gupta, linux-kernel,
	Sergey Senozhatsky

On (05/08/15 00:11), Minchan Kim wrote:
> 
> BTW, patches you sent landed on mainline?
> 

Hi,

"block: destroy bdi before blockdev is unregistered."

-- mainlined 6cd18e711dd8075da9d78cfc1239f912ff28968a


https://lkml.org/lkml/2015/5/8/29

-- not yet. w/o this patch del_gendisk() hits a WARN_ON()
(a known issue https://lkml.org/lkml/2015/4/28/568).

	-ss

^ permalink raw reply	[flat|nested] 26+ messages in thread

end of thread, other threads:[~2015-05-10  8:48 UTC | newest]

Thread overview: 26+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-04 12:38 [PATCHv4 00/10] add on-demand device creation Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 01/10] zram: add `compact` sysfs entry to documentation Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 02/10] zram: cosmetic ZRAM_ATTR_RO code formatting tweak Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 03/10] zram: use idr instead of `zram_devices' array Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 04/10] zram: reorganize code layout Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 05/10] zram: remove max_num_devices limitation Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 06/10] zram: report every added and removed device Sergey Senozhatsky
2015-05-04 12:38 ` [PATCHv4 07/10] zram: trivial: correct flag operations comment Sergey Senozhatsky
2015-05-04 12:39 ` [PATCHv4 08/10] zram: return zram device_id from zram_add() Sergey Senozhatsky
2015-05-04 12:39 ` [PATCHv4 09/10] zram: close race by open overriding Sergey Senozhatsky
2015-05-04 12:39 ` [PATCHv4 10/10] zram: add dynamic device add/remove functionality Sergey Senozhatsky
2015-05-06  5:01 ` [PATCHv4 00/10] add on-demand device creation Minchan Kim
2015-05-06  5:25   ` Sergey Senozhatsky
2015-05-06  6:52     ` Minchan Kim
2015-05-06  7:07     ` Sergey Senozhatsky
2015-05-06  7:28       ` Minchan Kim
2015-05-06  8:10         ` Sergey Senozhatsky
2015-05-06  8:20           ` Minchan Kim
2015-05-07  0:33             ` Sergey Senozhatsky
2015-05-07  7:41               ` Minchan Kim
2015-05-07 12:09                 ` Sergey Senozhatsky
2015-05-07 13:04                   ` Sergey Senozhatsky
2015-05-07 15:11                     ` Minchan Kim
2015-05-08  0:15                       ` Sergey Senozhatsky
2015-05-10  8:47                       ` Sergey Senozhatsky
2015-05-06  5:31 ` Sergey Senozhatsky

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).