* [PATCH v2 0/6] a caching layer for raid5/6
@ 2015-05-19  2:57 Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 1/6] MD: add a new disk role to present cache device Shaohua Li
                   ` (6 more replies)
  0 siblings, 7 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

Hi,

This is the second version of the raid5/6 caching layer patches. The caching
layer uses an SSD as a cache for a raid5/6 array, working in much the same way
as the cache in a hardware raid controller. The purpose is to improve raid
performance (by reducing read-modify-write) and to fix the write hole issue.
The main patch is patch 3, and its description has all the details about the
implementation.
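For background on the read-modify-write penalty and the write hole mentioned
above, the two RAID5 parity-update paths can be sketched in plain C. This is
an illustrative sketch, not code from the patches:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RAID5 keeps parity = XOR of all data chunks in a stripe.  A full-stripe
 * write can compute parity from the data it already has in hand: */
static uint8_t parity_full(const uint8_t *chunks, size_t n)
{
	uint8_t p = 0;

	for (size_t i = 0; i < n; i++)
		p ^= chunks[i];
	return p;
}

/* A small random write instead forces a read-modify-write: read the old
 * data and old parity from disk, then
 *     new_parity = old_parity ^ old_data ^ new_data.
 * If the system crashes between writing the data and the parity, the
 * stripe is inconsistent -- the "write hole" the cache closes. */
static uint8_t parity_rmw(uint8_t old_parity, uint8_t old_data,
			  uint8_t new_data)
{
	return old_parity ^ old_data ^ new_data;
}
```

The extra reads in the second path are what the caching layer tries to avoid
by absorbing small writes on the SSD first.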

The main changes in V2 improve performance: metadata writes no longer use FUA,
and discard requests are only dispatched when the discard range is big enough.
There are also some bug fixes and code cleanups. Please review!

Thanks,
Shaohua


Shaohua Li (5):
  raid5: directly use mddev->queue
  raid5: A caching layer for RAID5/6
  raid5: add some sysfs entries
  md: don't allow resize/reshape with cache support
  raid5: skip resync if caching is enabled

Song Liu (1):
  MD: add a new disk role to present cache device

 drivers/md/Makefile            |    2 +-
 drivers/md/md.c                |   14 +-
 drivers/md/md.h                |    4 +
 drivers/md/raid5-cache.c       | 3519 ++++++++++++++++++++++++++++++++++++++++
 drivers/md/raid5.c             |   97 +-
 drivers/md/raid5.h             |   16 +-
 include/uapi/linux/raid/md_p.h |   73 +
 7 files changed, 3705 insertions(+), 20 deletions(-)
 create mode 100644 drivers/md/raid5-cache.c

-- 
1.8.1


^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v2 1/6] MD: add a new disk role to present cache device
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
@ 2015-05-19  2:57 ` Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 2/6] raid5: directly use mddev->queue Shaohua Li
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

From: Song Liu <songliubraving@fb.com>

The next patches will use a disk as a raid5/6 cache, so we need a new disk
role to represent the cache device.

Not sure if we should bump up the MD superblock version for the new disk
role.

Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/md.c                | 14 +++++++++++++-
 drivers/md/md.h                |  4 ++++
 include/uapi/linux/raid/md_p.h |  1 +
 3 files changed, 18 insertions(+), 1 deletion(-)
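As a reading aid, the role decoding that super_1_validate performs with this
patch applied can be sketched as standalone C. The enum and function names
here are illustrative only; the real code sets bits in rdev->flags:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative classification of the on-disk dev_roles word: the top few
 * values are reserved markers, everything else is a raid_disk slot. */
enum disk_kind { DISK_SPARE, DISK_FAULTY, DISK_CACHE, DISK_ACTIVE };

static enum disk_kind classify_role(uint16_t role)
{
	switch (role) {
	case 0xffff:		/* spare */
		return DISK_SPARE;
	case 0xfffe:		/* faulty */
		return DISK_FAULTY;
	case 0xfffd:		/* cache device -- new in this patch */
		return DISK_CACHE;
	default:		/* role is the raid_disk number */
		return DISK_ACTIVE;
	}
}
```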

diff --git a/drivers/md/md.c b/drivers/md/md.c
index d4f31e1..b6ece48 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -1656,6 +1656,9 @@ static int super_1_validate(struct mddev *mddev, struct md_rdev *rdev)
 		case 0xfffe: /* faulty */
 			set_bit(Faulty, &rdev->flags);
 			break;
+		case 0xfffd: /* cache device */
+			set_bit(WriteCache, &rdev->flags);
+			break;
 		default:
 			rdev->saved_raid_disk = role;
 			if ((le32_to_cpu(sb->feature_map) &
@@ -1811,6 +1814,8 @@ static void super_1_sync(struct mddev *mddev, struct md_rdev *rdev)
 			sb->dev_roles[i] = cpu_to_le16(0xfffe);
 		else if (test_bit(In_sync, &rdev2->flags))
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
+		else if (test_bit(WriteCache, &rdev2->flags))
+			sb->dev_roles[i] = cpu_to_le16(0xfffd);
 		else if (rdev2->raid_disk >= 0)
 			sb->dev_roles[i] = cpu_to_le16(rdev2->raid_disk);
 		else
@@ -5778,7 +5783,8 @@ static int get_disk_info(struct mddev *mddev, void __user * arg)
 		else if (test_bit(In_sync, &rdev->flags)) {
 			info.state |= (1<<MD_DISK_ACTIVE);
 			info.state |= (1<<MD_DISK_SYNC);
-		}
+		} else if (test_bit(WriteCache, &rdev->flags))
+			info.state |= (1<<MD_DISK_WRITECACHE);
 		if (test_bit(WriteMostly, &rdev->flags))
 			info.state |= (1<<MD_DISK_WRITEMOSTLY);
 	} else {
@@ -5893,6 +5899,8 @@ static int add_new_disk(struct mddev *mddev, mdu_disk_info_t *info)
 		else
 			clear_bit(WriteMostly, &rdev->flags);
 
+		if (info->state & (1<<MD_DISK_WRITECACHE))
+			set_bit(WriteCache, &rdev->flags);
 		/*
 		 * check whether the device shows up in other nodes
 		 */
@@ -7261,6 +7269,10 @@ static int md_seq_show(struct seq_file *seq, void *v)
 				seq_printf(seq, "(F)");
 				continue;
 			}
+			if (test_bit(WriteCache, &rdev->flags)) {
+				seq_printf(seq, "(C)");
+				continue;
+			}
 			if (rdev->raid_disk < 0)
 				seq_printf(seq, "(S)"); /* spare */
 			if (test_bit(Replacement, &rdev->flags))
diff --git a/drivers/md/md.h b/drivers/md/md.h
index 4046a6c..6857592 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -175,6 +175,10 @@ enum flag_bits {
 				 * This device is seen locally but not
 				 * by the whole cluster
 				 */
+	WriteCache,		/* This device is used as write cache.
+				 * Usually, this device should be faster
+				 * than other devices in the array
+				 */
 };
 
 #define BB_LEN_MASK	(0x00000000000001FFULL)
diff --git a/include/uapi/linux/raid/md_p.h b/include/uapi/linux/raid/md_p.h
index 2ae6131..9d36b91 100644
--- a/include/uapi/linux/raid/md_p.h
+++ b/include/uapi/linux/raid/md_p.h
@@ -89,6 +89,7 @@
 				   * read requests will only be sent here in
 				   * dire need
 				   */
+#define MD_DISK_WRITECACHE      18 /* disk is used as the write cache in RAID-5/6 */
 
 typedef struct mdp_device_descriptor_s {
 	__u32 number;		/* 0 Device number in the entire set	      */
-- 
1.8.1



* [PATCH v2 2/6] raid5: directly use mddev->queue
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 1/6] MD: add a new disk role to present cache device Shaohua Li
@ 2015-05-19  2:57 ` Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 4/6] raid5: add some sysfs entries Shaohua Li
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

When the cache layer flushes data from the cache disk to the raid disks, it
dispatches IO to the raid disks. At that point there is no block device
attached to the bio, so use mddev->queue directly. This should not impact IO
dispatched to an rdev, which does have the rdev's block device attached.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid5.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 77dfd72..950c3c6 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -223,7 +223,7 @@ static int raid6_idx_to_slot(int idx, struct stripe_head *sh,
 	return slot;
 }
 
-static void return_io(struct bio *return_bi)
+static void return_io(struct r5conf *conf, struct bio *return_bi)
 {
 	struct bio *bi = return_bi;
 	while (bi) {
@@ -231,8 +231,7 @@ static void return_io(struct bio *return_bi)
 		return_bi = bi->bi_next;
 		bi->bi_next = NULL;
 		bi->bi_iter.bi_size = 0;
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
+		trace_block_bio_complete(conf->mddev->queue, bi, 0);
 		bio_endio(bi, 0);
 		bi = return_bi;
 	}
@@ -1193,7 +1192,7 @@ static void ops_complete_biofill(void *stripe_head_ref)
 	}
 	clear_bit(STRIPE_BIOFILL_RUN, &sh->state);
 
-	return_io(return_bi);
+	return_io(sh->raid_conf, return_bi);
 
 	set_bit(STRIPE_HANDLE, &sh->state);
 	release_stripe(sh);
@@ -4563,7 +4562,7 @@ static void handle_stripe(struct stripe_head *sh)
 			md_wakeup_thread(conf->mddev->thread);
 	}
 
-	return_io(s.return_bi);
+	return_io(conf, s.return_bi);
 
 	clear_bit_unlock(STRIPE_ACTIVE, &sh->state);
 }
@@ -5267,8 +5266,7 @@ static void make_request(struct mddev *mddev, struct bio * bi)
 		if ( rw == WRITE )
 			md_write_end(mddev);
 
-		trace_block_bio_complete(bdev_get_queue(bi->bi_bdev),
-					 bi, 0);
+		trace_block_bio_complete(mddev->queue, bi, 0);
 		bio_endio(bi, 0);
 	}
 }
-- 
1.8.1



* [PATCH v2 4/6] raid5: add some sysfs entries
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 1/6] MD: add a new disk role to present cache device Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 2/6] raid5: directly use mddev->queue Shaohua Li
@ 2015-05-19  2:57 ` Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 5/6] md: don't allow resize/reshape with cache support Shaohua Li
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

Add some sysfs entries:
- cache_memory: controls the cache memory size.
- cache_reclaim_batch: controls how many stripes reclaim handles in one
  batch.
- cache_memory_watermark: background reclaim runs when cache memory usage
  hits the watermark and stops once it reaches 1.5x the watermark.
- cache_disk_watermark: background reclaim runs when cache disk usage hits
  the watermark and stops once it reaches 1.5x the watermark.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid5-cache.c | 253 ++++++++++++++++++++++++++++++++++++++++++++++-
 drivers/md/raid5.c       |   3 +
 drivers/md/raid5.h       |   1 +
 3 files changed, 256 insertions(+), 1 deletion(-)
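As a reading aid, the validation both watermark store handlers perform can be
sketched in isolation. The struct and function names below are ours, not the
driver's:

```c
#include <assert.h>

/* Both cache_disk_watermark and cache_memory_watermark derive a high mark
 * at 1.5x the low mark and reject values whose high mark would not fit
 * inside the device (total blocks or total pages respectively). */
struct watermarks {
	unsigned long low;
	unsigned long high;
};

static int set_watermark(struct watermarks *w, unsigned long new,
			 unsigned long total)
{
	/* Mirrors the store handlers' check: 1.5x the new low mark must
	 * still be below the total capacity, else -EINVAL. */
	if (new * 3 / 2 >= total)
		return -1;
	w->low = new;
	w->high = new * 3 / 2;
	return 0;
}
```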

diff --git a/drivers/md/raid5-cache.c b/drivers/md/raid5-cache.c
index 4ea24cb..b93d29a 100644
--- a/drivers/md/raid5-cache.c
+++ b/drivers/md/raid5-cache.c
@@ -314,6 +314,12 @@ static inline int r5l_page_blocks(struct r5l_log *log, int pages)
 	return pages << log->page_block_shift;
 }
 
+static inline int r5l_max_flush_stripes(struct r5l_log *log)
+{
+	return (log->block_size - sizeof(struct r5l_flush_block)) /
+		sizeof(__le64);
+}
+
 static u32 r5l_calculate_checksum(struct r5l_log *log, u32 crc,
 	void *buf, size_t size, bool data)
 {
@@ -3124,6 +3130,247 @@ static int r5c_shrink_cache_memory(struct r5c_cache *cache, unsigned long size)
 	return 0;
 }
 
+static ssize_t r5c_show_cache_memory(struct mddev *mddev, char *page)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+
+	return sprintf(page, "%lld\n", cache->max_pages << PAGE_SHIFT);
+}
+
+static ssize_t r5c_store_cache_memory(struct mddev *mddev, const char *page,
+	size_t len)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+	unsigned long new;
+	LIST_HEAD(page_list);
+	u64 i;
+
+	if (len >= PAGE_SIZE)
+		return -EINVAL;
+	if (kstrtoul(page, 0, &new))
+		return -EINVAL;
+	new >>= PAGE_SHIFT;
+
+	if (new > cache->max_pages) {
+		i = cache->max_pages;
+		while (i < new) {
+			struct page *page = alloc_page(GFP_KERNEL);
+
+			if (!page)
+				break;
+			list_add(&page->lru, &page_list);
+			i++;
+		}
+
+		spin_lock_irq(&cache->pool_lock);
+		list_splice(&page_list, &cache->page_pool);
+		cache->free_pages += i - cache->max_pages;
+		cache->max_pages = i;
+		cache->total_pages = i;
+		r5c_calculate_watermark(cache);
+		spin_unlock_irq(&cache->pool_lock);
+		return len;
+	}
+	r5c_shrink_cache_memory(cache, new);
+	return len;
+}
+
+static struct md_sysfs_entry r5c_cache_memory = __ATTR(cache_memory,
+	S_IRUGO | S_IWUSR, r5c_show_cache_memory, r5c_store_cache_memory);
+
+int r5c_min_stripe_cache_size(struct r5c_cache *cache)
+{
+	struct r5conf *conf = cache->mddev->private;
+	return (conf->chunk_sectors >> PAGE_SECTOR_SHIFT) *
+		cache->reclaim_batch;
+}
+
+static void r5c_set_reclaim_batch(struct r5c_cache *cache, int batch)
+{
+	struct mddev *mddev = cache->mddev;
+	struct r5conf *conf = mddev->private;
+	int size;
+
+	size = (cache->stripe_parity_pages << PAGE_SECTOR_SHIFT) * batch;
+	if (size > cache->reserved_space) {
+		cache->reserved_space = size;
+		mutex_lock(&cache->log.io_mutex);
+		cache->log.reserved_blocks = r5l_sector_to_block(&cache->log,
+			cache->reserved_space) + 1;
+		mutex_unlock(&cache->log.io_mutex);
+		r5c_wake_wait_reclaimer(cache,
+				RECLAIM_DISK_BACKGROUND);
+	} else {
+		mutex_lock(&cache->log.io_mutex);
+		cache->log.reserved_blocks -= r5l_sector_to_block(&cache->log,
+			cache->reserved_space - size);
+		mutex_unlock(&cache->log.io_mutex);
+		cache->reserved_space = size;
+	}
+
+	size = (conf->chunk_sectors >> PAGE_SECTOR_SHIFT) * batch;
+
+	mddev_lock(mddev);
+	if (size > conf->max_nr_stripes)
+		raid5_set_cache_size(mddev, size);
+	mddev_unlock(mddev);
+
+	cache->reclaim_batch = batch;
+}
+
+static ssize_t r5c_show_cache_reclaim_batch(struct mddev *mddev, char *page)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+
+	return sprintf(page, "%d\n", cache->reclaim_batch);
+}
+
+static ssize_t r5c_store_cache_reclaim_batch(struct mddev *mddev,
+	const char *page, size_t len)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+	unsigned long new;
+
+	if (len >= PAGE_SIZE)
+		return -EINVAL;
+	if (kstrtoul(page, 0, &new))
+		return -EINVAL;
+
+	if (new > r5l_max_flush_stripes(&cache->log))
+		new = r5l_max_flush_stripes(&cache->log);
+
+	if (new != cache->reclaim_batch)
+		r5c_set_reclaim_batch(cache, new);
+	return len;
+}
+
+static struct md_sysfs_entry r5c_cache_reclaim_batch =
+	__ATTR(cache_reclaim_batch, S_IRUGO | S_IWUSR,
+	r5c_show_cache_reclaim_batch, r5c_store_cache_reclaim_batch);
+
+static ssize_t r5c_show_cache_disk_watermark(struct mddev *mddev, char *page)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+
+	return sprintf(page, "%lld\n", cache->log.low_watermark *
+		cache->log.block_size);
+}
+
+static ssize_t r5c_store_cache_disk_watermark(struct mddev *mddev,
+	const char *page, size_t len)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+	struct r5l_log *log = &cache->log;
+	unsigned long new;
+
+	if (len >= PAGE_SIZE)
+		return -EINVAL;
+	if (kstrtoul(page, 0, &new))
+		return -EINVAL;
+	new /= log->block_size;
+
+	if (new * 3 / 2 >= log->total_blocks)
+		return -EINVAL;
+
+	mutex_lock(&log->io_mutex);
+	log->low_watermark = new;
+	log->high_watermark = new * 3 / 2;
+	mutex_unlock(&log->io_mutex);
+	return len;
+}
+
+static struct md_sysfs_entry r5c_cache_disk_watermark =
+	__ATTR(cache_disk_watermark, S_IRUGO | S_IWUSR,
+	r5c_show_cache_disk_watermark, r5c_store_cache_disk_watermark);
+
+static ssize_t r5c_show_cache_memory_watermark(struct mddev *mddev, char *page)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+
+	return sprintf(page, "%lld\n", cache->low_watermark << PAGE_SHIFT);
+}
+
+static ssize_t r5c_store_cache_memory_watermark(struct mddev *mddev,
+	const char *page, size_t len)
+{
+	struct r5conf *conf = mddev->private;
+	struct r5c_cache *cache = conf->cache;
+	unsigned long new;
+
+	if (len >= PAGE_SIZE)
+		return -EINVAL;
+	if (kstrtoul(page, 0, &new))
+		return -EINVAL;
+	new >>= PAGE_SHIFT;
+
+	if (new * 3 / 2 >= cache->max_pages)
+		return -EINVAL;
+
+	spin_lock_irq(&cache->pool_lock);
+	cache->low_watermark = new;
+	cache->high_watermark = new * 3 / 2;
+	spin_unlock_irq(&cache->pool_lock);
+	return len;
+}
+
+static struct md_sysfs_entry r5c_cache_memory_watermark =
+	__ATTR(cache_memory_watermark, S_IRUGO | S_IWUSR,
+	r5c_show_cache_memory_watermark, r5c_store_cache_memory_watermark);
+
+static int r5c_init_sysfs(struct r5c_cache *cache)
+{
+	struct mddev *mddev = cache->mddev;
+	int ret;
+
+	ret = sysfs_add_file_to_group(&mddev->kobj, &r5c_cache_memory.attr,
+				      NULL);
+	if (ret)
+		return ret;
+	ret = sysfs_add_file_to_group(&mddev->kobj,
+				      &r5c_cache_reclaim_batch.attr, NULL);
+	if (ret)
+		goto err_reclaim;
+	ret = sysfs_add_file_to_group(&mddev->kobj,
+				      &r5c_cache_disk_watermark.attr, NULL);
+	if (ret)
+		goto disk_watermark;
+	ret = sysfs_add_file_to_group(&mddev->kobj,
+				      &r5c_cache_memory_watermark.attr, NULL);
+	if (ret)
+		goto memory_watermark;
+	return 0;
+memory_watermark:
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_disk_watermark.attr, NULL);
+disk_watermark:
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_reclaim_batch.attr, NULL);
+err_reclaim:
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_memory.attr, NULL);
+	return ret;
+}
+
+static void r5c_exit_sysfs(struct r5c_cache *cache)
+{
+	struct mddev *mddev = cache->mddev;
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_reclaim_batch.attr, NULL);
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_memory.attr, NULL);
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_disk_watermark.attr, NULL);
+	sysfs_remove_file_from_group(&mddev->kobj,
+		&r5c_cache_memory_watermark.attr, NULL);
+}
+
 static void r5c_free_cache_data(struct r5c_cache *cache)
 {
 	struct r5c_stripe *stripe;
@@ -3234,8 +3481,11 @@ struct r5c_cache *r5c_init_cache(struct r5conf *conf, struct md_rdev *rdev)
 	cache->reclaim_thread->timeout = CHECKPOINT_TIMEOUT;
 
 	r5c_shrink_cache_memory(cache, cache->max_pages);
-
+	if (r5c_init_sysfs(cache))
+		goto err_sysfs;
 	return cache;
+err_sysfs:
+	md_unregister_thread(&cache->reclaim_thread);
 err_page:
 	r5c_free_cache_data(cache);
 
@@ -3254,6 +3504,7 @@ struct r5c_cache *r5c_init_cache(struct r5conf *conf, struct md_rdev *rdev)
 
 void r5c_exit_cache(struct r5c_cache *cache)
 {
+	r5c_exit_sysfs(cache);
 	md_unregister_thread(&cache->reclaim_thread);
 	r5l_exit_log(&cache->log);
 
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4eb6e99..772b65f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5876,6 +5876,9 @@ raid5_set_cache_size(struct mddev *mddev, int size)
 	if (size <= 16 || size > 32768)
 		return -EINVAL;
 
+	if (conf->cache && size < r5c_min_stripe_cache_size(conf->cache))
+		size = r5c_min_stripe_cache_size(conf->cache);
+
 	conf->min_nr_stripes = size;
 	while (size < conf->max_nr_stripes &&
 	       drop_one_stripe(conf))
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index e4e93bb..899ec79 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -614,4 +614,5 @@ void r5c_exit_cache(struct r5c_cache *cache);
 void r5c_write_start(struct mddev *mddev, struct bio *bi);
 void r5c_write_end(struct mddev *mddev, struct bio *bi);
 void r5c_quiesce(struct r5conf *conf, int state);
+int r5c_min_stripe_cache_size(struct r5c_cache *cache);
 #endif
-- 
1.8.1



* [PATCH v2 5/6] md: don't allow resize/reshape with cache support
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
                   ` (2 preceding siblings ...)
  2015-05-19  2:57 ` [PATCH v2 4/6] raid5: add some sysfs entries Shaohua Li
@ 2015-05-19  2:57 ` Shaohua Li
  2015-05-19  2:57 ` [PATCH v2 6/6] raid5: skip resync if caching is enabled Shaohua Li
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

If cache support is enabled, don't allow resize/reshape at this stage. In the
future we can flush all data from the cache to the raid disks before a
resize/reshape and then allow it.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid5.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 772b65f..53f582d 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -7173,6 +7173,10 @@ static int raid5_resize(struct mddev *mddev, sector_t sectors)
 	 * worth it.
 	 */
 	sector_t newsize;
+	struct r5conf *conf = mddev->private;
+
+	if (conf->cache)
+		return -EINVAL;
 	sectors &= ~((sector_t)mddev->chunk_sectors - 1);
 	newsize = raid5_size(mddev, sectors, mddev->raid_disks);
 	if (mddev->external_size &&
@@ -7224,6 +7228,8 @@ static int check_reshape(struct mddev *mddev)
 {
 	struct r5conf *conf = mddev->private;
 
+	if (conf->cache)
+		return -EINVAL;
 	if (mddev->delta_disks == 0 &&
 	    mddev->new_layout == mddev->layout &&
 	    mddev->new_chunk_sectors == mddev->chunk_sectors)
-- 
1.8.1



* [PATCH v2 6/6] raid5: skip resync if caching is enabled
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
                   ` (3 preceding siblings ...)
  2015-05-19  2:57 ` [PATCH v2 5/6] md: don't allow resize/reshape with cache support Shaohua Li
@ 2015-05-19  2:57 ` Shaohua Li
  2015-05-19  9:22 ` [PATCH v2 0/6] a caching layer for raid5/6 Artur Paszkiewicz
  2015-05-20  5:23 ` NeilBrown
  6 siblings, 0 replies; 8+ messages in thread
From: Shaohua Li @ 2015-05-19  2:57 UTC (permalink / raw)
  To: linux-raid; +Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

If caching is enabled, the caching layer guarantees data consistency, so skip
the resync after an unclean shutdown.

Signed-off-by: Shaohua Li <shli@fb.com>
---
 drivers/md/raid5.c | 7 +++++++
 1 file changed, 7 insertions(+)
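The run()-time check this patch adds can be sketched as follows. Names here
are illustrative; in the kernel, MaxSector is the sector value that marks
resync as already complete:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MaxSector ((uint64_t)~0ULL)	/* "no resync needed" marker */

/* recovery_cp == 0 normally means an unclean shutdown left the array
 * needing a full resync; with the cache guaranteeing stripe consistency,
 * the checkpoint can jump straight to MaxSector instead. */
static uint64_t maybe_skip_resync(uint64_t recovery_cp, bool cache_enabled)
{
	if (cache_enabled && recovery_cp == 0)
		return MaxSector;
	return recovery_cp;
}
```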

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 53f582d..52e016f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6941,6 +6941,13 @@ static int run(struct mddev *mddev)
 		if (mddev->queue)
 			blk_queue_logical_block_size(mddev->queue, STRIPE_SIZE);
 		conf->skip_copy = 1;
+
+		if (mddev->recovery_cp == 0) {
+			printk(KERN_NOTICE
+				"md/raid:%s: skip resync with caching enabled\n",
+				mdname(mddev));
+			mddev->recovery_cp = MaxSector;
+		}
 	}
 
 	return 0;
-- 
1.8.1



* Re: [PATCH v2 0/6] a caching layer for raid5/6
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
                   ` (4 preceding siblings ...)
  2015-05-19  2:57 ` [PATCH v2 6/6] raid5: skip resync if caching is enabled Shaohua Li
@ 2015-05-19  9:22 ` Artur Paszkiewicz
  2015-05-20  5:23 ` NeilBrown
  6 siblings, 0 replies; 8+ messages in thread
From: Artur Paszkiewicz @ 2015-05-19  9:22 UTC (permalink / raw)
  To: Shaohua Li, linux-raid
  Cc: Kernel-team, songliubraving, hch, dan.j.williams, neilb

On 05/19/2015 04:57 AM, Shaohua Li wrote:
> Hi,
> 
> This is the second version of the raid5/6 caching layer patches. The caching
> layer uses an SSD as a cache for a raid5/6 array, working in much the same way
> as the cache in a hardware raid controller. The purpose is to improve raid
> performance (by reducing read-modify-write) and to fix the write hole issue.
> The main patch is patch 3, and its description has all the details about the
> implementation.
> 
> The main changes in V2 improve performance: metadata writes no longer use FUA,
> and discard requests are only dispatched when the discard range is big enough.
> There are also some bug fixes and code cleanups. Please review!

Hi,

It seems patch 3 is missing.

Artur




* Re: [PATCH v2 0/6] a caching layer for raid5/6
  2015-05-19  2:57 [PATCH v2 0/6] a caching layer for raid5/6 Shaohua Li
                   ` (5 preceding siblings ...)
  2015-05-19  9:22 ` [PATCH v2 0/6] a caching layer for raid5/6 Artur Paszkiewicz
@ 2015-05-20  5:23 ` NeilBrown
  6 siblings, 0 replies; 8+ messages in thread
From: NeilBrown @ 2015-05-20  5:23 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-raid, Kernel-team, songliubraving, hch, dan.j.williams



hi,
 I cannot possibly give any consideration to this caching layer until I'm
 happy with the code I got from you for stripe batching, and I'm not.

I asked:

Subject: Re: [PATCH 7/7] md/raid5: fix handling of degraded stripes in batches.
Date: Wed, 13 May 2015 10:56:04 +1000

> What exactly do you expect to happen after the stripes in a batch after they
> have been split up?

and haven't received a reply yet.
Did you not get that email?

Thanks,
NeilBrown



On Mon, 18 May 2015 19:57:28 -0700 Shaohua Li <shli@fb.com> wrote:

> Hi,
> 
> This is the second version of the raid5/6 caching layer patches. The caching
> layer uses an SSD as a cache for a raid5/6 array, working in much the same way
> as the cache in a hardware raid controller. The purpose is to improve raid
> performance (by reducing read-modify-write) and to fix the write hole issue.
> The main patch is patch 3, and its description has all the details about the
> implementation.
> 
> The main changes in V2 improve performance: metadata writes no longer use FUA,
> and discard requests are only dispatched when the discard range is big enough.
> There are also some bug fixes and code cleanups. Please review!
> 
> Thanks,
> Shaohua
> 
> 
> Shaohua Li (5):
>   raid5: directly use mddev->queue
>   raid5: A caching layer for RAID5/6
>   raid5: add some sysfs entries
>   md: don't allow resize/reshape with cache support
>   raid5: skip resync if caching is enabled
> 
> Song Liu (1):
>   MD: add a new disk role to present cache device
> 
>  drivers/md/Makefile            |    2 +-
>  drivers/md/md.c                |   14 +-
>  drivers/md/md.h                |    4 +
>  drivers/md/raid5-cache.c       | 3519 ++++++++++++++++++++++++++++++++++++++++
>  drivers/md/raid5.c             |   97 +-
>  drivers/md/raid5.h             |   16 +-
>  include/uapi/linux/raid/md_p.h |   73 +
>  7 files changed, 3705 insertions(+), 20 deletions(-)
>  create mode 100644 drivers/md/raid5-cache.c
> 



