linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch v2 0/6] Add TRIM support for raid linear/0/1/10
@ 2012-03-16  7:32 Shaohua Li
  2012-03-16  7:32 ` [patch v2 1/6] block: makes bio_split support bio without data Shaohua Li
                   ` (7 more replies)
  0 siblings, 8 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, vgoyal, martin.petersen

The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
raid 4/5/6 later. The implementation is pretty straightforward and
self-explained.

v1->v2:
1. fixed a checking issue
2. dropped discard request plug and replace it with no discard merege, because
current SCSI layer can't handle discard request merge.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 1/6] block: makes bio_split support bio without data
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-16  7:32 ` [patch v2 2/6] blk: dont allow discard request merge temporarily Shaohua Li
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: bio_split-fix.patch --]
[-- Type: text/plain, Size: 1638 bytes --]

discard bio hasn't data attached. We hit a BUG_ON with such bio. This makes
bio_split works for such bio.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 fs/bio.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

Index: linux/fs/bio.c
===================================================================
--- linux.orig/fs/bio.c	2012-03-09 16:56:41.203790008 +0800
+++ linux/fs/bio.c	2012-03-12 10:10:40.696612399 +0800
@@ -1492,7 +1492,7 @@ struct bio_pair *bio_split(struct bio *b
 	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);
 
-	BUG_ON(bi->bi_vcnt != 1);
+	BUG_ON(bi->bi_vcnt != 1 && bi->bi_vcnt != 0);
 	BUG_ON(bi->bi_idx != 0);
 	atomic_set(&bp->cnt, 3);
 	bp->error = 0;
@@ -1502,17 +1502,19 @@ struct bio_pair *bio_split(struct bio *b
 	bp->bio2.bi_size -= first_sectors << 9;
 	bp->bio1.bi_size = first_sectors << 9;
 
-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
-	bp->bv2.bv_offset += first_sectors << 9;
-	bp->bv2.bv_len -= first_sectors << 9;
-	bp->bv1.bv_len = first_sectors << 9;
+	if (bi->bi_vcnt != 0) {
+		bp->bv1 = bi->bi_io_vec[0];
+		bp->bv2 = bi->bi_io_vec[0];
+		bp->bv2.bv_offset += first_sectors << 9;
+		bp->bv2.bv_len -= first_sectors << 9;
+		bp->bv1.bv_len = first_sectors << 9;
 
-	bp->bio1.bi_io_vec = &bp->bv1;
-	bp->bio2.bi_io_vec = &bp->bv2;
+		bp->bio1.bi_io_vec = &bp->bv1;
+		bp->bio2.bi_io_vec = &bp->bv2;
 
-	bp->bio1.bi_max_vecs = 1;
-	bp->bio2.bi_max_vecs = 1;
+		bp->bio1.bi_max_vecs = 1;
+		bp->bio2.bi_max_vecs = 1;
+	}
 
 	bp->bio1.bi_end_io = bio_pair_end_1;
 	bp->bio2.bi_end_io = bio_pair_end_2;


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
  2012-03-16  7:32 ` [patch v2 1/6] block: makes bio_split support bio without data Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-20 16:21   ` Vivek Goyal
  2012-03-16  7:32 ` [patch v2 3/6] md: linear supports TRIM Shaohua Li
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: blk-discard-nomerge.patch --]
[-- Type: text/plain, Size: 1166 bytes --]

Didn't allow discard request merge temporarily, as SCSI layer isn't ready
for discard merge as Martin Petersen pointed out. This isn't fair for
non-scsi device, but looks this is the only way I can do currently.

We should have the same issue before, but maybe because discard merge is
very rare case. But now raid0/10 makes the merge quite possible, so we need
disable it explicitly.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 include/linux/blkdev.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/linux/blkdev.h
===================================================================
--- linux.orig/include/linux/blkdev.h	2012-03-14 09:20:06.787261188 +0800
+++ linux/include/linux/blkdev.h	2012-03-14 09:20:47.797261248 +0800
@@ -575,7 +575,7 @@ static inline void blk_clear_queue_full(
  * it already be started by driver.
  */
 #define RQ_NOMERGE_FLAGS	\
-	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
+	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | REQ_DISCARD)
 #define rq_mergeable(rq)	\
 	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
 	 (((rq)->cmd_flags & REQ_DISCARD) || \


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 3/6] md: linear supports TRIM
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
  2012-03-16  7:32 ` [patch v2 1/6] block: makes bio_split support bio without data Shaohua Li
  2012-03-16  7:32 ` [patch v2 2/6] blk: dont allow discard request merge temporarily Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-16  7:32 ` [patch v2 4/6] md: raid 0 " Shaohua Li
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: md-linear-discard-support.patch --]
[-- Type: text/plain, Size: 1621 bytes --]

This makes md linear support TRIM.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/linear.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux/drivers/md/linear.c
===================================================================
--- linux.orig/drivers/md/linear.c	2012-03-14 09:16:37.380435112 +0800
+++ linux/drivers/md/linear.c	2012-03-14 09:23:55.907261497 +0800
@@ -129,6 +129,7 @@ static struct linear_conf *linear_conf(s
 	struct linear_conf *conf;
 	struct md_rdev *rdev;
 	int i, cnt;
+	bool discard_supported = false;
 
 	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
 			GFP_KERNEL);
@@ -171,6 +172,8 @@ static struct linear_conf *linear_conf(s
 		conf->array_sectors += rdev->sectors;
 		cnt++;
 
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != raid_disks) {
 		printk(KERN_ERR "md/linear:%s: not enough drives present. Aborting!\n",
@@ -178,6 +181,11 @@ static struct linear_conf *linear_conf(s
 		goto out;
 	}
 
+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	/*
 	 * Here we calculate the device offsets.
 	 */
@@ -319,6 +327,14 @@ static void linear_make_request(struct m
 	bio->bi_sector = bio->bi_sector - start_sector
 		+ tmp_dev->rdev->data_offset;
 	rcu_read_unlock();
+
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		!blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 }
 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 4/6] md: raid 0 supports TRIM
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (2 preceding siblings ...)
  2012-03-16  7:32 ` [patch v2 3/6] md: linear supports TRIM Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-16  7:32 ` [patch v2 5/6] md: raid 1 " Shaohua Li
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: md-raid0-discard-support.patch --]
[-- Type: text/plain, Size: 2433 bytes --]

This makes md raid 0 support TRIM.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid0.c |   19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

Index: linux/drivers/md/raid0.c
===================================================================
--- linux.orig/drivers/md/raid0.c	2012-03-14 09:16:37.360435113 +0800
+++ linux/drivers/md/raid0.c	2012-03-14 09:32:11.017262138 +0800
@@ -88,6 +88,7 @@ static int create_strip_zones(struct mdd
 	char b[BDEVNAME_SIZE];
 	char b2[BDEVNAME_SIZE];
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+	bool discard_supported = false;
 
 	if (!conf)
 		return -ENOMEM;
@@ -201,6 +202,9 @@ static int create_strip_zones(struct mdd
 		if (!smallest || (rdev1->sectors < smallest->sectors))
 			smallest = rdev1;
 		cnt++;
+
+		if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
@@ -278,6 +282,11 @@ static int create_strip_zones(struct mdd
 	blk_queue_io_opt(mddev->queue,
 			 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	pr_debug("md/raid0:%s: done.\n", mdname(mddev));
 	*private_conf = conf;
 
@@ -348,6 +357,7 @@ static int raid0_run(struct mddev *mddev
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+	blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 
 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
@@ -486,7 +496,7 @@ static void raid0_make_request(struct md
 		sector_t sector = bio->bi_sector;
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
+		if ((bio->bi_vcnt != 1 && bio->bi_vcnt != 0) ||
 		    bio->bi_idx != 0)
 			goto bad_map;
 		/* This is a one page bio that upper layers
@@ -512,6 +522,13 @@ static void raid0_make_request(struct md
 	bio->bi_sector = sector_offset + zone->dev_start +
 		tmp_dev->data_offset;
 
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		!blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 	return;
 


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 5/6] md: raid 1 supports TRIM
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (3 preceding siblings ...)
  2012-03-16  7:32 ` [patch v2 4/6] md: raid 0 " Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-16  7:32 ` [patch v2 6/6] md: raid 10 " Shaohua Li
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: md-raid1-discard-support.patch --]
[-- Type: text/plain, Size: 2829 bytes --]

This makes md raid 1 support TRIM.
If one disk supports discard and another not, or one has discard_zero_data and
another not, there could be inconsistent between data from such disks. But this
should not matter, discarded data is useless. This will add extra copy in rebuild
though.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid1.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c	2012-03-12 10:09:42.426612652 +0800
+++ linux/drivers/md/raid1.c	2012-03-12 10:16:50.276610783 +0800
@@ -673,7 +673,12 @@ static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
@@ -835,6 +840,7 @@ static void make_request(struct mddev *m
 	const int rw = bio_data_dir(bio);
 	const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
 	const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
+	const unsigned long do_discard = (bio->bi_rw & (REQ_DISCARD | REQ_SECURE));
 	struct md_rdev *blocked_rdev;
 	int plugged;
 	int first_clone;
@@ -1135,7 +1141,7 @@ read_again:
 				   conf->mirrors[i].rdev->data_offset);
 		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
 		mbio->bi_end_io	= raid1_end_write_request;
-		mbio->bi_rw = WRITE | do_flush_fua | do_sync;
+		mbio->bi_rw = WRITE | do_flush_fua | do_sync | do_discard;
 		mbio->bi_private = r1_bio;
 
 		atomic_inc(&r1_bio->remaining);
@@ -1371,6 +1377,8 @@ static int raid1_add_disk(struct mddev *
 		}
 	}
 	md_integrity_add_rdev(rdev, mddev);
+	if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
 	print_conf(conf);
 	return err;
 }
@@ -2585,6 +2593,7 @@ static int run(struct mddev *mddev)
 	struct r1conf *conf;
 	int i;
 	struct md_rdev *rdev;
+	bool discard_supported = false;
 
 	if (mddev->level != 1) {
 		printk(KERN_ERR "md/raid1:%s: raid level not set to mirroring (%d)\n",
@@ -2623,8 +2632,16 @@ static int run(struct mddev *mddev)
 			blk_queue_segment_boundary(mddev->queue,
 						   PAGE_CACHE_SIZE - 1);
 		}
+
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 
+	if (discard_supported)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	mddev->degraded = 0;
 	for (i=0; i < conf->raid_disks; i++)
 		if (conf->mirrors[i].rdev == NULL ||


^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch v2 6/6] md: raid 10 supports TRIM
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (4 preceding siblings ...)
  2012-03-16  7:32 ` [patch v2 5/6] md: raid 1 " Shaohua Li
@ 2012-03-16  7:32 ` Shaohua Li
  2012-03-19 19:38 ` [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Holger Kiehl
  2012-05-08  9:59 ` Patelczyk, Maciej
  7 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-16  7:32 UTC (permalink / raw)
  To: linux-kernel, linux-raid
  Cc: neilb, axboe, vgoyal, martin.petersen, Shaohua Li

[-- Attachment #1: md-raid10-discard-support.patch --]
[-- Type: text/plain, Size: 3859 bytes --]

This makes md raid 10 support TRIM.
If one disk supports discard and another not, or one has discard_zero_data and
another not, there could be inconsistent between data from such disks. But this
should not matter, discarded data is useless. This will add extra copy in rebuild
though.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid10.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

Index: linux/drivers/md/raid10.c
===================================================================
--- linux.orig/drivers/md/raid10.c	2012-03-14 09:16:37.300435113 +0800
+++ linux/drivers/md/raid10.c	2012-03-14 09:33:44.017262275 +0800
@@ -800,7 +800,12 @@ static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
@@ -926,6 +931,7 @@ static void make_request(struct mddev *m
 	const int rw = bio_data_dir(bio);
 	const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
 	const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
+	const unsigned long do_discard = (bio->bi_rw & (REQ_DISCARD | REQ_SECURE));
 	unsigned long flags;
 	struct md_rdev *blocked_rdev;
 	int plugged;
@@ -945,7 +951,7 @@ static void make_request(struct mddev *m
 		    conf->near_copies < conf->raid_disks)) {
 		struct bio_pair *bp;
 		/* Sanity check -- queue functions should prevent this happening */
-		if (bio->bi_vcnt != 1 ||
+		if ((bio->bi_vcnt != 1 && bio->bi_vcnt !=0) ||
 		    bio->bi_idx != 0)
 			goto bad_map;
 		/* This is a one page bio that upper layers
@@ -1241,7 +1247,7 @@ retry_write:
 				   conf->mirrors[d].rdev->data_offset);
 		mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
 		mbio->bi_end_io	= raid10_end_write_request;
-		mbio->bi_rw = WRITE | do_sync | do_fua;
+		mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
 		mbio->bi_private = r10_bio;
 
 		atomic_inc(&r10_bio->remaining);
@@ -1266,7 +1272,7 @@ retry_write:
 				   conf->mirrors[d].replacement->data_offset);
 		mbio->bi_bdev = conf->mirrors[d].replacement->bdev;
 		mbio->bi_end_io	= raid10_end_write_request;
-		mbio->bi_rw = WRITE | do_sync | do_fua;
+		mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
 		mbio->bi_private = r10_bio;
 
 		atomic_inc(&r10_bio->remaining);
@@ -1543,6 +1549,9 @@ static int raid10_add_disk(struct mddev
 	}
 
 	md_integrity_add_rdev(rdev, mddev);
+	if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	print_conf(conf);
 	return err;
 }
@@ -3214,6 +3223,7 @@ static int run(struct mddev *mddev)
 	struct mirror_info *disk;
 	struct md_rdev *rdev;
 	sector_t size;
+	bool discard_supported = false;
 
 	/*
 	 * copy the already verified devices into our private RAID10
@@ -3234,6 +3244,7 @@ static int run(struct mddev *mddev)
 	mddev->thread = conf->thread;
 	conf->thread = NULL;
 
+	blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 	chunk_size = mddev->chunk_sectors << 9;
 	blk_queue_io_min(mddev->queue, chunk_size);
 	if (conf->raid_disks % conf->near_copies)
@@ -3273,7 +3284,16 @@ static int run(struct mddev *mddev)
 		}
 
 		disk->head_position = 0;
+
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
+
+	if (discard_supported)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	/* need to check that every block has at least one working mirror */
 	if (!enough(conf, -1)) {
 		printk(KERN_ERR "md/raid10:%s: not enough operational mirrors.\n",


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (5 preceding siblings ...)
  2012-03-16  7:32 ` [patch v2 6/6] md: raid 10 " Shaohua Li
@ 2012-03-19 19:38 ` Holger Kiehl
  2012-03-20  1:27   ` Shaohua Li
  2012-05-08  9:59 ` Patelczyk, Maciej
  7 siblings, 1 reply; 35+ messages in thread
From: Holger Kiehl @ 2012-03-19 19:38 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-kernel, linux-raid, neilb, axboe, vgoyal, martin.petersen

Hello,

On Fri, 16 Mar 2012, Shaohua Li wrote:

> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
>
> v1->v2:
> 1. fixed a checking issue
> 2. dropped discard request plug and replace it with no discard merege, because
> current SCSI layer can't handle discard request merge.
>
Have tested TRIM patches on three different systems with the following
hardware/ setup:

    1) root mounted on a raid1 over two SAS SSD's (200GB) and /home partition
       on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
       little usage.

    2) root and /home mounted on a raid0 over two Intel X25 Postville
       (160GB) connected to a Intel P55 Express chipset. Has seen very
       heavy usage for approx. 2 years.

    3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
       connected via ICH7 south bridge. Has seen mild usage for approx.
       1.5 years.

Made the following observations when running my own benchmark which
copies around a lot of small files and deletes them. The benchmark on
all systems was always run only on the /home partition ie. on a raid0.

For system 1) there is hardly any measurable differnce whether discard
is enabled or not (~29000 files per second).

On system 2) the performance drops from 6500->3700 files per second,
but under normal usage one does not notice any difference.

System 3) has problems during boot, it is so slow that some operations
receive a timeout during boot:

   udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
   udevd[474]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [866]
   systemd[1]: dev-md3.swap activation timed out. Stopping.

Even removing discard does not help and the above errors happen during
boot and booting takes a long time.

The performance in the benchmark drops from 4000->600 files per second.

Regards,
Holger

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-19 19:38 ` [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Holger Kiehl
@ 2012-03-20  1:27   ` Shaohua Li
  2012-03-20  9:50     ` Holger Kiehl
  0 siblings, 1 reply; 35+ messages in thread
From: Shaohua Li @ 2012-03-20  1:27 UTC (permalink / raw)
  To: Holger Kiehl
  Cc: linux-kernel, linux-raid, neilb, axboe, vgoyal, martin.petersen

2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
> Hello,
>
>
> On Fri, 16 Mar 2012, Shaohua Li wrote:
>
>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support
>> for
>> raid 4/5/6 later. The implementation is pretty straightforward and
>> self-explained.
>>
>> v1->v2:
>> 1. fixed a checking issue
>> 2. dropped discard request plug and replace it with no discard merege,
>> because
>> current SCSI layer can't handle discard request merge.
>>
> Have tested TRIM patches on three different systems with the following
> hardware/ setup:
>
>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home partition
>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>      little usage.
>
>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>      heavy usage for approx. 2 years.
>
>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>      connected via ICH7 south bridge. Has seen mild usage for approx.
>      1.5 years.
>
> Made the following observations when running my own benchmark which
> copies around a lot of small files and deletes them. The benchmark on
> all systems was always run only on the /home partition ie. on a raid0.
>
> For system 1) there is hardly any measurable differnce whether discard
> is enabled or not (~29000 files per second).
>
> On system 2) the performance drops from 6500->3700 files per second,
> but under normal usage one does not notice any difference.
do you have the blktrace data when the benchmark is running, especially
when doing file deletion. I'd like to check the latency of discard in this case.

> System 3) has problems during boot, it is so slow that some operations
> receive a timeout during boot:
>
>  udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
>  udevd[474]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [866]
>  systemd[1]: dev-md3.swap activation timed out. Stopping.
In this one, discard request is slow. And per SATA standard, discard request
can't be parallel, so only one request one time, which further slows it down.

> Even removing discard does not help and the above errors happen during
> boot and booting takes a long time.
this doesn't make sense. If you don't mount with discard option, no
discard request
is issued.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-20  1:27   ` Shaohua Li
@ 2012-03-20  9:50     ` Holger Kiehl
  2012-03-20 12:09       ` Shaohua Li
  0 siblings, 1 reply; 35+ messages in thread
From: Holger Kiehl @ 2012-03-20  9:50 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-kernel, linux-raid, neilb, axboe, vgoyal, martin.petersen

[-- Attachment #1: Type: TEXT/PLAIN, Size: 2861 bytes --]

Hello,

On Tue, 20 Mar 2012, Shaohua Li wrote:

> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>> Hello,
>>
>>
>> On Fri, 16 Mar 2012, Shaohua Li wrote:
>>
>>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support
>>> for
>>> raid 4/5/6 later. The implementation is pretty straightforward and
>>> self-explained.
>>>
>>> v1->v2:
>>> 1. fixed a checking issue
>>> 2. dropped discard request plug and replace it with no discard merege,
>>> because
>>> current SCSI layer can't handle discard request merge.
>>>
>> Have tested TRIM patches on three different systems with the following
>> hardware/ setup:
>>
>>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home partition
>>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>>      little usage.
>>
>>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>>      heavy usage for approx. 2 years.
>>
>>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>>      connected via ICH7 south bridge. Has seen mild usage for approx.
>>      1.5 years.
>>
>> Made the following observations when running my own benchmark which
>> copies around a lot of small files and deletes them. The benchmark on
>> all systems was always run only on the /home partition ie. on a raid0.
>>
>> For system 1) there is hardly any measurable differnce whether discard
>> is enabled or not (~29000 files per second).
>>
>> On system 2) the performance drops from 6500->3700 files per second,
>> but under normal usage one does not notice any difference.
> do you have the blktrace data when the benchmark is running, especially
> when doing file deletion. I'd like to check the latency of discard in this case.
>
It is uploaded on ftp://ftp.dwd.de/pub/afd/test/trim

>> System 3) has problems during boot, it is so slow that some operations
>> receive a timeout during boot:
>>
>>  udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
>>  udevd[474]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [866]
>>  systemd[1]: dev-md3.swap activation timed out. Stopping.
> In this one, discard request is slow. And per SATA standard, discard request
> can't be parallel, so only one request one time, which further slows it down.
>
Ok, so having three disc in a raid 0 is even worth.

>> Even removing discard does not help and the above errors happen during
>> boot and booting takes a long time.
> this doesn't make sense. If you don't mount with discard option, no
> discard request
> is issued.
>
I read some where that swap uses discard by default and if I remember
correctly it initializes the whole swap area at boot. So doing that
over three disks might explain why I get those timeout error messages
and after boot these commands work without delay.

Regards,
Holger

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-20  9:50     ` Holger Kiehl
@ 2012-03-20 12:09       ` Shaohua Li
  2012-03-21  2:08         ` Shaohua Li
  0 siblings, 1 reply; 35+ messages in thread
From: Shaohua Li @ 2012-03-20 12:09 UTC (permalink / raw)
  To: Holger Kiehl
  Cc: linux-kernel, linux-raid, neilb, axboe, vgoyal, martin.petersen

2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
> Hello,
>
>
> On Tue, 20 Mar 2012, Shaohua Li wrote:
>
>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>
>>> Hello,
>>>
>>>
>>> On Fri, 16 Mar 2012, Shaohua Li wrote:
>>>
>>>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
>>>> support
>>>> for
>>>> raid 4/5/6 later. The implementation is pretty straightforward and
>>>> self-explained.
>>>>
>>>> v1->v2:
>>>> 1. fixed a checking issue
>>>> 2. dropped discard request plug and replace it with no discard merege,
>>>> because
>>>> current SCSI layer can't handle discard request merge.
>>>>
>>> Have tested TRIM patches on three different systems with the following
>>> hardware/ setup:
>>>
>>>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home
>>> partition
>>>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>>>      little usage.
>>>
>>>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>>>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>>>      heavy usage for approx. 2 years.
>>>
>>>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>>>      connected via ICH7 south bridge. Has seen mild usage for approx.
>>>      1.5 years.
>>>
>>> Made the following observations when running my own benchmark which
>>> copies around a lot of small files and deletes them. The benchmark on
>>> all systems was always run only on the /home partition ie. on a raid0.
>>>
>>> For system 1) there is hardly any measurable differnce whether discard
>>> is enabled or not (~29000 files per second).
>>>
>>> On system 2) the performance drops from 6500->3700 files per second,
>>> but under normal usage one does not notice any difference.
>>
>> do you have the blktrace data when the benchmark is running, especially
>> when doing file deletion. I'd like to check the latency of discard in this
>> case.
>>
> It is uploaded on ftp://ftp.dwd.de/pub/afd/test/trim
Thanks, I'll check it.

>>> System 3) has problems during boot, it is so slow that some operations
>>> receive a timeout during boot:
>>>
>>>  udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
>>>  udevd[474]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [866]
>>>  systemd[1]: dev-md3.swap activation timed out. Stopping.
>>
>> In this one, discard request is slow. And per SATA standard, discard
>> request
>> can't be parallel, so only one request one time, which further slows it
>> down.
>>
> Ok, so having three disc in a raid 0 is even worth.
>
>
>>> Even removing discard does not help and the above errors happen during
>>> boot and booting takes a long time.
>>
>> this doesn't make sense. If you don't mount with discard option, no
>> discard request
>> is issued.
>>
> I read some where that swap uses discard by default and if I remember
> correctly it initializes the whole swap area at boot. So doing that
> over three disks might explain why I get those timeout error messages
> and after boot these commands work without delay.
This looks a bug to me. if swapon doesn't add discard option, we shouldn't
do discard. Let me post a patch, and see how people think.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-16  7:32 ` [patch v2 2/6] blk: dont allow discard request merge temporarily Shaohua Li
@ 2012-03-20 16:21   ` Vivek Goyal
  2012-03-21  1:22     ` Shaohua Li
  0 siblings, 1 reply; 35+ messages in thread
From: Vivek Goyal @ 2012-03-20 16:21 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe, martin.petersen

On Fri, Mar 16, 2012 at 03:32:15PM +0800, Shaohua Li wrote:
> Didn't allow discard request merge temporarily, as SCSI layer isn't ready
> for discard merge as Martin Petersen pointed out. This isn't fair for
> non-scsi device, but looks this is the only way I can do currently.
> 
> We should have the same issue before, but maybe because discard merge is
> very rare case. But now raid0/10 makes the merge quite possible, so we need
> disable it explicitly.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
>  include/linux/blkdev.h |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> Index: linux/include/linux/blkdev.h
> ===================================================================
> --- linux.orig/include/linux/blkdev.h	2012-03-14 09:20:06.787261188 +0800
> +++ linux/include/linux/blkdev.h	2012-03-14 09:20:47.797261248 +0800
> @@ -575,7 +575,7 @@ static inline void blk_clear_queue_full(
>   * it already be started by driver.
>   */
>  #define RQ_NOMERGE_FLAGS	\
> -	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
> +	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | REQ_DISCARD)
>  #define rq_mergeable(rq)	\
>  	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
>  	 (((rq)->cmd_flags & REQ_DISCARD) || \

I think you will need to do little more cleanup to make discard
unmergeable.

- Change rq_mergeable(rq)
- Change attempt_merge() and get rid of special conditions of allowing
  discard merge.

Martin had a bigger patch where he wanted to cleanup many discard specific
condition checks.

As you are just focusing on disabling merging for discard requests, you
might as well just pick the relevant pieces from the patch.

http://www.spinics.net/lists/linux-scsi/msg57779.html

Thanks
Vivek

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-20 16:21   ` Vivek Goyal
@ 2012-03-21  1:22     ` Shaohua Li
  2012-03-21 12:14       ` [patch 1/2]block: handle merged discard request Shaohua Li
  2012-03-22  2:18       ` [patch v2 2/6] blk: dont allow discard request merge temporarily Martin K. Petersen
  0 siblings, 2 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-21  1:22 UTC (permalink / raw)
  To: Vivek Goyal; +Cc: linux-kernel, linux-raid, neilb, axboe, martin.petersen

2012/3/21 Vivek Goyal <vgoyal@redhat.com>:
> On Fri, Mar 16, 2012 at 03:32:15PM +0800, Shaohua Li wrote:
>> Didn't allow discard request merge temporarily, as SCSI layer isn't ready
>> for discard merge as Martin Petersen pointed out. This isn't fair for
>> non-scsi device, but looks this is the only way I can do currently.
>>
>> We should have the same issue before, but maybe because discard merge is
>> very rare case. But now raid0/10 makes the merge quite possible, so we need
>> disable it explicitly.
>>
>> Signed-off-by: Shaohua Li <shli@fusionio.com>
>> ---
>>  include/linux/blkdev.h |    2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> Index: linux/include/linux/blkdev.h
>> ===================================================================
>> --- linux.orig/include/linux/blkdev.h 2012-03-14 09:20:06.787261188 +0800
>> +++ linux/include/linux/blkdev.h      2012-03-14 09:20:47.797261248 +0800
>> @@ -575,7 +575,7 @@ static inline void blk_clear_queue_full(
>>   * it already be started by driver.
>>   */
>>  #define RQ_NOMERGE_FLAGS     \
>> -     (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
>> +     (REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | REQ_DISCARD)
>>  #define rq_mergeable(rq)     \
>>       (!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
>>        (((rq)->cmd_flags & REQ_DISCARD) || \
>
> I think you will need to do little more cleanup to make discard
> unmergeable.
>
> - Change rq_mergeable(rq)
> - Change attempt_merge() and get rid of special conditions of allowing
>  discard merge.
>
> Martin had a bigger patch where he wanted to cleanup many discard specific
> condition checks.
>
> As you are just focusing on disabling merging for discard requests, you
> might as well just pick the relevant pieces from the patch.
>
> http://www.spinics.net/lists/linux-scsi/msg57779.html
Thanks for pointing out the thread. I didn't think disabling discard merging
permanently is a good idea. We can't do the merge because that code isn't
ready (actually just for driver of SCSI). Enabling discard merge is required
for device with slow discard (and very helpful for raid), so I just want a
temporarily disabling for the merge. Just changing RQ_NOMERGE_FLAGS
is an easy workaround for this goal.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-20 12:09       ` Shaohua Li
@ 2012-03-21  2:08         ` Shaohua Li
  2012-03-21  2:24           ` Roberto Spadim
  2012-03-21  2:29           ` Mathias Burén
  0 siblings, 2 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-21  2:08 UTC (permalink / raw)
  To: Holger Kiehl
  Cc: linux-kernel, linux-raid, neilb, axboe, vgoyal, martin.petersen

2012/3/20 Shaohua Li <shli@kernel.org>:
> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>> Hello,
>>
>>
>> On Tue, 20 Mar 2012, Shaohua Li wrote:
>>
>>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>>
>>>> Hello,
>>>>
>>>>
>>>> On Fri, 16 Mar 2012, Shaohua Li wrote:
>>>>
>>>>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
>>>>> support
>>>>> for
>>>>> raid 4/5/6 later. The implementation is pretty straightforward and
>>>>> self-explained.
>>>>>
>>>>> v1->v2:
>>>>> 1. fixed a checking issue
>>>>> 2. dropped discard request plug and replace it with no discard merege,
>>>>> because
>>>>> current SCSI layer can't handle discard request merge.
>>>>>
>>>> Have tested TRIM patches on three different systems with the following
>>>> hardware/ setup:
>>>>
>>>>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home
>>>> partition
>>>>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>>>>      little usage.
>>>>
>>>>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>>>>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>>>>      heavy usage for approx. 2 years.
>>>>
>>>>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>>>>      connected via ICH7 south bridge. Has seen mild usage for approx.
>>>>      1.5 years.
>>>>
>>>> Made the following observations when running my own benchmark which
>>>> copies around a lot of small files and deletes them. The benchmark on
>>>> all systems was always run only on the /home partition ie. on a raid0.
>>>>
>>>> For system 1) there is hardly any measurable differnce whether discard
>>>> is enabled or not (~29000 files per second).
>>>>
>>>> On system 2) the performance drops from 6500->3700 files per second,
>>>> but under normal usage one does not notice any difference.
>>>
>>> do you have the blktrace data when the benchmark is running, especially
>>> when doing file deletion. I'd like to check the latency of discard in this
>>> case.
>>>
>> It is uploaded on ftp://ftp.dwd.de/pub/afd/test/trim
> Thanks, I'll check it.
Thanks for the testing. The trace data is very helpful. In the intel
SSD, trace data
shows a discard request uses about 1 ~ 3 ms. The filesystem suffers from
fragmentation too, so lots of small discard requests. When ext4 starts doing
discard, it usually uses more than 1 minutes. That's too bad.
If just looking one disk's trace data, there are some extra latencies between
two discard requests. The combined trace data of two disks show the latency
comes from waiting for another disk, so nothing abnormal. I thought we could
do an optimization for this case in the future.
So in summary, discard from the SSDs is slow. When your filesystem is
fragmented, the performance will be terrible.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-21  2:08         ` Shaohua Li
@ 2012-03-21  2:24           ` Roberto Spadim
  2012-03-21  2:29             ` Mathias Burén
  2012-03-21  2:29           ` Mathias Burén
  1 sibling, 1 reply; 35+ messages in thread
From: Roberto Spadim @ 2012-03-21  2:24 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Holger Kiehl, linux-kernel, linux-raid, neilb, axboe, vgoyal,
	martin.petersen

=/ well maybe with a sas disk it could be faster? maybe a pciexpress disk too?

Em 20 de março de 2012 23:08, Shaohua Li <shli@kernel.org> escreveu:
> 2012/3/20 Shaohua Li <shli@kernel.org>:
>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>> Hello,
>>>
>>>
>>> On Tue, 20 Mar 2012, Shaohua Li wrote:
>>>
>>>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>>>
>>>>> Hello,
>>>>>
>>>>>
>>>>> On Fri, 16 Mar 2012, Shaohua Li wrote:
>>>>>
>>>>>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
>>>>>> support
>>>>>> for
>>>>>> raid 4/5/6 later. The implementation is pretty straightforward and
>>>>>> self-explained.
>>>>>>
>>>>>> v1->v2:
>>>>>> 1. fixed a checking issue
>>>>>> 2. dropped discard request plug and replace it with no discard merege,
>>>>>> because
>>>>>> current SCSI layer can't handle discard request merge.
>>>>>>
>>>>> Have tested TRIM patches on three different systems with the following
>>>>> hardware/ setup:
>>>>>
>>>>>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home
>>>>> partition
>>>>>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>>>>>      little usage.
>>>>>
>>>>>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>>>>>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>>>>>      heavy usage for approx. 2 years.
>>>>>
>>>>>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>>>>>      connected via ICH7 south bridge. Has seen mild usage for approx.
>>>>>      1.5 years.
>>>>>
>>>>> Made the following observations when running my own benchmark which
>>>>> copies around a lot of small files and deletes them. The benchmark on
>>>>> all systems was always run only on the /home partition ie. on a raid0.
>>>>>
>>>>> For system 1) there is hardly any measurable differnce whether discard
>>>>> is enabled or not (~29000 files per second).
>>>>>
>>>>> On system 2) the performance drops from 6500->3700 files per second,
>>>>> but under normal usage one does not notice any difference.
>>>>
>>>> do you have the blktrace data when the benchmark is running, especially
>>>> when doing file deletion. I'd like to check the latency of discard in this
>>>> case.
>>>>
>>> It is uploaded on ftp://ftp.dwd.de/pub/afd/test/trim
>> Thanks, I'll check it.
> Thanks for the testing. The trace data is very helpful. In the intel
> SSD, trace data
> shows a discard request uses about 1 ~ 3 ms. The filesystem suffers from
> fragmentation too, so lots of small discard requests. When ext4 starts doing
> discard, it usually uses more than 1 minutes. That's too bad.
> If just looking one disk's trace data, there are some extra latencies between
> two discard requests. The combined trace data of two disks show the latency
> comes from waiting for another disk, so nothing abnormal. I thought we could
> do an optimization for this case in the future.
> So in summary, discard from the SSDs is slow. When your filesystem is
> fragmented, the performance will be terrible.
>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-21  2:08         ` Shaohua Li
  2012-03-21  2:24           ` Roberto Spadim
@ 2012-03-21  2:29           ` Mathias Burén
  1 sibling, 0 replies; 35+ messages in thread
From: Mathias Burén @ 2012-03-21  2:29 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Holger Kiehl, linux-kernel, linux-raid, neilb, axboe, vgoyal,
	martin.petersen

On 21 March 2012 02:08, Shaohua Li <shli@kernel.org> wrote:
> 2012/3/20 Shaohua Li <shli@kernel.org>:
>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>> Hello,
>>>
>>>
>>> On Tue, 20 Mar 2012, Shaohua Li wrote:
>>>
<snip>
> Thanks for the testing. The trace data is very helpful. In the intel
> SSD, trace data
> shows a discard request uses about 1 ~ 3 ms. The filesystem suffers from
> fragmentation too, so lots of small discard requests. When ext4 starts doing
> discard, it usually uses more than 1 minutes. That's too bad.
> If just looking one disk's trace data, there are some extra latencies between
> two discard requests. The combined trace data of two disks show the latency
> comes from waiting for another disk, so nothing abnormal. I thought we could
> do an optimization for this case in the future.
> So in summary, discard from the SSDs is slow. When your filesystem is
> fragmented, the performance will be terrible.
>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Many people prefer to have the filesystems mounted without discard,
then they run a TRIM command/script at night (or whenever the disk
isn't used very much). This way the TRIM commands don't slow down the
'overall' performance.

Mathias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-21  2:24           ` Roberto Spadim
@ 2012-03-21  2:29             ` Mathias Burén
  0 siblings, 0 replies; 35+ messages in thread
From: Mathias Burén @ 2012-03-21  2:29 UTC (permalink / raw)
  To: Roberto Spadim
  Cc: Shaohua Li, Holger Kiehl, linux-kernel, linux-raid, neilb, axboe,
	vgoyal, martin.petersen

On 21 March 2012 02:24, Roberto Spadim <roberto@spadim.com.br> wrote:
> =/ well maybe with a sas disk it could be faster? maybe a pciexpress disk too?
>
> Em 20 de março de 2012 23:08, Shaohua Li <shli@kernel.org> escreveu:
>> 2012/3/20 Shaohua Li <shli@kernel.org>:
>>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>> Hello,
>>>>
>>>>
>>>> On Tue, 20 Mar 2012, Shaohua Li wrote:
>>>>
>>>>> 2012/3/20 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>>>>
>>>>>> Hello,
>>>>>>
>>>>>>
>>>>>> On Fri, 16 Mar 2012, Shaohua Li wrote:
>>>>>>
>>>>>>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
>>>>>>> support
>>>>>>> for
>>>>>>> raid 4/5/6 later. The implementation is pretty straightforward and
>>>>>>> self-explained.
>>>>>>>
>>>>>>> v1->v2:
>>>>>>> 1. fixed a checking issue
>>>>>>> 2. dropped discard request plug and replace it with no discard merege,
>>>>>>> because
>>>>>>> current SCSI layer can't handle discard request merge.
>>>>>>>
>>>>>> Have tested TRIM patches on three different systems with the following
>>>>>> hardware/ setup:
>>>>>>
>>>>>>   1) root mounted on a raid1 over two SAS SSD's (200GB) and /home
>>>>>> partition
>>>>>>      on a raid0 over a fusionio ioDrive Duo. Is very new and seen very
>>>>>>      little usage.
>>>>>>
>>>>>>   2) root and /home mounted on a raid0 over two Intel X25 Postville
>>>>>>      (160GB) connected to a Intel P55 Express chipset. Has seen very
>>>>>>      heavy usage for approx. 2 years.
>>>>>>
>>>>>>   3) root and /home mounted on a raid0 over three OCZ-VERTEX2 (120GB)
>>>>>>      connected via ICH7 south bridge. Has seen mild usage for approx.
>>>>>>      1.5 years.
>>>>>>
>>>>>> Made the following observations when running my own benchmark which
>>>>>> copies around a lot of small files and deletes them. The benchmark on
>>>>>> all systems was always run only on the /home partition ie. on a raid0.
>>>>>>
>>>>>> For system 1) there is hardly any measurable differnce whether discard
>>>>>> is enabled or not (~29000 files per second).
>>>>>>
>>>>>> On system 2) the performance drops from 6500->3700 files per second,
>>>>>> but under normal usage one does not notice any difference.
>>>>>
>>>>> do you have the blktrace data when the benchmark is running, especially
>>>>> when doing file deletion. I'd like to check the latency of discard in this
>>>>> case.
>>>>>
>>>> It is uploaded on ftp://ftp.dwd.de/pub/afd/test/trim
>>> Thanks, I'll check it.
>> Thanks for the testing. The trace data is very helpful. In the intel
>> SSD, trace data
>> shows a discard request uses about 1 ~ 3 ms. The filesystem suffers from
>> fragmentation too, so lots of small discard requests. When ext4 starts doing
>> discard, it usually uses more than 1 minutes. That's too bad.
>> If just looking one disk's trace data, there are some extra latencies between
>> two discard requests. The combined trace data of two disks show the latency
>> comes from waiting for another disk, so nothing abnormal. I thought we could
>> do an optimization for this case in the future.
>> So in summary, discard from the SSDs is slow. When your filesystem is
>> fragmented, the performance will be terrible.
>>
>> Thanks,
>> Shaohua
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>
>
> --
> Roberto Spadim
> Spadim Technology / SPAEmpresarial
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

Not relevant.

Mathias

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [patch 1/2]block: handle merged discard request
  2012-03-21  1:22     ` Shaohua Li
@ 2012-03-21 12:14       ` Shaohua Li
  2012-03-22  2:32         ` Martin K. Petersen
  2012-03-22  2:18       ` [patch v2 2/6] blk: dont allow discard request merge temporarily Martin K. Petersen
  1 sibling, 1 reply; 35+ messages in thread
From: Shaohua Li @ 2012-03-21 12:14 UTC (permalink / raw)
  To: Vivek Goyal, axboe; +Cc: linux-kernel, neilb, martin.petersen

On 3/21/12 9:22 AM, Shaohua Li wrote:
> 2012/3/21 Vivek Goyal<vgoyal@redhat.com>:
>> On Fri, Mar 16, 2012 at 03:32:15PM +0800, Shaohua Li wrote:
>>> Didn't allow discard request merge temporarily, as SCSI layer isn't ready
>>> for discard merge as Martin Petersen pointed out. This isn't fair for
>>> non-scsi device, but looks this is the only way I can do currently.
>>>
>>> We should have the same issue before, but maybe because discard merge is
>>> very rare case. But now raid0/10 makes the merge quite possible, so we need
>>> disable it explicitly.
>> I think you will need to do little more cleanup to make discard
>> unmergeable.
>>
>> - Change rq_mergeable(rq)
>> - Change attempt_merge() and get rid of special conditions of allowing
>>   discard merge.
>>
>> Martin had a bigger patch where he wanted to cleanup many discard specific
>> condition checks.
>>
>> As you are just focusing on disabling merging for discard requests, you
>> might as well just pick the relevant pieces from the patch.
>>
>> http://www.spinics.net/lists/linux-scsi/msg57779.html
> Thanks for pointing out the thread. I didn't think disabling discard merging
> permanently is a good idea. We can't do the merge because that code isn't
> ready (actually just for driver of SCSI). Enabling discard merge is required
> for device with slow discard (and very helpful for raid), so I just want a
> temporarily disabling for the merge. Just changing RQ_NOMERGE_FLAGS
> is an easy workaround for this goal.
I looked at the SCSI code for discard again, looks we can easily make 
discard
mergeable. It's a little hacky (the whole SCSI discard implementation is 
hacky
actually), but quite simple and end the trouble of discard merge 
immediately.

Thanks,
Shaohua


The SCSI discard implementation hacks the first bio of request to
add payload, which makes blk_update_request() can't correctly mark
bios finish.
The patch solves it. We set discard bio size to 0 and finish it after
the hacked payload finishes. The check in blk_update_request() should
make us safe.
It's a little hack here (but the whole discard implementation of SCSI
is hacky) and this makes us have discard request merge immediately,
which is great for some SSDs with slow discard.

Signed-off-by: Shaohua Li <shli@fusionio.com>

---
  block/blk-core.c |   11 +++++++++--
  1 file changed, 9 insertions(+), 2 deletions(-)

Index: linux/block/blk-core.c
===================================================================
--- linux.orig/block/blk-core.c	2012-03-21 17:58:07.322320702 +0800
+++ linux/block/blk-core.c	2012-03-21 18:04:34.662320467 +0800
@@ -1177,7 +1177,7 @@ EXPORT_SYMBOL(blk_put_request);
  void blk_add_request_payload(struct request *rq, struct page *page,
  		unsigned int len)
  {
-	struct bio *bio = rq->bio;
+	struct bio *bio = rq->bio, *next = bio->bi_next;

  	bio->bi_io_vec->bv_page = page;
  	bio->bi_io_vec->bv_offset = 0;
@@ -1187,6 +1187,11 @@ void blk_add_request_payload(struct requ
  	bio->bi_vcnt = 1;
  	bio->bi_phys_segments = 1;

+	while (next) {
+		next->bi_size = 0;
+		next = next->bi_next;
+	}
+
  	rq->__data_len = rq->resid_len = len;
  	rq->nr_phys_segments = 1;
  	rq->buffer = bio_data(bio);
@@ -2185,8 +2190,10 @@ bool blk_update_request(struct request *
  		if (bio) {
  			/*
  			 * end more in this run, or just return 'not-done'
+			 * The discard check is a hack, see blk_add_request_payload
  			 */
-			if (unlikely(nr_bytes <= 0))
+			if (unlikely(nr_bytes <= 0 &&
+			   !((req->cmd_flags & REQ_DISCARD) && bio->bi_size == 0)))
  				break;
  		}
  	}

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-21  1:22     ` Shaohua Li
  2012-03-21 12:14       ` [patch 1/2]block: handle merged discard request Shaohua Li
@ 2012-03-22  2:18       ` Martin K. Petersen
  2012-03-22  2:33         ` Shaohua Li
  2012-03-22  6:28         ` Christoph Hellwig
  1 sibling, 2 replies; 35+ messages in thread
From: Martin K. Petersen @ 2012-03-22  2:18 UTC (permalink / raw)
  To: Shaohua Li
  Cc: Vivek Goyal, linux-kernel, linux-raid, neilb, axboe, martin.petersen

>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:

Shaohua> Enabling discard merge is required for device with slow discard
Shaohua> (and very helpful for raid),

Before RAID support there really wasn't much point. That's one of the
reasons the current discard merge code has survived despite being
completely broken.

As for how I'd like to handle contiguous discard merges going forward
here's a proof of concept. It's on top of my write same tree so you may
have to noodle a bit.

It is crucially important that we never merge a command that can't be
processed by the device. So I'd like to do something like this:


block: Consolidate command flag and queue limit checks for merges

 - blk_check_merge_flags() verifies that cmd_flags / bi_rw are
   compatible. This function is called for both req-req and req-bio
   merging.

 - blk_rq_get_max_sectors() and blk_queue_get_max_sectors() can be used
   to query the maximum sector count for a given request or queue. The
   calls will return the right value from the queue limits given the
   type of command (RW, discard, write same, etc.)

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

diff --git a/block/blk-core.c b/block/blk-core.c
index 1cd55be..6619a6d 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1757,8 +1757,7 @@ int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 	if (!rq_mergeable(rq))
 		return 0;
 
-	if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
-	    blk_rq_bytes(rq) > queue_max_hw_sectors(q) << 9) {
+	if (blk_rq_sectors(rq) > blk_queue_get_max_sectors(q, rq->cmd_flags)) {
 		printk(KERN_ERR "%s: over max size limit.\n", __func__);
 		return -EIO;
 	}
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 9172606..8bd91d2 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -228,14 +228,8 @@ no_merge:
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		     struct bio *bio)
 {
-	unsigned short max_sectors;
-
-	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
-		max_sectors = queue_max_hw_sectors(q);
-	else
-		max_sectors = queue_max_sectors(q);
-
-	if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
+	if (blk_rq_sectors(req) + bio_sectors(bio) >
+	    blk_rq_get_max_sectors(req)) {
 		req->cmd_flags |= REQ_NOMERGE;
 		if (req == q->last_merge)
 			q->last_merge = NULL;
@@ -252,15 +246,8 @@ int ll_back_merge_fn(struct request_queue *q, struct request *req,
 int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		      struct bio *bio)
 {
-	unsigned short max_sectors;
-
-	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
-		max_sectors = queue_max_hw_sectors(q);
-	else
-		max_sectors = queue_max_sectors(q);
-
-
-	if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
+	if (blk_rq_sectors(req) + bio_sectors(bio) >
+	    blk_rq_get_max_sectors(req)) {
 		req->cmd_flags |= REQ_NOMERGE;
 		if (req == q->last_merge)
 			q->last_merge = NULL;
@@ -291,7 +278,8 @@ static int ll_merge_requests_fn(struct request_queue *q, struct request *req,
 	/*
 	 * Will it become too large?
 	 */
-	if ((blk_rq_sectors(req) + blk_rq_sectors(next)) > queue_max_sectors(q))
+	if ((blk_rq_sectors(req) + blk_rq_sectors(next)) >
+	    blk_rq_get_max_sectors(req))
 		return 0;
 
 	total_phys_segments = req->nr_phys_segments + next->nr_phys_segments;
@@ -370,6 +358,9 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 	if (!rq_mergeable(req) || !rq_mergeable(next))
 		return 0;
 
+	if (!blk_check_merge_flags(req->cmd_flags, next->cmd_flags))
+		return 0;
+
 	/*
 	 * not contiguous
 	 */
@@ -465,6 +456,9 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (!rq_mergeable(rq) || !bio_mergeable(bio))
 		return false;
 
+	if (!blk_check_merge_flags(rq->cmd_flags, bio->bi_rw))
+		return false;
+
 	/* different data direction or already started, don't merge */
 	if (bio_data_dir(bio) != rq_data_dir(rq))
 		return false;
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 7718e69..b7994eb 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -175,8 +175,7 @@ enum rq_flag_bits {
 
 /* This mask is used for both bio and request merge checking */
 #define REQ_NOMERGE_FLAGS \
-	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | \
-	 REQ_DISCARD | REQ_WRITE_SAME)
+	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
 
 #define REQ_RAHEAD		(1 << __REQ_RAHEAD)
 #define REQ_THROTTLED		(1 << __REQ_THROTTLED)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 92956b7..cf11980 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -580,6 +580,20 @@ static inline bool rq_mergeable(struct request *rq)
 	return true;
 }
 
+static inline bool blk_check_merge_flags(unsigned int flags1,
+					 unsigned int flags2)
+{
+	if ((flags1 & REQ_DISCARD) != (flags2 & REQ_DISCARD))
+		return false;
+
+	if ((flags1 & REQ_SECURE) != (flags2 & REQ_SECURE))
+		return false;
+
+	if ((flags1 & REQ_WRITE_SAME) != (flags2 & REQ_WRITE_SAME))
+		return false;
+
+	return true;
+}
 
 /*
  * q->prep_rq_fn return values
@@ -776,6 +790,28 @@ static inline unsigned int blk_rq_cur_sectors(const struct request *rq)
 	return blk_rq_cur_bytes(rq) >> 9;
 }
 
+static inline unsigned int blk_queue_get_max_sectors(struct request_queue *q,
+						     unsigned int cmd_flags)
+{
+	if (unlikely(cmd_flags & REQ_DISCARD))
+		return q->limits.max_discard_sectors;
+
+	if (unlikely(cmd_flags & REQ_WRITE_SAME))
+		return q->limits.max_write_same_sectors;
+
+	return q->limits.max_sectors;
+}
+
+static inline unsigned int blk_rq_get_max_sectors(struct request *rq)
+{
+	struct request_queue *q = rq->q;
+
+	if (unlikely(rq->cmd_type == REQ_TYPE_BLOCK_PC))
+		return q->limits.max_hw_sectors;
+
+	return blk_queue_get_max_sectors(q, rq->cmd_flags);
+}
+
 /*
  * Request issue related functions.
  */

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-03-21 12:14       ` [patch 1/2]block: handle merged discard request Shaohua Li
@ 2012-03-22  2:32         ` Martin K. Petersen
  2012-03-22  2:39           ` Shaohua Li
  0 siblings, 1 reply; 35+ messages in thread
From: Martin K. Petersen @ 2012-03-22  2:32 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Vivek Goyal, axboe, linux-kernel, neilb, martin.petersen

>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:

Shaohua> The SCSI discard implementation hacks the first bio of request
Shaohua> to add payload, which makes blk_update_request() can't
Shaohua> correctly mark bios finish.  The patch solves it. We set
Shaohua> discard bio size to 0 and finish it after the hacked payload
Shaohua> finishes.

Ick!

Also, you can't muck with bi_size because if we get an I/O error and
have to reissue the command we no longer know how much to write.

I have had to deal with the same issue for WRITE SAME. And the only sane
approach is to distinguish between the DMA transfer size and the blocks
"affected" by the command. That's what I'm working on right now for copy
offload...

If this is something you want in 3.4 I guess we could temporarily add a
separate length field to struct request. If you can wait I suggest we
talk at LSF.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-22  2:18       ` [patch v2 2/6] blk: dont allow discard request merge temporarily Martin K. Petersen
@ 2012-03-22  2:33         ` Shaohua Li
  2012-03-22  6:28         ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-03-22  2:33 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Vivek Goyal, linux-kernel, linux-raid, neilb, axboe

2012/3/22 Martin K. Petersen <martin.petersen@oracle.com>:
>>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:
>
> Shaohua> Enabling discard merge is required for device with slow discard
> Shaohua> (and very helpful for raid),
>
> Before RAID support there really wasn't much point. That's one of the
> reasons the current discard merge code has survived despite being
> completely broken.
Exactly.

> As for how I'd like to handle contiguous discard merges going forward
> here's a proof of concept. It's on top of my write same tree so you may
> have to noodle a bit.
>
> It is crucially important that we never merge a command that can't be
> processed by the device. So I'd like to do something like this:
This is great to consolidate the check with this patch. But looks this
isn't enough. As far as I know the current discard merge is broken
because blk_update_request() can't handle such request because
we override the first BIO to add payload. I had a quick hack and post
out yesterday. It works in my SATA ssd, but I really didn't know if there
is something missing in SCSI side, can you share anything here?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-03-22  2:32         ` Martin K. Petersen
@ 2012-03-22  2:39           ` Shaohua Li
  2012-03-22  2:53             ` Martin K. Petersen
  0 siblings, 1 reply; 35+ messages in thread
From: Shaohua Li @ 2012-03-22  2:39 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Vivek Goyal, axboe, linux-kernel, neilb

2012/3/22 Martin K. Petersen <martin.petersen@oracle.com>:
>>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:
>
> Shaohua> The SCSI discard implementation hacks the first bio of request
> Shaohua> to add payload, which makes blk_update_request() can't
> Shaohua> correctly mark bios finish.  The patch solves it. We set
> Shaohua> discard bio size to 0 and finish it after the hacked payload
> Shaohua> finishes.
>
> Ick!
>
> Also, you can't muck with bi_size because if we get an I/O error and
> have to reissue the command we no longer know how much to write.
> I have had to deal with the same issue for WRITE SAME. And the only sane
> approach is to distinguish between the DMA transfer size and the blocks
> "affected" by the command. That's what I'm working on right now for copy
> offload...
Yes, this is a problem. But note we already have this problem with
discard request for a long time. The payload has just (sector, length).
And looks the request finish returns the transfered data of the
payload instead of the discarded data. Am I missing anything?

> If this is something you want in 3.4 I guess we could temporarily add a
> separate length field to struct request. If you can wait I suggest we
> talk at LSF.
This blocked my raid discard patches, but I can wait. Sure we can have
a talk at LSF.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-03-22  2:39           ` Shaohua Li
@ 2012-03-22  2:53             ` Martin K. Petersen
  2012-06-20  8:57               ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Martin K. Petersen @ 2012-03-22  2:53 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Martin K. Petersen, Vivek Goyal, axboe, linux-kernel, neilb

>>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:

Shaohua> Yes, this is a problem. But note we already have this problem
Shaohua> with discard request for a long time. The payload has just
Shaohua> (sector, length).  And looks the request finish returns the
Shaohua> transfered data of the payload instead of the discarded
Shaohua> data. Am I missing anything?

Nope, it's broken. But if you check my write same patches you'll see
that I actually return the correct completion size there.

There are several additional commands in the pipeline where the 1:1
mapping between DMA size and block range is invalid. I want to get rid
of the 1:1 assumption in general so we can handle any command without
these evil workarounds.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 2/6] blk: dont allow discard request merge temporarily
  2012-03-22  2:18       ` [patch v2 2/6] blk: dont allow discard request merge temporarily Martin K. Petersen
  2012-03-22  2:33         ` Shaohua Li
@ 2012-03-22  6:28         ` Christoph Hellwig
  1 sibling, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2012-03-22  6:28 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Shaohua Li, Vivek Goyal, linux-kernel, linux-raid, neilb, axboe

On Wed, Mar 21, 2012 at 10:18:54PM -0400, Martin K. Petersen wrote:
> >>>>> "Shaohua" == Shaohua Li <shli@kernel.org> writes:
> 
> Shaohua> Enabling discard merge is required for device with slow discard
> Shaohua> (and very helpful for raid),
> 
> Before RAID support there really wasn't much point. That's one of the
> reasons the current discard merge code has survived despite being
> completely broken.

I've actually send repeated bug reports about it beeing broken when
I enabled async discard in XFS which led to merges, and debugged it down
to beeing a problem with merges.  But Jens didn't like my patch to mark
them nomerged, so we still couldn't commit the async discard support for
XFS which would have helped quite a bit with discard performance.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (6 preceding siblings ...)
  2012-03-19 19:38 ` [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Holger Kiehl
@ 2012-05-08  9:59 ` Patelczyk, Maciej
  2012-05-09  2:27   ` Shaohua Li
  7 siblings, 1 reply; 35+ messages in thread
From: Patelczyk, Maciej @ 2012-05-08  9:59 UTC (permalink / raw)
  To: Shaohua Li, linux-kernel, linux-raid; +Cc: neilb

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of Shaohua Li
> Sent: Friday, March 16, 2012 8:32 AM
> To: linux-kernel@vger.kernel.org; linux-raid@vger.kernel.org
> Cc: neilb@suse.de; axboe@kernel.dk; vgoyal@redhat.com;
> martin.petersen@oracle.com
> Subject: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
> 
> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
> support for raid 4/5/6 later. The implementation is pretty
> straightforward and self-explained.
> 
> v1->v2:
> 1. fixed a checking issue
> 2. dropped discard request plug and replace it with no discard merege,
> because current SCSI layer can't handle discard request merge.
> 
> Thanks,
> Shaohua

Hi,
Can't see anywhere patch 2/6 from this series.


maciej

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
  2012-05-08  9:59 ` Patelczyk, Maciej
@ 2012-05-09  2:27   ` Shaohua Li
  0 siblings, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-05-09  2:27 UTC (permalink / raw)
  To: Patelczyk, Maciej; +Cc: linux-kernel, linux-raid, neilb

2012/5/8 Patelczyk, Maciej <maciej.patelczyk@intel.com>:
>> -----Original Message-----
>> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
>> owner@vger.kernel.org] On Behalf Of Shaohua Li
>> Sent: Friday, March 16, 2012 8:32 AM
>> To: linux-kernel@vger.kernel.org; linux-raid@vger.kernel.org
>> Cc: neilb@suse.de; axboe@kernel.dk; vgoyal@redhat.com;
>> martin.petersen@oracle.com
>> Subject: [patch v2 0/6] Add TRIM support for raid linear/0/1/10
>>
>> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
>> support for raid 4/5/6 later. The implementation is pretty
>> straightforward and self-explained.
>>
>> v1->v2:
>> 1. fixed a checking issue
>> 2. dropped discard request plug and replace it with no discard merege,
>> because current SCSI layer can't handle discard request merge.
>>
>> Thanks,
>> Shaohua
>
> Hi,
> Can't see anywhere patch 2/6 from this series.
http://marc.info/?l=linux-kernel&m=133188350823292&w=2

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-03-22  2:53             ` Martin K. Petersen
@ 2012-06-20  8:57               ` Christoph Hellwig
  2012-06-22  3:46                 ` Martin K. Petersen
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2012-06-20  8:57 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Shaohua Li, Vivek Goyal, axboe, linux-kernel, neilb

On Wed, Mar 21, 2012 at 10:53:02PM -0400, Martin K. Petersen wrote:
> Nope, it's broken. But if you check my write same patches you'll see
> that I actually return the correct completion size there.
> 
> There are several additional commands in the pipeline where the 1:1
> mapping between DMA size and block range is invalid. I want to get rid
> of the 1:1 assumption in general so we can handle any command without
> these evil workarounds.

What's the progress on getting these issues sorted out?


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-06-20  8:57               ` Christoph Hellwig
@ 2012-06-22  3:46                 ` Martin K. Petersen
  2012-08-03  2:10                   ` Shaohua Li
  2012-08-18  3:06                   ` Mike Snitzer
  0 siblings, 2 replies; 35+ messages in thread
From: Martin K. Petersen @ 2012-06-22  3:46 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Shaohua Li, Vivek Goyal, axboe, linux-kernel, neilb

>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:

>> There are several additional commands in the pipeline where the 1:1
>> mapping between DMA size and block range is invalid. I want to get
>> rid of the 1:1 assumption in general so we can handle any command
>> without these evil workarounds.

Christoph> What's the progress on getting these issues sorted out?

This has bitrotted for a while. I'll put it on my list. I should finally
have some bandwidth again next week...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-06-22  3:46                 ` Martin K. Petersen
@ 2012-08-03  2:10                   ` Shaohua Li
  2012-08-18  3:06                   ` Mike Snitzer
  1 sibling, 0 replies; 35+ messages in thread
From: Shaohua Li @ 2012-08-03  2:10 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Vivek Goyal, axboe, linux-kernel, neilb

2012/6/22 Martin K. Petersen <martin.petersen@oracle.com>:
>>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
>
>>> There are several additional commands in the pipeline where the 1:1
>>> mapping between DMA size and block range is invalid. I want to get
>>> rid of the 1:1 assumption in general so we can handle any command
>>> without these evil workarounds.
>
> Christoph> What's the progress on getting these issues sorted out?
>
> This has bitrotted for a while. I'll put it on my list. I should finally
> have some bandwidth again next week...

Any update on this? If this will not happen soon, should we just disable
discard request merge now?

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-06-22  3:46                 ` Martin K. Petersen
  2012-08-03  2:10                   ` Shaohua Li
@ 2012-08-18  3:06                   ` Mike Snitzer
  2012-08-18  3:47                     ` Martin K. Petersen
  1 sibling, 1 reply; 35+ messages in thread
From: Mike Snitzer @ 2012-08-18  3:06 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Shaohua Li, Vivek Goyal, axboe, linux-kernel,
	neilb, linux-scsi

On Thu, Jun 21, 2012 at 11:46 PM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Christoph" == Christoph Hellwig <hch@infradead.org> writes:
>
>>> There are several additional commands in the pipeline where the 1:1
>>> mapping between DMA size and block range is invalid. I want to get
>>> rid of the 1:1 assumption in general so we can handle any command
>>> without these evil workarounds.
>
> Christoph> What's the progress on getting these issues sorted out?
>
> This has bitrotted for a while. I'll put it on my list. I should finally
> have some bandwidth again next week...

Hey Martin,

I rebased (and fixed/tested) your writesame patches on v3.6-rc2 +
jens' for-linus branch, the git tree is available here:
https://github.com/snitm/linux/tree/writesame

I've also made the updated patchset available here:
http://people.redhat.com/msnitzer/patches/upstream/writesame/series.html

Should the writesame patches come before any discard merge or 1:1 DMA
and block range assumption fixes?
NOTE (for others besides martin):
http://people.redhat.com/msnitzer/patches/upstream/writesame/0001-block-Clean-up-merge-logic.patch
removes all the discard merge hacks; I think it provides a clean
baseline to then layer discard merge support back in -- but maybe
that's a flawed strategy?

Could be I've wasted a few hours by rebasing these patches...
regardless, it would be great if you could share what your plans are.

Thanks!

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-08-18  3:06                   ` Mike Snitzer
@ 2012-08-18  3:47                     ` Martin K. Petersen
  2012-08-20 13:57                       ` Mike Snitzer
  0 siblings, 1 reply; 35+ messages in thread
From: Martin K. Petersen @ 2012-08-18  3:47 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Christoph Hellwig, Shaohua Li, Vivek Goyal,
	axboe, linux-kernel, neilb, linux-scsi

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Could be I've wasted a few hours by rebasing these patches...
Mike> regardless, it would be great if you could share what your plans
Mike> are.

Heh, I worked on syncing my patch queue up to Jens' and James' trees
this afternoon. But I didn't quite finish the block stuff, mainly due to
some conflicts with a few topology changes I also have pending.

I'll take a look at your series. Maybe I'll swap things around and put
the topology changes on top instead of below. Leverage some of the work
you did...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-08-18  3:47                     ` Martin K. Petersen
@ 2012-08-20 13:57                       ` Mike Snitzer
  2012-08-20 13:58                         ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Snitzer @ 2012-08-20 13:57 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Christoph Hellwig, Shaohua Li, Vivek Goyal, axboe, linux-kernel,
	neilb, linux-scsi

On Fri, Aug 17 2012 at 11:47pm -0400,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> Could be I've wasted a few hours by rebasing these patches...
> Mike> regardless, it would be great if you could share what your plans
> Mike> are.
> 
> Heh, I worked on syncing my patch queue up to Jens' and James' trees
> this afternoon. But I didn't quite finish the block stuff, mainly due to
> some conflicts with a few topology changes I also have pending.
> 
> I'll take a look at your series. Maybe I'll swap things around and put
> the topology changes on top instead of below. Leverage some of the work
> you did...

OK, just FYI, I had to change bio_has_data() to test bio->bi_vcnt
(rather than bio->bi_io_vec != NULL) because a discard bio has a
non-NULL bio->bi_io_vec (likely points to the bio->bi_inline_vecs but I
didn't check yet).

But I haven't put my finger on _why_ a discard bio has bio->bi_io_vec
(but given my use of DM, bio comes from bio_alloc_bioset, and DM passes
original bio->bi_max_vecs for nr_iovecs).

Anyway, this bio_has_data() change seemed reasonable considering
bio_data() checks bio->bi_vcnt.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-08-20 13:57                       ` Mike Snitzer
@ 2012-08-20 13:58                         ` Christoph Hellwig
  2012-08-20 14:12                           ` Mike Snitzer
  0 siblings, 1 reply; 35+ messages in thread
From: Christoph Hellwig @ 2012-08-20 13:58 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Christoph Hellwig, Shaohua Li, Vivek Goyal,
	axboe, linux-kernel, neilb, linux-scsi

On Mon, Aug 20, 2012 at 09:57:39AM -0400, Mike Snitzer wrote:
> But I haven't put my finger on _why_ a discard bio has bio->bi_io_vec
> (but given my use of DM, bio comes from bio_alloc_bioset, and DM passes
> original bio->bi_max_vecs for nr_iovecs).

TRIM has a payload and we cheay by preallocation a data page for it.


^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-08-20 13:58                         ` Christoph Hellwig
@ 2012-08-20 14:12                           ` Mike Snitzer
  2012-08-20 14:15                             ` Christoph Hellwig
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Snitzer @ 2012-08-20 14:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Shaohua Li, Vivek Goyal, axboe, linux-kernel,
	neilb, linux-scsi

On Mon, Aug 20 2012 at  9:58am -0400,
Christoph Hellwig <hch@infradead.org> wrote:

> On Mon, Aug 20, 2012 at 09:57:39AM -0400, Mike Snitzer wrote:
> > But I haven't put my finger on _why_ a discard bio has bio->bi_io_vec
> > (but given my use of DM, bio comes from bio_alloc_bioset, and DM passes
> > original bio->bi_max_vecs for nr_iovecs).
> 
> TRIM has a payload and we cheay by preallocation a data page for it.

Thought we pushed that down?  Hence sd_setup_discard_cmnd's
alloc_page + blk_add_request_payload hack.

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [patch 1/2]block: handle merged discard request
  2012-08-20 14:12                           ` Mike Snitzer
@ 2012-08-20 14:15                             ` Christoph Hellwig
  0 siblings, 0 replies; 35+ messages in thread
From: Christoph Hellwig @ 2012-08-20 14:15 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Christoph Hellwig, Martin K. Petersen, Shaohua Li, Vivek Goyal,
	axboe, linux-kernel, neilb, linux-scsi

On Mon, Aug 20, 2012 at 10:12:29AM -0400, Mike Snitzer wrote:
> Thought we pushed that down?  Hence sd_setup_discard_cmnd's
> alloc_page + blk_add_request_payload hack.

Yeah, but we still need the bio_vec from early on, as it's allocated as
part of the bio.

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2012-08-20 14:15 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-16  7:32 [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Shaohua Li
2012-03-16  7:32 ` [patch v2 1/6] block: makes bio_split support bio without data Shaohua Li
2012-03-16  7:32 ` [patch v2 2/6] blk: dont allow discard request merge temporarily Shaohua Li
2012-03-20 16:21   ` Vivek Goyal
2012-03-21  1:22     ` Shaohua Li
2012-03-21 12:14       ` [patch 1/2]block: handle merged discard request Shaohua Li
2012-03-22  2:32         ` Martin K. Petersen
2012-03-22  2:39           ` Shaohua Li
2012-03-22  2:53             ` Martin K. Petersen
2012-06-20  8:57               ` Christoph Hellwig
2012-06-22  3:46                 ` Martin K. Petersen
2012-08-03  2:10                   ` Shaohua Li
2012-08-18  3:06                   ` Mike Snitzer
2012-08-18  3:47                     ` Martin K. Petersen
2012-08-20 13:57                       ` Mike Snitzer
2012-08-20 13:58                         ` Christoph Hellwig
2012-08-20 14:12                           ` Mike Snitzer
2012-08-20 14:15                             ` Christoph Hellwig
2012-03-22  2:18       ` [patch v2 2/6] blk: dont allow discard request merge temporarily Martin K. Petersen
2012-03-22  2:33         ` Shaohua Li
2012-03-22  6:28         ` Christoph Hellwig
2012-03-16  7:32 ` [patch v2 3/6] md: linear supports TRIM Shaohua Li
2012-03-16  7:32 ` [patch v2 4/6] md: raid 0 " Shaohua Li
2012-03-16  7:32 ` [patch v2 5/6] md: raid 1 " Shaohua Li
2012-03-16  7:32 ` [patch v2 6/6] md: raid 10 " Shaohua Li
2012-03-19 19:38 ` [patch v2 0/6] Add TRIM support for raid linear/0/1/10 Holger Kiehl
2012-03-20  1:27   ` Shaohua Li
2012-03-20  9:50     ` Holger Kiehl
2012-03-20 12:09       ` Shaohua Li
2012-03-21  2:08         ` Shaohua Li
2012-03-21  2:24           ` Roberto Spadim
2012-03-21  2:29             ` Mathias Burén
2012-03-21  2:29           ` Mathias Burén
2012-05-08  9:59 ` Patelczyk, Maciej
2012-05-09  2:27   ` Shaohua Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).