linux-kernel.vger.kernel.org archive mirror
* [patch 0/7] Add TRIM support for raid linear/0/1/10
@ 2012-03-12  3:04 Shaohua Li
  2012-03-12  3:04 ` [patch 1/7] block: makes bio_split support bio without data Shaohua Li
                   ` (9 more replies)
  0 siblings, 10 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe

The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
raid 4/5/6 later. The implementation is pretty straightforward and
self-explanatory.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 1/7] block: makes bio_split support bio without data
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-12  3:04 ` [patch 2/7] md: linear supports TRIM Shaohua Li
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: bio_split-fix.patch --]
[-- Type: text/plain, Size: 1638 bytes --]

A discard bio has no data attached, and bio_split() hits a BUG_ON with such a
bio. This makes bio_split() work for a bio without data.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 fs/bio.c |   22 ++++++++++++----------
 1 file changed, 12 insertions(+), 10 deletions(-)

Index: linux/fs/bio.c
===================================================================
--- linux.orig/fs/bio.c	2012-03-09 16:56:41.203790008 +0800
+++ linux/fs/bio.c	2012-03-12 10:10:40.696612399 +0800
@@ -1492,7 +1492,7 @@ struct bio_pair *bio_split(struct bio *b
 	trace_block_split(bdev_get_queue(bi->bi_bdev), bi,
 				bi->bi_sector + first_sectors);
 
-	BUG_ON(bi->bi_vcnt != 1);
+	BUG_ON(bi->bi_vcnt != 1 && bi->bi_vcnt != 0);
 	BUG_ON(bi->bi_idx != 0);
 	atomic_set(&bp->cnt, 3);
 	bp->error = 0;
@@ -1502,17 +1502,19 @@ struct bio_pair *bio_split(struct bio *b
 	bp->bio2.bi_size -= first_sectors << 9;
 	bp->bio1.bi_size = first_sectors << 9;
 
-	bp->bv1 = bi->bi_io_vec[0];
-	bp->bv2 = bi->bi_io_vec[0];
-	bp->bv2.bv_offset += first_sectors << 9;
-	bp->bv2.bv_len -= first_sectors << 9;
-	bp->bv1.bv_len = first_sectors << 9;
+	if (bi->bi_vcnt != 0) {
+		bp->bv1 = bi->bi_io_vec[0];
+		bp->bv2 = bi->bi_io_vec[0];
+		bp->bv2.bv_offset += first_sectors << 9;
+		bp->bv2.bv_len -= first_sectors << 9;
+		bp->bv1.bv_len = first_sectors << 9;
 
-	bp->bio1.bi_io_vec = &bp->bv1;
-	bp->bio2.bi_io_vec = &bp->bv2;
+		bp->bio1.bi_io_vec = &bp->bv1;
+		bp->bio2.bi_io_vec = &bp->bv2;
 
-	bp->bio1.bi_max_vecs = 1;
-	bp->bio2.bi_max_vecs = 1;
+		bp->bio1.bi_max_vecs = 1;
+		bp->bio2.bi_max_vecs = 1;
+	}
 
 	bp->bio1.bi_end_io = bio_pair_end_1;
 	bp->bio2.bi_end_io = bio_pair_end_2;


^ permalink raw reply	[flat|nested] 33+ messages in thread
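
The idea behind patch 1/7 can be illustrated with a small userspace model. This is only a sketch: `struct toy_bio`, `struct toy_vec`, and `toy_bio_split()` are hypothetical stand-ins invented for illustration, not the kernel's `struct bio` or `bio_split()`. It shows the same rule as the diff above: the sector and size are always split, while the single data segment is adjusted only when one exists.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct toy_vec {
	unsigned int len;    /* segment length in bytes */
	unsigned int offset; /* offset of the data inside its page */
};

struct toy_bio {
	uint64_t sector;     /* start sector */
	unsigned int size;   /* total size in bytes */
	unsigned int vcnt;   /* 0 for a discard, 1 for a single-segment data bio */
	struct toy_vec vec;  /* meaningful only when vcnt == 1 */
};

/*
 * Split `in` after first_sectors sectors into *a and *b.
 * Returns 0 on success, -1 if the bio has more than one segment
 * or the split point is not strictly inside the bio.
 */
static int toy_bio_split(const struct toy_bio *in, unsigned int first_sectors,
			 struct toy_bio *a, struct toy_bio *b)
{
	unsigned int first_bytes = first_sectors << 9;

	if (in->vcnt > 1 || first_bytes == 0 || first_bytes >= in->size)
		return -1;

	*a = *in;
	*b = *in;
	a->size = first_bytes;
	b->sector += first_sectors;
	b->size -= first_bytes;

	/* Only touch the segment when there is one; a discard has none. */
	if (in->vcnt != 0) {
		a->vec.len = first_bytes;
		b->vec.offset += first_bytes;
		b->vec.len -= first_bytes;
	}
	return 0;
}
```

Splitting a data-less bio of 1024 sectors at sector 16 yields halves of 16 and 1008 sectors, and the segment fields are never read.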

* [patch 2/7] md: linear supports TRIM
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
  2012-03-12  3:04 ` [patch 1/7] block: makes bio_split support bio without data Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-12  3:04 ` [patch 3/7] md: raid 0 " Shaohua Li
                   ` (7 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: md-linear-discard-support.patch --]
[-- Type: text/plain, Size: 1621 bytes --]

This makes md linear support TRIM.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/linear.c |   16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Index: linux/drivers/md/linear.c
===================================================================
--- linux.orig/drivers/md/linear.c	2012-03-09 16:56:41.173790011 +0800
+++ linux/drivers/md/linear.c	2012-03-12 10:15:44.916611071 +0800
@@ -129,6 +129,7 @@ static struct linear_conf *linear_conf(s
 	struct linear_conf *conf;
 	struct md_rdev *rdev;
 	int i, cnt;
+	bool discard_supported = false;
 
 	conf = kzalloc (sizeof (*conf) + raid_disks*sizeof(struct dev_info),
 			GFP_KERNEL);
@@ -171,6 +172,8 @@ static struct linear_conf *linear_conf(s
 		conf->array_sectors += rdev->sectors;
 		cnt++;
 
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != raid_disks) {
 		printk(KERN_ERR "md/linear:%s: not enough drives present. Aborting!\n",
@@ -178,6 +181,11 @@ static struct linear_conf *linear_conf(s
 		goto out;
 	}
 
+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	/*
 	 * Here we calculate the device offsets.
 	 */
@@ -319,6 +327,14 @@ static void linear_make_request(struct m
 	bio->bi_sector = bio->bi_sector - start_sector
 		+ tmp_dev->rdev->data_offset;
 	rcu_read_unlock();
+
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		!blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 }
 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 3/7] md: raid 0 supports TRIM
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
  2012-03-12  3:04 ` [patch 1/7] block: makes bio_split support bio without data Shaohua Li
  2012-03-12  3:04 ` [patch 2/7] md: linear supports TRIM Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-12  3:04 ` [patch 4/7] md: raid 1 " Shaohua Li
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: md-raid0-discard-support.patch --]
[-- Type: text/plain, Size: 2054 bytes --]

This makes md raid 0 support TRIM.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid0.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Index: linux/drivers/md/raid0.c
===================================================================
--- linux.orig/drivers/md/raid0.c	2012-03-09 16:56:41.133790013 +0800
+++ linux/drivers/md/raid0.c	2012-03-12 10:16:18.526610922 +0800
@@ -88,6 +88,7 @@ static int create_strip_zones(struct mdd
 	char b[BDEVNAME_SIZE];
 	char b2[BDEVNAME_SIZE];
 	struct r0conf *conf = kzalloc(sizeof(*conf), GFP_KERNEL);
+	bool discard_supported = false;
 
 	if (!conf)
 		return -ENOMEM;
@@ -201,6 +202,9 @@ static int create_strip_zones(struct mdd
 		if (!smallest || (rdev1->sectors < smallest->sectors))
 			smallest = rdev1;
 		cnt++;
+
+		if (blk_queue_discard(bdev_get_queue(rdev1->bdev)))
+			discard_supported = true;
 	}
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "md/raid0:%s: too few disks (%d of %d) - "
@@ -278,6 +282,11 @@ static int create_strip_zones(struct mdd
 	blk_queue_io_opt(mddev->queue,
 			 (mddev->chunk_sectors << 9) * mddev->raid_disks);
 
+	if (!discard_supported)
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	pr_debug("md/raid0:%s: done.\n", mdname(mddev));
 	*private_conf = conf;
 
@@ -348,6 +357,7 @@ static int raid0_run(struct mddev *mddev
 	if (md_check_no_bitmap(mddev))
 		return -EINVAL;
 	blk_queue_max_hw_sectors(mddev->queue, mddev->chunk_sectors);
+	blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 
 	/* if private is not null, we are here after takeover */
 	if (mddev->private == NULL) {
@@ -512,6 +522,13 @@ static void raid0_make_request(struct md
 	bio->bi_sector = sector_offset + zone->dev_start +
 		tmp_dev->data_offset;
 
+	if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+		!blk_queue_discard(bdev_get_queue(bio->bi_bdev)))) {
+		/* Just ignore it */
+		bio_endio(bio, 0);
+		return;
+	}
+
 	generic_make_request(bio);
 	return;
 


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 4/7] md: raid 1 supports TRIM
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (2 preceding siblings ...)
  2012-03-12  3:04 ` [patch 3/7] md: raid 0 " Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-12  3:04 ` [patch 5/7] md: raid 10 " Shaohua Li
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: md-raid1-discard-support.patch --]
[-- Type: text/plain, Size: 2829 bytes --]

This makes md raid 1 support TRIM.
If one disk supports discard and another does not, or one has discard_zero_data
and another does not, the data read back from such disks could be inconsistent.
This should not matter, because discarded data is useless. It will add an extra
copy during rebuild, though.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid1.c |   21 +++++++++++++++++++--
 1 file changed, 19 insertions(+), 2 deletions(-)

Index: linux/drivers/md/raid1.c
===================================================================
--- linux.orig/drivers/md/raid1.c	2012-03-12 10:09:42.426612652 +0800
+++ linux/drivers/md/raid1.c	2012-03-12 10:16:50.276610783 +0800
@@ -673,7 +673,12 @@ static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
@@ -835,6 +840,7 @@ static void make_request(struct mddev *m
 	const int rw = bio_data_dir(bio);
 	const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
 	const unsigned long do_flush_fua = (bio->bi_rw & (REQ_FLUSH | REQ_FUA));
+	const unsigned long do_discard = (bio->bi_rw & (REQ_DISCARD | REQ_SECURE));
 	struct md_rdev *blocked_rdev;
 	int plugged;
 	int first_clone;
@@ -1135,7 +1141,7 @@ read_again:
 				   conf->mirrors[i].rdev->data_offset);
 		mbio->bi_bdev = conf->mirrors[i].rdev->bdev;
 		mbio->bi_end_io	= raid1_end_write_request;
-		mbio->bi_rw = WRITE | do_flush_fua | do_sync;
+		mbio->bi_rw = WRITE | do_flush_fua | do_sync | do_discard;
 		mbio->bi_private = r1_bio;
 
 		atomic_inc(&r1_bio->remaining);
@@ -1371,6 +1377,8 @@ static int raid1_add_disk(struct mddev *
 		}
 	}
 	md_integrity_add_rdev(rdev, mddev);
+	if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
 	print_conf(conf);
 	return err;
 }
@@ -2585,6 +2593,7 @@ static int run(struct mddev *mddev)
 	struct r1conf *conf;
 	int i;
 	struct md_rdev *rdev;
+	bool discard_supported = false;
 
 	if (mddev->level != 1) {
 		printk(KERN_ERR "md/raid1:%s: raid level not set to mirroring (%d)\n",
@@ -2623,8 +2632,16 @@ static int run(struct mddev *mddev)
 			blk_queue_segment_boundary(mddev->queue,
 						   PAGE_CACHE_SIZE - 1);
 		}
+
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
 
+	if (discard_supported)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	mddev->degraded = 0;
 	for (i=0; i < conf->raid_disks; i++)
 		if (conf->mirrors[i].rdev == NULL ||


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 5/7] md: raid 10 supports TRIM
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (3 preceding siblings ...)
  2012-03-12  3:04 ` [patch 4/7] md: raid 1 " Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-12  3:04 ` [patch 6/7] blk: add plug for blkdev_issue_discard Shaohua Li
                   ` (4 subsequent siblings)
  9 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: md-raid10-discard-support.patch --]
[-- Type: text/plain, Size: 3485 bytes --]

This makes md raid 10 support TRIM.
If one disk supports discard and another does not, or one has discard_zero_data
and another does not, the data read back from such disks could be inconsistent.
This should not matter, because discarded data is useless. It will add an extra
copy during rebuild, though.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 drivers/md/raid10.c |   26 +++++++++++++++++++++++---
 1 file changed, 23 insertions(+), 3 deletions(-)

Index: linux/drivers/md/raid10.c
===================================================================
--- linux.orig/drivers/md/raid10.c	2012-03-12 10:09:42.426612652 +0800
+++ linux/drivers/md/raid10.c	2012-03-12 10:19:05.046610195 +0800
@@ -800,7 +800,12 @@ static void flush_pending_writes(struct
 		while (bio) { /* submit pending writes */
 			struct bio *next = bio->bi_next;
 			bio->bi_next = NULL;
-			generic_make_request(bio);
+			if (unlikely((bio->bi_rw & REQ_DISCARD) &&
+			    !blk_queue_discard(bdev_get_queue(bio->bi_bdev))))
+				/* Just ignore it */
+				bio_endio(bio, 0);
+			else
+				generic_make_request(bio);
 			bio = next;
 		}
 	} else
@@ -926,6 +931,7 @@ static void make_request(struct mddev *m
 	const int rw = bio_data_dir(bio);
 	const unsigned long do_sync = (bio->bi_rw & REQ_SYNC);
 	const unsigned long do_fua = (bio->bi_rw & REQ_FUA);
+	const unsigned long do_discard = (bio->bi_rw & (REQ_DISCARD | REQ_SECURE));
 	unsigned long flags;
 	struct md_rdev *blocked_rdev;
 	int plugged;
@@ -1241,7 +1247,7 @@ retry_write:
 				   conf->mirrors[d].rdev->data_offset);
 		mbio->bi_bdev = conf->mirrors[d].rdev->bdev;
 		mbio->bi_end_io	= raid10_end_write_request;
-		mbio->bi_rw = WRITE | do_sync | do_fua;
+		mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
 		mbio->bi_private = r10_bio;
 
 		atomic_inc(&r10_bio->remaining);
@@ -1266,7 +1272,7 @@ retry_write:
 				   conf->mirrors[d].replacement->data_offset);
 		mbio->bi_bdev = conf->mirrors[d].replacement->bdev;
 		mbio->bi_end_io	= raid10_end_write_request;
-		mbio->bi_rw = WRITE | do_sync | do_fua;
+		mbio->bi_rw = WRITE | do_sync | do_fua | do_discard;
 		mbio->bi_private = r10_bio;
 
 		atomic_inc(&r10_bio->remaining);
@@ -1543,6 +1549,9 @@ static int raid10_add_disk(struct mddev
 	}
 
 	md_integrity_add_rdev(rdev, mddev);
+	if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	print_conf(conf);
 	return err;
 }
@@ -3214,6 +3223,7 @@ static int run(struct mddev *mddev)
 	struct mirror_info *disk;
 	struct md_rdev *rdev;
 	sector_t size;
+	bool discard_supported = false;
 
 	/*
 	 * copy the already verified devices into our private RAID10
@@ -3234,6 +3244,7 @@ static int run(struct mddev *mddev)
 	mddev->thread = conf->thread;
 	conf->thread = NULL;
 
+	blk_queue_max_discard_sectors(mddev->queue, mddev->chunk_sectors);
 	chunk_size = mddev->chunk_sectors << 9;
 	blk_queue_io_min(mddev->queue, chunk_size);
 	if (conf->raid_disks % conf->near_copies)
@@ -3273,7 +3284,16 @@ static int run(struct mddev *mddev)
 		}
 
 		disk->head_position = 0;
+
+		if (blk_queue_discard(bdev_get_queue(rdev->bdev)))
+			discard_supported = true;
 	}
+
+	if (discard_supported)
+		queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+	else
+		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, mddev->queue);
+
 	/* need to check that every block has at least one working mirror */
 	if (!enough(conf, -1)) {
 		printk(KERN_ERR "md/raid10:%s: not enough operational mirrors.\n",


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [patch 6/7] blk: add plug for blkdev_issue_discard
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (4 preceding siblings ...)
  2012-03-12  3:04 ` [patch 5/7] md: raid 10 " Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-13 15:51   ` Vivek Goyal
  2012-03-12  3:04 ` [patch 7/7] blk: use correct sectors limitation for discard request Shaohua Li
                   ` (3 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: blk-discard-plug.patch --]
[-- Type: text/plain, Size: 1248 bytes --]

In the raid 0 case, a big discard request is divided into several small
requests of chunk_size each. Such requests can be merged in the lower layer if
the correct plug is added. This should improve performance a little.

The raid 10 case doesn't matter, because requests are dispatched in a separate
thread and there is already a plug there.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 block/blk-lib.c |    3 +++
 1 file changed, 3 insertions(+)

Index: linux/block/blk-lib.c
===================================================================
--- linux.orig/block/blk-lib.c	2012-03-09 16:56:41.043790011 +0800
+++ linux/block/blk-lib.c	2012-03-12 10:21:38.716609525 +0800
@@ -47,6 +47,7 @@ int blkdev_issue_discard(struct block_de
 	struct bio_batch bb;
 	struct bio *bio;
 	int ret = 0;
+	struct blk_plug plug;
 
 	if (!q)
 		return -ENXIO;
@@ -78,6 +79,7 @@ int blkdev_issue_discard(struct block_de
 	bb.flags = 1 << BIO_UPTODATE;
 	bb.wait = &wait;
 
+	blk_start_plug(&plug);
 	while (nr_sects) {
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio) {
@@ -102,6 +104,7 @@ int blkdev_issue_discard(struct block_de
 		atomic_inc(&bb.done);
 		submit_bio(type, bio);
 	}
+	blk_finish_plug(&plug);
 
 	/* Wait for bios in-flight */
 	if (!atomic_dec_and_test(&bb.done))


^ permalink raw reply	[flat|nested] 33+ messages in thread
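
Why the small discards in patch 6/7 can be merged again below md can be seen in a simplified model. This sketch assumes a single-zone raid 0 with equally sized members and ignores each member's data offset; `struct toy_target`, `discard_bio_count()`, and `raid0_map()` are hypothetical names made up for illustration, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: which member disk, and which sector on it. */
struct toy_target {
	unsigned int disk;
	uint64_t dev_sector;
};

/* Number of bios a discard is cut into when each is capped at the limit. */
static uint64_t discard_bio_count(uint64_t nr_sects,
				  unsigned int max_discard_sectors)
{
	return (nr_sects + max_discard_sectors - 1) / max_discard_sectors;
}

/* Map an array sector to a member disk (single zone, no data offset). */
static struct toy_target raid0_map(uint64_t sector, unsigned int chunk_sectors,
				   unsigned int nr_disks)
{
	uint64_t chunk = sector / chunk_sectors;
	struct toy_target t;

	t.disk = (unsigned int)(chunk % nr_disks);
	t.dev_sector = (chunk / nr_disks) * chunk_sectors
		       + sector % chunk_sectors;
	return t;
}
```

With three disks and 1024-sector chunks, a 6144-sector discard becomes six bios. Array sectors 0 and 3072 both map to disk 0, at device sectors 0 and 1024, so the two pieces sent to that disk are back to back. Adjacent pieces like these are what a plug gives the lower layer a chance to merge.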

* [patch 7/7] blk: use correct sectors limitation for discard request
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (5 preceding siblings ...)
  2012-03-12  3:04 ` [patch 6/7] blk: add plug for blkdev_issue_discard Shaohua Li
@ 2012-03-12  3:04 ` Shaohua Li
  2012-03-13 16:00   ` Vivek Goyal
  2012-03-12  3:18 ` [patch 0/7] Add TRIM support for raid linear/0/1/10 Roberto Spadim
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-12  3:04 UTC (permalink / raw)
  To: linux-kernel, linux-raid; +Cc: neilb, axboe, Shaohua Li

[-- Attachment #1: blk-discard-merge-maxsector-check.patch --]
[-- Type: text/plain, Size: 2225 bytes --]

max_discard_sectors is not equal to max_sectors/max_hw_sectors. Without this
change, discard requests might not be merged.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 block/blk-merge.c      |    9 +++++++--
 include/linux/blkdev.h |    5 +++++
 2 files changed, 12 insertions(+), 2 deletions(-)

Index: linux/block/blk-merge.c
===================================================================
--- linux.orig/block/blk-merge.c	2012-03-09 14:05:35.562062857 +0800
+++ linux/block/blk-merge.c	2012-03-09 14:07:55.432062246 +0800
@@ -228,13 +228,16 @@ no_merge:
 int ll_back_merge_fn(struct request_queue *q, struct request *req,
 		     struct bio *bio)
 {
-	unsigned short max_sectors;
+	unsigned int max_sectors;
 
 	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
 		max_sectors = queue_max_hw_sectors(q);
 	else
 		max_sectors = queue_max_sectors(q);
 
+	if (unlikely(req->cmd_flags & REQ_DISCARD))
+		max_sectors = queue_max_discard_sectors(q);
+
 	if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
 		if (req == q->last_merge)
@@ -252,13 +255,15 @@ int ll_back_merge_fn(struct request_queu
 int ll_front_merge_fn(struct request_queue *q, struct request *req,
 		      struct bio *bio)
 {
-	unsigned short max_sectors;
+	unsigned int max_sectors;
 
 	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
 		max_sectors = queue_max_hw_sectors(q);
 	else
 		max_sectors = queue_max_sectors(q);
 
+	if (unlikely(req->cmd_flags & REQ_DISCARD))
+		max_sectors = queue_max_discard_sectors(q);
 
 	if (blk_rq_sectors(req) + bio_sectors(bio) > max_sectors) {
 		req->cmd_flags |= REQ_NOMERGE;
Index: linux/include/linux/blkdev.h
===================================================================
--- linux.orig/include/linux/blkdev.h	2012-03-09 14:05:35.562062857 +0800
+++ linux/include/linux/blkdev.h	2012-03-09 14:07:55.432062246 +0800
@@ -1006,6 +1006,11 @@ static inline unsigned int queue_max_hw_
 	return q->limits.max_hw_sectors;
 }
 
+static inline unsigned int queue_max_discard_sectors(struct request_queue *q)
+{
+	return q->limits.max_discard_sectors;
+}
+
 static inline unsigned short queue_max_segments(struct request_queue *q)
 {
 	return q->limits.max_segments;


^ permalink raw reply	[flat|nested] 33+ messages in thread
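
The limit selection in patch 7/7 can be modeled in userspace as follows. This is a sketch under stated assumptions: `struct toy_limits`, `merge_limit()`, `can_merge()`, and `narrow()` are hypothetical names, and the remark about truncation is one plausible reading of why the local variable was widened, not something the patch description states.

```c
#include <assert.h>

/* Hypothetical stand-in for the three queue limits involved. */
struct toy_limits {
	unsigned int max_sectors;
	unsigned int max_hw_sectors;
	unsigned int max_discard_sectors;
};

/* Pick the limit the way the patched merge functions do. */
static unsigned int merge_limit(const struct toy_limits *l,
				int is_block_pc, int is_discard)
{
	unsigned int max_sectors;

	max_sectors = is_block_pc ? l->max_hw_sectors : l->max_sectors;
	if (is_discard)
		max_sectors = l->max_discard_sectors; /* overrides both */
	return max_sectors;
}

/* A bio may be merged into a request if the sum stays within the limit. */
static int can_merge(const struct toy_limits *l, int is_discard,
		     unsigned int req_sectors, unsigned int bio_sectors)
{
	return req_sectors + bio_sectors <= merge_limit(l, 0, is_discard);
}

/* What storing a limit in an unsigned short would do to a large value. */
static unsigned short narrow(unsigned int v)
{
	return (unsigned short)v;
}
```

With max_sectors of 1024 and max_discard_sectors of 8192, merging two 1024-sector discards is refused under the ordinary limit but allowed under the discard limit. On a platform where short is narrower than int, `narrow(70000)` no longer equals 70000, which suggests why an `unsigned int` local is the safer choice for discard limits.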

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (6 preceding siblings ...)
  2012-03-12  3:04 ` [patch 7/7] blk: use correct sectors limitation for discard request Shaohua Li
@ 2012-03-12  3:18 ` Roberto Spadim
  2012-03-12 18:22 ` Holger Kiehl
  2012-03-14  2:24 ` NeilBrown
  9 siblings, 0 replies; 33+ messages in thread
From: Roberto Spadim @ 2012-03-12  3:18 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

nice!

On 12 March 2012 at 00:04, Shaohua Li <shli@fusionio.com> wrote:
> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
>
> Thanks,
> Shaohua
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (7 preceding siblings ...)
  2012-03-12  3:18 ` [patch 0/7] Add TRIM support for raid linear/0/1/10 Roberto Spadim
@ 2012-03-12 18:22 ` Holger Kiehl
       [not found]   ` <4F5EFEB6.4060402@kernel.org>
       [not found]   ` <4F5EA8E9.5010502@fusionio.com>
  2012-03-14  2:24 ` NeilBrown
  9 siblings, 2 replies; 33+ messages in thread
From: Holger Kiehl @ 2012-03-12 18:22 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

Hello,

On Mon, 12 Mar 2012, Shaohua Li wrote:

> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
>
First, thanks for this patch!

I have applied those patches against 3.3.0-rc7 and during boot the kernel
reports a lot of the following:

    Mar 12 18:56:00 c3po kernel: [    7.611045] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18861064 512
    Mar 12 18:56:00 c3po kernel: [    7.611047] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18862088 512
    Mar 12 18:56:00 c3po kernel: [    7.611049] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18863112 512
    Mar 12 18:56:00 c3po kernel: [    7.611052] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18864136 512
    Mar 12 18:56:00 c3po kernel: [    7.611054] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18865160 512
    Mar 12 18:56:00 c3po kernel: [    7.611056] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18866184 512

The raid looks as follows:

   cat /proc/mdstat
   Personalities : [raid0] [raid1]
   md2 : active raid0 sdc3[2] sdb3[1] sda3[0]
         50328576 blocks super 1.1 512k chunks

   md0 : active raid1 sdc1[1] sdb1[2]
         245748 blocks super 1.0 [2/2] [UU]

   md3 : active raid0 sdc5[2] sda5[0] sdb5[1]
         9434112 blocks super 1.1 512k chunks

   md1 : active raid0 sdc2[2] sda2[0] sdb2[1]
         22017024 blocks super 1.1 512k chunks

   unused devices: <none>

/dev/md3 is the swap partition.

I also get these reports from /dev/md2 which is my /home partition:

    Mar 12 19:06:42 c3po kernel: [  658.419035] md/raid0:md2: make_request bug: can't convert block across chunks or bigger than 512k 152088 512
    Mar 12 19:06:42 c3po kernel: [  658.419042] md/raid0:md2: make_request bug: can't convert block across chunks or bigger than 512k 153112 512
    Mar 12 19:06:42 c3po kernel: [  658.451489] md/raid0:md2: make_request bug: can't convert block across chunks or bigger than 512k 3206944 512
    Mar 12 19:06:42 c3po kernel: [  658.451494] md/raid0:md2: make_request bug: can't convert block across chunks or bigger than 512k 3207968 512
    Mar 12 19:06:42 c3po kernel: [  658.451499] md/raid0:md2: make_request bug: can't convert block across chunks or bigger than 512k 3208992 164

I then did a 'make clean' in my kernel tree, which is on /dev/md1. After a
sync, which took a very long time, the following errors appeared:

    Mar 12 19:12:34 c3po kernel: [ 1010.609388] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9986936 512
    Mar 12 19:12:34 c3po kernel: [ 1010.609393] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9987960 512
    Mar 12 19:12:34 c3po kernel: [ 1010.609396] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9988984 512
    Mar 12 19:12:34 c3po kernel: [ 1010.670542] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 11535480 512
    Mar 12 19:12:34 c3po kernel: [ 1010.787087] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 8998632 160
    Mar 12 19:12:34 c3po kernel: [ 1010.799357] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9037136 476
    Mar 12 19:12:34 c3po kernel: [ 1010.807195] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 8999792 512
    Mar 12 19:12:34 c3po kernel: [ 1010.807201] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9000816 108
    Mar 12 19:12:34 c3po kernel: [ 1010.899348] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 8996824 28
    Mar 12 19:12:34 c3po kernel: [ 1010.905625] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9977808 512
    Mar 12 19:12:34 c3po kernel: [ 1010.905631] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9978832 512
    Mar 12 19:12:34 c3po kernel: [ 1010.905636] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9979856 512
    Mar 12 19:12:34 c3po kernel: [ 1010.905641] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9980880 44
    Mar 12 19:12:34 c3po kernel: [ 1011.145590] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 8940504 72
       .
       .
       .
    Mar 12 19:12:37 c3po kernel: [ 1013.781899] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9036704 208
    Mar 12 19:12:37 c3po kernel: [ 1013.781910] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 9040824 76
    Mar 12 19:12:37 c3po kernel: [ 1013.901750] md/raid0:md1: make_request bug: can't convert block across chunks or bigger than 512k 8841152 64
    Mar 12 19:12:37 c3po kernel: [ 1014.015810] request botched: dev sda: type=1, flags=9164081
    Mar 12 19:12:37 c3po kernel: [ 1014.015814]   sector 2615297, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.015818]   bio ffff880193fd4140, biotail ffff880193fd45c0, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.017423] request botched: dev sdc: type=1, flags=9164081
    Mar 12 19:12:37 c3po kernel: [ 1014.017426]   sector 2614273, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.017430]   bio ffff880193fd4080, biotail ffff880193fd4500, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.019569] request botched: dev sdb: type=1, flags=9164081
    Mar 12 19:12:37 c3po kernel: [ 1014.019572]   sector 2614273, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.019576]   bio ffff88016e1fb2c0, biotail ffff880193fd4680, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.025916] request botched: dev sda: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.025920]   sector 2615298, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.025923]   bio ffff880193fd4380, biotail ffff880193fd45c0, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.027496] request botched: dev sdc: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.027499]   sector 2614274, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.027503]   bio ffff880193fd42c0, biotail ffff880193fd4500, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.034183] request botched: dev sdb: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.034187]   sector 2614274, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.034190]   bio ffff880193fd4200, biotail ffff880193fd4680, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.037478] request botched: dev sdc: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.037482]   sector 2614275, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.037485]   bio ffff880193fd4500, biotail ffff880193fd4500, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.039616] request botched: dev sdb: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.039620]   sector 2614275, nr/cnr 0/1024
    Mar 12 19:12:37 c3po kernel: [ 1014.039623]   bio ffff880193fd4440, biotail ffff880193fd4680, buffer           (null), len 0
    Mar 12 19:12:37 c3po kernel: [ 1014.040477] request botched: dev sda: type=1, flags=916c081
    Mar 12 19:12:37 c3po kernel: [ 1014.040481]   sector 2615299, nr/cnr 0/1024
       .
       .
       .

The list goes on much longer. Here are the mount options:

    /dev/md1 / ext4 rw,noatime,user_xattr,commit=600,barrier=1,journal_async_commit,stripe=384,data=ordered,discard 0 0
    /dev/md0 /boot ext4 rw,noatime,user_xattr,commit=2400,barrier=1,journal_async_commit,data=ordered,discard 0 0
    /dev/md2 /home ext4 rw,noatime,user_xattr,acl,commit=600,barrier=1,journal_async_commit,stripe=384,data=ordered,discard 0 0

The disks in use are the following:

    Mar 12 09:03:57 c3po kernel: [    1.206716] ata2.00: ATA-8: OCZ-VERTEX2, 1.35, max UDMA/133
    Mar 12 09:03:57 c3po kernel: [    1.208374] ata2.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    Mar 12 09:03:57 c3po kernel: [    1.209939] ata1.00: ATA-8: OCZ-VERTEX2, 1.35, max UDMA/133
    Mar 12 09:03:57 c3po kernel: [    1.211507] ata1.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA
    Mar 12 09:03:57 c3po kernel: [    1.216427] ata3.00: ATA-8: OCZ-VERTEX2, 1.35, max UDMA/133
    Mar 12 09:03:57 c3po kernel: [    1.218064] ata3.00: 234441648 sectors, multi 16: LBA48 NCQ (depth 31/32), AA


Any idea what is wrong? Please let me know if I can do any further tests
or supply more information.

Regards,
Holger

^ permalink raw reply	[flat|nested] 33+ messages in thread
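
The raid0 messages in the report above can be checked by hand. This sketch assumes the last two numbers in each "make_request bug" line are the start sector and the size in KiB, and that the chunk size (512k, so 1024 sectors) is a power of two; `fits_in_chunk()` and `sectors_left_in_chunk()` are hypothetical helpers written for illustration, not the kernel's own check.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Returns 1 if [sector, sector + nr_sectors) stays inside one chunk.
 * chunk_sectors must be a power of two.
 */
static int fits_in_chunk(uint64_t sector, unsigned int nr_sectors,
			 unsigned int chunk_sectors)
{
	uint64_t offset_in_chunk = sector & (chunk_sectors - 1);

	return offset_in_chunk + nr_sectors <= chunk_sectors;
}

/* Sectors remaining from `sector` to the end of its chunk. */
static unsigned int sectors_left_in_chunk(uint64_t sector,
					  unsigned int chunk_sectors)
{
	return chunk_sectors - (unsigned int)(sector & (chunk_sectors - 1));
}
```

For the first md3 line, sector 18861064 lies 8 sectors into its chunk, so only 1016 sectors remain, and a 512 KiB (1024-sector) request crosses into the next chunk. Under the same assumptions, the md2 request at sector 3208992 with 164 KiB (328 sectors) starts 800 sectors into its chunk and also crosses the boundary.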

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
       [not found]   ` <4F5EFEB6.4060402@kernel.org>
@ 2012-03-13 12:22     ` Holger Kiehl
  2012-03-13 14:15       ` Shaohua Li
  0 siblings, 1 reply; 33+ messages in thread
From: Holger Kiehl @ 2012-03-13 12:22 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Holger Kiehl, linux-kernel, linux-raid, neilb, axboe

On Tue, 13 Mar 2012, Shaohua Li wrote:

> On 3/13/12 2:22 AM, Holger Kiehl wrote:
> > Hello,
> >
> > On Mon, 12 Mar 2012, Shaohua Li wrote:
> >
> >> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM
> >> support for raid 4/5/6 later. The implementation is pretty
> >> straightforward and self-explained.
> >>
> > First, thanks for this patch!
> >
> > I have applied those patches against 3.3.0-rc7 and during boot the kernel
> > reports a lot of the following:
> >
> > Mar 12 18:56:00 c3po kernel: [ 7.611045] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18861064 512
> > Mar 12 18:56:00 c3po kernel: [ 7.611047] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18862088 512
> > Mar 12 18:56:00 c3po kernel: [ 7.611049] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18863112 512
> > Mar 12 18:56:00 c3po kernel: [ 7.611052] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18864136 512
> > Mar 12 18:56:00 c3po kernel: [ 7.611054] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18865160 512
> > Mar 12 18:56:00 c3po kernel: [ 7.611056] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18866184 512
> It looks like our SMTP server does something stupid. Sorry if you get two
> copies of the mail.
> 
> Thanks for testing. It looks like I fixed a sanity check in bio.c, but there
> is a similar check in raid0/10 which I forgot to fix. The patch below should
> fix it. Please try it.
> 
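
For readers following the raid0 "make_request bug" lines quoted above: the
check that fires appears to test whether an I/O stays inside a single chunk.
Below is a minimal user-space sketch of such a test. It assumes a power-of-two
chunk of 1024 sectors (512 KiB) and assumes that the two trailing numbers in
the log are the start sector and the size in KiB. io_fits_in_chunk() is a
hypothetical name used for illustration, not the kernel function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not kernel code): returns true when an I/O of
 * nr_sectors starting at 'sector' lies entirely inside one chunk.
 * Assumption: chunk_sects is a power of two, so the offset into the
 * chunk can be taken with a bit mask.
 */
static bool io_fits_in_chunk(uint64_t sector, uint32_t nr_sectors,
                             uint32_t chunk_sects)
{
    /* Guard the power-of-two assumption stated above. */
    assert(chunk_sects != 0 && (chunk_sects & (chunk_sects - 1)) == 0);

    uint64_t offset = sector & (uint64_t)(chunk_sects - 1);

    return offset + nr_sectors <= chunk_sects;
}
```

For the first logged request (sector 18861064, 512 KiB = 1024 sectors) the
offset into the chunk is 8 sectors, so 8 + 1024 exceeds the 1024-sector chunk
and the test fails. That would be consistent with a discard bio reaching raid0
without having been split at the chunk boundary.
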
Now I get the following messages during boot (and booting takes a very long time):

    Mar 13 10:23:25 c3po kernel: [  251.355041]   bio ffff88019e0abc70, biotail ffff88019e0ddc00, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.355052] request botched: dev sdb: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.355054]   sector 52929353, nr/cnr 0/8
    Mar 13 10:23:25 c3po kernel: [  251.355055]   bio ffff88019e0aba70, biotail ffff88019e0dda00, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.355068] request botched: dev sdc: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.355069]   sector 52929346, nr/cnr 0/1016
    Mar 13 10:23:25 c3po kernel: [  251.355071]   bio ffff88019e0aae00, biotail ffff88019e0db380, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.373583] request botched: dev sda: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.373585]   sector 52929354, nr/cnr 0/1016
    Mar 13 10:23:25 c3po kernel: [  251.373587]   bio ffff88019e0aba00, biotail ffff88019e0ddc00, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.373597] request botched: dev sdb: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.373599]   sector 52929354, nr/cnr 0/1016
    Mar 13 10:23:25 c3po kernel: [  251.373600]   bio ffff88019e0ab800, biotail ffff88019e0dda00, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.373612] request botched: dev sdc: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.373614]   sector 52929347, nr/cnr 0/8
    Mar 13 10:23:25 c3po kernel: [  251.373616]   bio ffff88019e0aaa70, biotail ffff88019e0db380, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.392135] request botched: dev sda: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.392137]   sector 52929355, nr/cnr 0/8
    Mar 13 10:23:25 c3po kernel: [  251.392139]   bio ffff88019e0ab670, biotail ffff88019e0ddc00, buffer           (null), len 0
    Mar 13 10:23:25 c3po kernel: [  251.392150] request botched: dev sdb: type=1, flags=916c081
    Mar 13 10:23:25 c3po kernel: [  251.392152]   sector 52929355, nr/cnr 0/8
    Mar 13 10:23:25 c3po kernel: [  251.392153]   bio ffff88019e0ab470, biotail ffff88019e0dda00, buffer           (null), len 0

After boot the system runs fine, but as soon as I do something (a make clean
of the kernel tree followed by a sync) I get the same messages as above and
the sync takes a long time:

    Mar 13 10:44:59 c3po kernel: [ 1550.740528] request botched: dev sda: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.740533]   sector 12580617, nr/cnr 0/776
    Mar 13 10:44:59 c3po kernel: [ 1550.740537]   bio ffff8801a362b670, biotail ffff8801a30f3d80, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.747141] request botched: dev sdb: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.747144]   sector 12579841, nr/cnr 0/248
    Mar 13 10:44:59 c3po kernel: [ 1550.747148]   bio ffff88019e0dd200, biotail ffff88019e0dd200, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.749429] request botched: dev sdc: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.749432]   sector 12579841, nr/cnr 0/248
    Mar 13 10:44:59 c3po kernel: [ 1550.749436]   bio ffff8801a362b600, biotail ffff8801a362b600, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.755332] request botched: dev sda: type=1, flags=916c081
    Mar 13 10:44:59 c3po kernel: [ 1550.755335]   sector 12580618, nr/cnr 0/248
    Mar 13 10:44:59 c3po kernel: [ 1550.755339]   bio ffff8801a30f3d80, biotail ffff8801a30f3d80, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.806832] request botched: dev sdb: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.806836]   sector 12266497, nr/cnr 0/1000
    Mar 13 10:44:59 c3po kernel: [ 1550.806840]   bio ffff8801a4dca800, biotail ffff88019e0c9000, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.811972] request botched: dev sda: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.811976]   sector 12266497, nr/cnr 0/1000
    Mar 13 10:44:59 c3po kernel: [ 1550.811979]   bio ffff88019e0dd200, biotail ffff8801a3794000, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.814081] request botched: dev sdc: type=1, flags=9164081
    Mar 13 10:44:59 c3po kernel: [ 1550.814084]   sector 12265497, nr/cnr 0/24
    Mar 13 10:44:59 c3po kernel: [ 1550.814087]   bio ffff8801a4dca870, biotail ffff8801a37f4080, buffer           (null), len 0
    Mar 13 10:44:59 c3po kernel: [ 1550.819150] request botched: dev sdc: type=1, flags=916c081
    Mar 13 10:44:59 c3po kernel: [ 1550.819153]   sector 12265498, nr/cnr 0/1000

Please give me any hints on what I can try next.

Regards,
Holger


* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-13 12:22     ` Holger Kiehl
@ 2012-03-13 14:15       ` Shaohua Li
  2012-03-13 14:58         ` Roberto Spadim
  2012-03-13 15:44         ` Holger Kiehl
  0 siblings, 2 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-13 14:15 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

2012/3/13 Holger Kiehl <Holger.Kiehl@dwd.de>:
> Please give me any hints what I can try next.
Thanks for testing. This is very weird; the req->__data_len is wrong.
Is this a clean build?
I have not managed to reproduce it; I will check again tomorrow.
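
To illustrate the symptom: the "request botched" dump seems to be emitted when
a request's total remaining byte count is smaller than the size of its current
segment, which is what "nr/cnr 0/1024" suggests (0 sectors in total, 1024 in
the current segment). The following is a simplified user-space model of that
condition, not kernel code. The struct, its fields and the function names are
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SECTOR_SIZE 512u

/* Illustrative stand-in for the two lengths carried by a block request. */
struct rq_model {
    unsigned int data_len;  /* total bytes remaining in the request */
    unsigned int cur_bytes; /* bytes remaining in the current segment */
};

/* A request is inconsistent when the total is less than the current part. */
static bool rq_is_botched(const struct rq_model *rq)
{
    assert(rq != NULL);
    return rq->data_len < rq->cur_bytes;
}

/*
 * Mirrors the kind of fixup assumed here: raise the total to the size of
 * the current segment. Returns true if a fixup was applied.
 */
static bool rq_fixup(struct rq_model *rq)
{
    if (!rq_is_botched(rq))
        return false;
    rq->data_len = rq->cur_bytes;
    return true;
}
```

A request modelled as { 0, 1024 * SECTOR_SIZE } corresponds to the "0/1024"
lines in the log and is flagged by rq_is_botched(). A discard request whose
total length was lost somewhere on the way down would look like this.
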

Thanks,
Shaohua


* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-13 14:15       ` Shaohua Li
@ 2012-03-13 14:58         ` Roberto Spadim
  2012-03-13 15:44         ` Holger Kiehl
  1 sibling, 0 replies; 33+ messages in thread
From: Roberto Spadim @ 2012-03-13 14:58 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Holger Kiehl, linux-kernel, linux-raid, neilb, axboe

Could you send information about your devices? Maybe the devices have
different block sizes? Are you running a virtual machine or a real one?

On 13 March 2012 11:15, Shaohua Li <shli@kernel.org> wrote:
> Thanks for testing. This is very weird; the req->__data_len is wrong.
> Is this a clean build?
> I have not managed to reproduce it; I will check again tomorrow.
>
> Thanks,
> Shaohua



-- 
Roberto Spadim
Spadim Technology / SPAEmpresarial


* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-13 14:15       ` Shaohua Li
  2012-03-13 14:58         ` Roberto Spadim
@ 2012-03-13 15:44         ` Holger Kiehl
  2012-03-14  1:30           ` Shaohua Li
  1 sibling, 1 reply; 33+ messages in thread
From: Holger Kiehl @ 2012-03-13 15:44 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

On Tue, 13 Mar 2012, Shaohua Li wrote:

> Thanks for testing. This is very weird; the req->__data_len is wrong.
> Is this a clean build?
>
I just downloaded linux-3.3-rc7.tar.bz2 from kernel.org and applied
your patches again. The result is the same.

Am I the only one experiencing these problems?

This is on a Fedora 16 system and it is NOT a virtual machine. It only has
3 SSDs and 6 GiB of RAM. Below I have added some more information about the
system in case it helps.

Regards,
Holger


lscpu
=====
    Architecture:          x86_64
    CPU op-mode(s):        32-bit, 64-bit
    Byte Order:            Little Endian
    CPU(s):                4
    On-line CPU(s) list:   0-3
    Thread(s) per core:    1
    Core(s) per socket:    4
    Socket(s):             1
    NUMA node(s):          1
    Vendor ID:             GenuineIntel
    CPU family:            6
    Model:                 15
    Stepping:              11
    CPU MHz:               1596.000
    BogoMIPS:              5333.50
    Virtualization:        VT-x
    L1d cache:             32K
    L1i cache:             32K
    L2 cache:              4096K
    NUMA node0 CPU(s):     0-3


lspci
=====
    00:00.0 Host bridge: Intel Corporation 82975X Memory Controller Hub
    00:01.0 PCI bridge: Intel Corporation 82975X PCI Express Root Port
    00:1b.0 Audio device: Intel Corporation N10/ICH 7 Family High Definition Audio Controller (rev 01)
    00:1c.0 PCI bridge: Intel Corporation N10/ICH 7 Family PCI Express Port 1 (rev 01)
    00:1c.5 PCI bridge: Intel Corporation 82801GR/GH/GHM (ICH7 Family) PCI Express Port 6 (rev 01)
    00:1d.0 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #1 (rev 01)
    00:1d.1 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #2 (rev 01)
    00:1d.2 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #3 (rev 01)
    00:1d.3 USB Controller: Intel Corporation N10/ICH 7 Family USB UHCI Controller #4 (rev 01)
    00:1d.7 USB Controller: Intel Corporation N10/ICH 7 Family USB2 EHCI Controller (rev 01)
    00:1e.0 PCI bridge: Intel Corporation 82801 PCI Bridge (rev e1)
    00:1f.0 ISA bridge: Intel Corporation 82801GH (ICH7DH) LPC Interface Bridge (rev 01)
    00:1f.1 IDE interface: Intel Corporation 82801G (ICH7 Family) IDE Controller (rev 01)
    00:1f.2 SATA controller: Intel Corporation N10/ICH7 Family SATA AHCI Controller (rev 01)
    00:1f.3 SMBus: Intel Corporation N10/ICH 7 Family SMBus Controller (rev 01)
    01:00.0 VGA compatible controller: ATI Technologies Inc RV730XT [Radeon HD 4670]
    01:00.1 Audio device: ATI Technologies Inc RV710/730
    02:00.0 PCI bridge: PLX Technology, Inc. PEX 8114 PCI Express-to-PCI/PCI-X Bridge (rev bc)
    03:04.0 SCSI storage controller: Adaptec ASC-29320ALP U320 (rev 10)
    04:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
    05:04.0 FireWire (IEEE 1394): Texas Instruments TSB43AB23 IEEE-1394a-2000 Controller (PHY/Link


hdparm -I /dev/sda
==================
    /dev/sda:

    ATA device, with non-removable media
            Model Number:       OCZ-VERTEX2
            Serial Number:      OCZ-5036916D0T14323M
            Firmware Revision:  1.35
            Transport:          Serial
    Standards:
            Used: unknown (minor revision code 0x0028)
            Supported: 8 7 6 5
            Likely used: 8
    Configuration:
            Logical         max     current
            cylinders       16383   16383
            heads           16      16
            sectors/track   63      63
            --
            CHS current addressable sectors:   16514064
            LBA    user addressable sectors:  234441648
            LBA48  user addressable sectors:  234441648
            Logical  Sector size:                   512 bytes
            Physical Sector size:                   512 bytes
            Logical Sector-0 offset:                  0 bytes
            device size with M = 1024*1024:      114473 MBytes
            device size with M = 1000*1000:      120034 MBytes (120 GB)
            cache/buffer size  = unknown
            Nominal Media Rotation Rate: Solid State Device
    Capabilities:
            LBA, IORDY(can be disabled)
            Queue depth: 32
            Standby timer values: spec'd by Standard, no device specific minimum
            R/W multiple sector transfer: Max = 16  Current = 16
            DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
                 Cycle time: min=120ns recommended=120ns
            PIO: pio0 pio1 pio2 pio3 pio4
                 Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
            Enabled Supported:
               *    SMART feature set
                    Security Mode feature set
               *    Power Management feature set
               *    Write cache
               *    Look-ahead
                    Host Protected Area feature set
               *    WRITE_BUFFER command
               *    READ_BUFFER command
               *    NOP cmd
               *    DOWNLOAD_MICROCODE
                    SET_MAX security extension
               *    48-bit Address feature set
               *    Mandatory FLUSH_CACHE
               *    FLUSH_CACHE_EXT
               *    SMART error logging
               *    SMART self-test
               *    General Purpose Logging feature set
               *    WRITE_{DMA|MULTIPLE}_FUA_EXT
               *    64-bit World wide name
               *    IDLE_IMMEDIATE with UNLOAD
               *    WRITE_UNCORRECTABLE_EXT command
               *    Segmented DOWNLOAD_MICROCODE
               *    Gen1 signaling speed (1.5Gb/s)
               *    Gen2 signaling speed (3.0Gb/s)
               *    Native Command Queueing (NCQ)
               *    Host-initiated interface power management
               *    Phy event counters
               *    DMA Setup Auto-Activate optimization
                    Device-initiated interface power management
               *    Software settings preservation
               *    SMART Command Transport (SCT) feature set
               *    SCT LBA Segment Access (AC2)
               *    SCT Error Recovery Control (AC3)
               *    SCT Features Control (AC4)
               *    SCT Data Tables (AC5)
               *    Data Set Management TRIM supported (limit 1 block)
               *    Deterministic read data after TRIM
    Security:
                    supported
            not     enabled
            not     locked
                    frozen
            not     expired: security count
                    supported: enhanced erase
            400min for SECURITY ERASE UNIT. 400min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 5e83a97fe7238b7d
            NAA             : 5
            IEEE OUI        : e83a97
            Unique ID       : fe7238b7d
    Checksum: correct
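
A note on the "limit 1 block" remark in the TRIM line above: it presumably
means one 512-byte block of range entries per DATA SET MANAGEMENT command.
Assuming the usual 8-byte entry layout (48-bit LBA plus 16-bit sector count),
the largest range a single command can cover works out as sketched below.
trim_max_sectors() is a made-up name for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum {
    TRIM_BLOCK_BYTES       = 512,   /* one payload block of range entries */
    TRIM_ENTRY_BYTES       = 8,     /* assumed: 48-bit LBA + 16-bit count */
    TRIM_MAX_SECTORS_ENTRY = 65535  /* largest value of a 16-bit count */
};

/* Upper bound on sectors trimmed by one command with the given payload. */
static uint64_t trim_max_sectors(unsigned int payload_blocks)
{
    assert(payload_blocks > 0);

    uint64_t entries = (uint64_t)payload_blocks *
                       (TRIM_BLOCK_BYTES / TRIM_ENTRY_BYTES);

    return entries * TRIM_MAX_SECTORS_ENTRY;
}
```

With a one-block limit this gives 64 entries of 65535 sectors each, i.e.
4194240 sectors or just under 2 GiB per command, so a large discard has to be
issued as several commands.
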


hdparm -I /dev/sdb
==================
    /dev/sdb:

    ATA device, with non-removable media
            Model Number:       OCZ-VERTEX2
            Serial Number:      OCZ-JT00YSO1J56PNBG5
            Firmware Revision:  1.35
            Transport:          Serial
    Standards:
            Used: unknown (minor revision code 0x0028)
            Supported: 8 7 6 5
            Likely used: 8
    Configuration:
            Logical         max     current
            cylinders       16383   16383
            heads           16      16
            sectors/track   63      63
            --
            CHS current addressable sectors:   16514064
            LBA    user addressable sectors:  234441648
            LBA48  user addressable sectors:  234441648
            Logical  Sector size:                   512 bytes
            Physical Sector size:                   512 bytes
            Logical Sector-0 offset:                  0 bytes
            device size with M = 1024*1024:      114473 MBytes
            device size with M = 1000*1000:      120034 MBytes (120 GB)
            cache/buffer size  = unknown
            Nominal Media Rotation Rate: Solid State Device
    Capabilities:
            LBA, IORDY(can be disabled)
            Queue depth: 32
            Standby timer values: spec'd by Standard, no device specific minimum
            R/W multiple sector transfer: Max = 16  Current = 16
            DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
                 Cycle time: min=120ns recommended=120ns
            PIO: pio0 pio1 pio2 pio3 pio4
                 Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
            Enabled Supported:
               *    SMART feature set
                    Security Mode feature set
               *    Power Management feature set
               *    Write cache
               *    Look-ahead
                    Host Protected Area feature set
               *    WRITE_BUFFER command
               *    READ_BUFFER command
               *    NOP cmd
               *    DOWNLOAD_MICROCODE
                    SET_MAX security extension
               *    48-bit Address feature set
               *    Mandatory FLUSH_CACHE
               *    FLUSH_CACHE_EXT
               *    SMART error logging
               *    SMART self-test
               *    General Purpose Logging feature set
               *    WRITE_{DMA|MULTIPLE}_FUA_EXT
               *    64-bit World wide name
               *    IDLE_IMMEDIATE with UNLOAD
               *    WRITE_UNCORRECTABLE_EXT command
               *    Segmented DOWNLOAD_MICROCODE
               *    Gen1 signaling speed (1.5Gb/s)
               *    Gen2 signaling speed (3.0Gb/s)
               *    Native Command Queueing (NCQ)
               *    Host-initiated interface power management
               *    Phy event counters
               *    DMA Setup Auto-Activate optimization
                    Device-initiated interface power management
               *    Software settings preservation
               *    SMART Command Transport (SCT) feature set
               *    SCT LBA Segment Access (AC2)
               *    SCT Error Recovery Control (AC3)
               *    SCT Features Control (AC4)
               *    SCT Data Tables (AC5)
               *    Data Set Management TRIM supported (limit 1 block)
               *    Deterministic read data after TRIM
    Security:
                    supported
            not     enabled
            not     locked
                    frozen
            not     expired: security count
                    supported: enhanced erase
            400min for SECURITY ERASE UNIT. 400min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 5e83a97f98e23f26
            NAA             : 5
            IEEE OUI        : e83a97
            Unique ID       : f98e23f26
    Checksum: correct


hdparm -I /dev/sdc
==================
    /dev/sdc:

    ATA device, with non-removable media
            Model Number:       OCZ-VERTEX2
            Serial Number:      OCZ-1YJ3PX48285QOE4P
            Firmware Revision:  1.35
            Transport:          Serial
    Standards:
            Used: unknown (minor revision code 0x0028)
            Supported: 8 7 6 5
            Likely used: 8
    Configuration:
            Logical         max     current
            cylinders       16383   16383
            heads           16      16
            sectors/track   63      63
            --
            CHS current addressable sectors:   16514064
            LBA    user addressable sectors:  234441648
            LBA48  user addressable sectors:  234441648
            Logical  Sector size:                   512 bytes
            Physical Sector size:                   512 bytes
            Logical Sector-0 offset:                  0 bytes
            device size with M = 1024*1024:      114473 MBytes
            device size with M = 1000*1000:      120034 MBytes (120 GB)
            cache/buffer size  = unknown
            Nominal Media Rotation Rate: Solid State Device
    Capabilities:
            LBA, IORDY(can be disabled)
            Queue depth: 32
            Standby timer values: spec'd by Standard, no device specific minimum
            R/W multiple sector transfer: Max = 16  Current = 16
            DMA: mdma0 mdma1 mdma2 udma0 udma1 udma2 udma3 udma4 udma5 *udma6
                 Cycle time: min=120ns recommended=120ns
            PIO: pio0 pio1 pio2 pio3 pio4
                 Cycle time: no flow control=120ns  IORDY flow control=120ns
    Commands/features:
            Enabled Supported:
               *    SMART feature set
                    Security Mode feature set
               *    Power Management feature set
               *    Write cache
               *    Look-ahead
                    Host Protected Area feature set
               *    WRITE_BUFFER command
               *    READ_BUFFER command
               *    NOP cmd
               *    DOWNLOAD_MICROCODE
                    SET_MAX security extension
               *    48-bit Address feature set
               *    Mandatory FLUSH_CACHE
               *    FLUSH_CACHE_EXT
               *    SMART error logging
               *    SMART self-test
               *    General Purpose Logging feature set
               *    WRITE_{DMA|MULTIPLE}_FUA_EXT
               *    64-bit World wide name
               *    IDLE_IMMEDIATE with UNLOAD
               *    WRITE_UNCORRECTABLE_EXT command
               *    Segmented DOWNLOAD_MICROCODE
               *    Gen1 signaling speed (1.5Gb/s)
               *    Gen2 signaling speed (3.0Gb/s)
               *    Native Command Queueing (NCQ)
               *    Host-initiated interface power management
               *    Phy event counters
               *    DMA Setup Auto-Activate optimization
                    Device-initiated interface power management
               *    Software settings preservation
               *    SMART Command Transport (SCT) feature set
               *    SCT LBA Segment Access (AC2)
               *    SCT Error Recovery Control (AC3)
               *    SCT Features Control (AC4)
               *    SCT Data Tables (AC5)
               *    Data Set Management TRIM supported (limit 1 block)
               *    Deterministic read data after TRIM
    Security:
                    supported
            not     enabled
            not     locked
                    frozen
            not     expired: security count
                    supported: enhanced erase
            400min for SECURITY ERASE UNIT. 400min for ENHANCED SECURITY ERASE UNIT.
    Logical Unit WWN Device Identifier: 5e83a97f87718aff
            NAA             : 5
            IEEE OUI        : e83a97
            Unique ID       : f87718aff
    Checksum: correct

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 6/7] blk: add plug for blkdev_issue_discard
  2012-03-12  3:04 ` [patch 6/7] blk: add plug for blkdev_issue_discard Shaohua Li
@ 2012-03-13 15:51   ` Vivek Goyal
  2012-03-13 17:04     ` Martin K. Petersen
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2012-03-13 15:51 UTC (permalink / raw)
  To: Shaohua Li
  Cc: linux-kernel, linux-raid, neilb, axboe, Shaohua Li, Martin K. Petersen

On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
> In raid 0 case, a big discard request is divided into several small requests
> in chunk_size unit. Such requests can be merged in low layer if we have
> correct plug added. This should improve the performance a little bit.

Martin posted a patch that removes support for merging discard requests,
but this seems like a reasonable use case for allowing discard requests to
be merged. CCing Martin.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 7/7] blk: use correct sectors limitation for discard request
  2012-03-12  3:04 ` [patch 7/7] blk: use correct sectors limitation for discard request Shaohua Li
@ 2012-03-13 16:00   ` Vivek Goyal
  0 siblings, 0 replies; 33+ messages in thread
From: Vivek Goyal @ 2012-03-13 16:00 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe, Shaohua Li

On Mon, Mar 12, 2012 at 11:04:19AM +0800, Shaohua Li wrote:
> max_discard_sectors doesn't equal to max_sectors/max_hw_sectors. Without this,
> discard request merge might be ignored.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
>  block/blk-merge.c      |    9 +++++++--
>  include/linux/blkdev.h |    5 +++++
>  2 files changed, 12 insertions(+), 2 deletions(-)
> 
> Index: linux/block/blk-merge.c
> ===================================================================
> --- linux.orig/block/blk-merge.c	2012-03-09 14:05:35.562062857 +0800
> +++ linux/block/blk-merge.c	2012-03-09 14:07:55.432062246 +0800
> @@ -228,13 +228,16 @@ no_merge:
>  int ll_back_merge_fn(struct request_queue *q, struct request *req,
>  		     struct bio *bio)
>  {
> -	unsigned short max_sectors;
> +	unsigned int max_sectors;
>  
>  	if (unlikely(req->cmd_type == REQ_TYPE_BLOCK_PC))
>  		max_sectors = queue_max_hw_sectors(q);
>  	else
>  		max_sectors = queue_max_sectors(q);
>  
> +	if (unlikely(req->cmd_flags & REQ_DISCARD))
> +		max_sectors = queue_max_discard_sectors(q);
> +

Maybe make the above check an "else if" branch of the preceding conditional
instead of starting another "if" block.
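
As a rough illustration of the suggestion above, here is a standalone sketch
of the limit selection written as a single if / else if / else chain. It uses
stand-in types and flag values rather than the real struct request_queue and
struct request; the names sketch_limits, sketch_merge_limit and the SKETCH_*
flags are made up for this example. It assumes a request is never both
BLOCK_PC and a discard, in which case the ordering of the branches does not
matter.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
enum {
	SKETCH_BLOCK_PC = 1u << 0,
	SKETCH_DISCARD  = 1u << 1
};

/* Stand-in for the three queue limits the quoted hunk chooses between. */
struct sketch_limits {
	unsigned int max_sectors;
	unsigned int max_hw_sectors;
	unsigned int max_discard_sectors;
};

/*
 * Pick the sector limit used for a back-merge decision.
 * The return type is unsigned int, matching the patch's change away
 * from unsigned short for max_sectors.
 */
static unsigned int sketch_merge_limit(const struct sketch_limits *lim,
				       unsigned int flags)
{
	assert(lim != NULL);

	if (flags & SKETCH_BLOCK_PC)
		return lim->max_hw_sectors;
	else if (flags & SKETCH_DISCARD)
		return lim->max_discard_sectors;
	else
		return lim->max_sectors;
}
```

With limits of 1024, 2048 and 8192 sectors, a plain request gets 1024, a
BLOCK_PC request 2048 and a discard 8192.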

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 6/7] blk: add plug for blkdev_issue_discard
  2012-03-13 15:51   ` Vivek Goyal
@ 2012-03-13 17:04     ` Martin K. Petersen
  2012-03-13 17:14       ` Vivek Goyal
  0 siblings, 1 reply; 33+ messages in thread
From: Martin K. Petersen @ 2012-03-13 17:04 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Shaohua Li, linux-kernel, linux-raid, neilb, axboe, Martin K. Petersen

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

Vivek> On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
>> In raid 0 case, a big discard request is divided into several small
>> requests in chunk_size unit. Such requests can be merged in low layer
>> if we have correct plug added. This should improve the performance a
>> little bit.

Vivek> Martin posted a patch to remove the support for allowing merging
Vivek> of discard requests. But this seems to be a reasonable use case
Vivek> for allowing merging discard requests. CCing Martin.

Merging discard requests is hard given how we need to prepare the
command payload at the bottom of the stack. The current upstream merge
code pretends to be working but it actually doesn't. That's why I want
it dead and buried.

I have some changes pending (that I need for the REQ_COPY support) that
will make merging of non-rw requests easier to deal with. But that's a
kernel release cycle away...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 6/7] blk: add plug for blkdev_issue_discard
  2012-03-13 17:04     ` Martin K. Petersen
@ 2012-03-13 17:14       ` Vivek Goyal
  2012-03-13 17:19         ` Martin K. Petersen
  0 siblings, 1 reply; 33+ messages in thread
From: Vivek Goyal @ 2012-03-13 17:14 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Shaohua Li, linux-kernel, linux-raid, neilb, axboe

On Tue, Mar 13, 2012 at 01:04:58PM -0400, Martin K. Petersen wrote:
> >>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:
> 
> Vivek> On Mon, Mar 12, 2012 at 11:04:18AM +0800, Shaohua Li wrote:
> >> In raid 0 case, a big discard request is divided into several small
> >> requests in chunk_size unit. Such requests can be merged in low layer
> >> if we have correct plug added. This should improve the performance a
> >> little bit.
> 
> Vivek> Martin posted a patch to remove the support for allowing merging
> Vivek> of discard requests. But this seems to be a reasonable use case
> Vivek> for allowing merging discard requests. CCing Martin.
> 
> Merging discard requests is hard given how we need to prepare the
> command payload at the bottom of the stack. The current upstream merge
> code pretends to be working but it actually doesn't. That's why I want
> it dead and buried.
> 
> I have some changes pending (that I need for the REQ_COPY support) that
> will make merging of non-rw requests easier to deal with. But that's a
> kernel release cycle away...

So first we will get rid of discard request merging, and then re-enable it
after one release cycle, once REQ_COPY support is in?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 6/7] blk: add plug for blkdev_issue_discard
  2012-03-13 17:14       ` Vivek Goyal
@ 2012-03-13 17:19         ` Martin K. Petersen
  0 siblings, 0 replies; 33+ messages in thread
From: Martin K. Petersen @ 2012-03-13 17:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin K. Petersen, Shaohua Li, linux-kernel, linux-raid, neilb, axboe

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

Vivek> So first we will get rid of merging discard request

We'll get rid of code that doesn't do anything other than confuse.


Vivek> and then enable after one release cycle once REQ_COPY support is
Vivek> in?

The new code will be entirely different. And one cycle is optimistic
given the current state of affairs in T10.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-13 15:44         ` Holger Kiehl
@ 2012-03-14  1:30           ` Shaohua Li
  2012-03-14 10:25             ` Holger Kiehl
  0 siblings, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-14  1:30 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

[-- Attachment #1: Type: text/plain, Size: 564 bytes --]

2012/3/13 Holger Kiehl <Holger.Kiehl@dwd.de>:
> On Tue, 13 Mar 2012, Shaohua Li wrote:
>
>> Thanks for testing. This is very weird; the req->__data_len is wrong.
>> Is this a clean build?
>>
> I just downloaded linux-3.3-rc7.tar.bz2 from kernel.org and applied
> your patches again. The result is the same.
>
> Am I the only one experiencing these problems?
Martin Petersen pointed out that the SCSI layer doesn't support discard
merging, which might be why you see the error message (my drive isn't a SCSI
device). Can you please try the attached patch?

Thanks,
Shaohua

[-- Attachment #2: blk-discard-nomerge.patch --]
[-- Type: application/octet-stream, Size: 904 bytes --]

Temporarily disallow merging of discard requests; as Martin Petersen pointed
out, the SCSI layer isn't ready for discard merging.

Signed-off-by: Shaohua Li <shli@fusionio.com>
---
 include/linux/blkdev.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux/include/linux/blkdev.h
===================================================================
--- linux.orig/include/linux/blkdev.h	2012-03-14 09:20:06.787261188 +0800
+++ linux/include/linux/blkdev.h	2012-03-14 09:20:47.797261248 +0800
@@ -575,7 +575,7 @@ static inline void blk_clear_queue_full(
  * it already be started by driver.
  */
 #define RQ_NOMERGE_FLAGS	\
-	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
+	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | REQ_DISCARD)
 #define rq_mergeable(rq)	\
 	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
 	 (((rq)->cmd_flags & REQ_DISCARD) || \
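
For illustration only, the effect of adding REQ_DISCARD to the no-merge mask
can be sketched in standalone C. The SK_* flag values and the helper name
below are stand-ins invented for this example, and the sketch models only the
mask test, not the rest of the rq_mergeable() condition (which is cut off in
the hunk above).

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in flag bits; the values are illustrative, not the kernel's. */
enum {
	SK_REQ_NOMERGE     = 1u << 0,
	SK_REQ_STARTED     = 1u << 1,
	SK_REQ_SOFTBARRIER = 1u << 2,
	SK_REQ_FLUSH       = 1u << 3,
	SK_REQ_FUA         = 1u << 4,
	SK_REQ_DISCARD     = 1u << 5
};

/* Mirrors the shape of the patched mask: discard is now a no-merge flag. */
#define SK_NOMERGE_FLAGS \
	(SK_REQ_NOMERGE | SK_REQ_STARTED | SK_REQ_SOFTBARRIER | \
	 SK_REQ_FLUSH | SK_REQ_FUA | SK_REQ_DISCARD)

/* Compile-time check that discard really is part of the mask. */
static_assert((SK_NOMERGE_FLAGS & SK_REQ_DISCARD) != 0,
	      "discard must be in the no-merge mask");

/* Only the mask part of the check: any no-merge flag blocks merging. */
static bool sketch_rq_mergeable(unsigned int cmd_flags)
{
	return !(cmd_flags & SK_NOMERGE_FLAGS);
}
```

One consequence visible in this sketch: once the discard bit is in the mask,
a request carrying it can never pass the first test, so any later "or it is
a discard" clause can no longer make such a request mergeable.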

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
                   ` (8 preceding siblings ...)
  2012-03-12 18:22 ` Holger Kiehl
@ 2012-03-14  2:24 ` NeilBrown
  2012-03-14  2:47   ` Shaohua Li
  9 siblings, 1 reply; 33+ messages in thread
From: NeilBrown @ 2012-03-14  2:24 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, axboe

[-- Attachment #1: Type: text/plain, Size: 966 bytes --]

On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli@fusionio.com> wrote:

> The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> raid 4/5/6 later. The implementation is pretty straightforward and
> self-explained.
> 
> Thanks,
> Shaohua

Thanks.
They look mostly OK.

In raid0.c, I think you'll need to change

		/* Sanity check -- queue functions should prevent this happening */
		if (bio->bi_vcnt != 1 ||
		    bio->bi_idx != 0)
			goto bad_map;

to also allow for 'bi_vcnt == 0' like you did in bio_split.
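
A minimal standalone sketch of the relaxed check, assuming a stand-in struct
that carries only the two fields involved (sketch_bio and
sketch_bio_splittable are names made up for this example, not kernel
symbols):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the two struct bio fields the sanity check looks at. */
struct sketch_bio {
	unsigned short bi_vcnt;	/* number of data segments attached */
	unsigned short bi_idx;	/* current index into the segment array */
};

/*
 * Returns true when the bio passes the sanity check: either one data
 * segment (a normal single-page bio) or none (a discard, which carries
 * no data), and in both cases an index of zero.
 */
static bool sketch_bio_splittable(const struct sketch_bio *bio)
{
	assert(bio != NULL);
	return (bio->bi_vcnt == 1 || bio->bi_vcnt == 0) && bio->bi_idx == 0;
}
```

In the real code the "goto bad_map" path would be taken when this predicate
is false.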

Also I wonder about handling failure in RAID1.
I think the code will currently treat it like a write error, and
maybe record a bad block (then fail the device if writing the bad-block
record fails). Is that what we want?

And of course resync/recovery will mess up the discarded sector information,
so this isn't a complete solution for RAID1.  But  it is a reasonable start.

Thanks,
NeilBrown

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
       [not found]   ` <4F5EA8E9.5010502@fusionio.com>
@ 2012-03-14  2:25     ` NeilBrown
  0 siblings, 0 replies; 33+ messages in thread
From: NeilBrown @ 2012-03-14  2:25 UTC (permalink / raw)
  To: Shaohua Li; +Cc: Holger Kiehl, linux-kernel, linux-raid, axboe

[-- Attachment #1: Type: text/plain, Size: 2615 bytes --]

On Tue, 13 Mar 2012 09:54:49 +0800 Shaohua Li <shli@fusionio.com> wrote:

> On 3/13/12 2:22 AM, Holger Kiehl wrote:
> >  Hello,
> >
> >  On Mon, 12 Mar 2012, Shaohua Li wrote:
> >
> > > The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> > > raid 4/5/6 later. The implementation is pretty straightforward and
> > > self-explained.
> > >
> >  First, thanks for this patch!
> >
> >  I have applied those patches against 3.3.0-rc7 and during boot the kernel
> >  reports a lot of the following:
> >
> >  Mar 12 18:56:00 c3po kernel: [ 7.611045] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18861064 512
> >  Mar 12 18:56:00 c3po kernel: [ 7.611047] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18862088 512
> >  Mar 12 18:56:00 c3po kernel: [ 7.611049] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18863112 512
> >  Mar 12 18:56:00 c3po kernel: [ 7.611052] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18864136 512
> >  Mar 12 18:56:00 c3po kernel: [ 7.611054] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18865160 512
> >  Mar 12 18:56:00 c3po kernel: [ 7.611056] md/raid0:md3: make_request bug: can't convert block across chunks or bigger than 512k 18866184 512
> Thanks for testing. It looks like I fixed a sanity check in bio.c, but there
> are similar checks in raid0/10 which I forgot to fix. The patch below should
> fix it. Please try it.
> 
> 
> Subject: md: fix sanity check
> 
> A discard bio has no data attached, and such a bio can be split; don't
> treat this as illegal.
> 
> Signed-off-by: Shaohua Li <shli@fusionio.com>
> ---
>   drivers/md/raid0.c  |    2 +-
>   drivers/md/raid10.c |    2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> Index: linux/drivers/md/raid0.c
> ===================================================================
> --- linux.orig/drivers/md/raid0.c    2012-03-13 09:37:58.759976786 +0800
> +++ linux/drivers/md/raid0.c    2012-03-13 09:42:35.389975584 +0800
> @@ -496,7 +496,7 @@ static void raid0_make_request(struct md
>           sector_t sector = bio->bi_sector;
>           struct bio_pair *bp;
>           /* Sanity check -- queue functions should prevent this happening */
> -        if (bio->bi_vcnt != 1 ||
> +        if ((bio->bi_vcnt != 1 && bio->bi_vcnt !=0) ||

oh .. and there is the fix I mentioned that you would need :-)
Thanks,
NeilBrown



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14  2:24 ` NeilBrown
@ 2012-03-14  2:47   ` Shaohua Li
  2012-03-17 18:14     ` Mark Lord
  0 siblings, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-14  2:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-kernel, linux-raid, axboe

On 3/14/12 10:24 AM, NeilBrown wrote:
>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli@fusionio.com> wrote:
>
> > The patches add TRIM support for raid linear/0/1/10. I'll add TRIM support for
> > raid 4/5/6 later. The implementation is pretty straightforward and
> > self-explained.
> >
> > Thanks,
> > Shaohua
>
>  Thanks.
>  They look mostly OK.
>
>  In raid0.c, I think you'll need to change
>
>  /* Sanity check -- queue functions should prevent this happening */
>  if (bio->bi_vcnt != 1 ||
>  bio->bi_idx != 0)
>  goto bad_map;
>
>  to also allow for 'bi_vcnt == 0' like you did in bio_split.
>
>  Also I wonder about handling failure in RAID1.
>  I think the code will currently treat it like a write error, and
>  maybe record a bad block (then fail the device if writing the bad-block
>  record fails). Is that what we want?
Mainly to simplify the code. I also thought a normal discard should not fail.
If it does fail, something is wrong, so marking it as a bad block may not be
a bad idea.

>  And of course resync/recovery will mess up the discarded sector information,
>  so this isn't a complete solution for RAID1. But it is a reasonable start.
Yes, this is a mess. At first glance it looks impossible without an on-disk
format change.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14  1:30           ` Shaohua Li
@ 2012-03-14 10:25             ` Holger Kiehl
  2012-03-14 11:14               ` Shaohua Li
  0 siblings, 1 reply; 33+ messages in thread
From: Holger Kiehl @ 2012-03-14 10:25 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

On Wed, 14 Mar 2012, Shaohua Li wrote:

> 2012/3/13 Holger Kiehl <Holger.Kiehl@dwd.de>:
>> On Tue, 13 Mar 2012, Shaohua Li wrote:
>>
>>> Thanks for testing. This is very weird; the req->__data_len is wrong.
>>> Is this a clean build?
>>>
>> I just downloaded linux-3.3-rc7.tar.bz2 from kernel.org and applied
>> your patches again. The result is the same.
>>
>> Am I the only one experiencing these problems?
> Martin Petersen pointed out scsi layer doesn't support discard merge, which
> might be the reason you see the error message (my drive isn't a scsi device).
> can you please try attached patch?
>
Thanks! After this patch the error messages are gone.

However, the system now takes a very long time to boot:

    [   16.527389] EXT4-fs (md0): mounted filesystem with ordered data mode. Opts: discard,commit=2400,journal_async_commit
    [   24.218410] EXT4-fs (md2): mounted filesystem with ordered data mode. Opts: discard,commit=600,journal_async_commit
    [   69.823138] udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
    [   70.824197] udevd[474]: timeout: killing '/sbin/blkid -o udev -p /dev/md0' [866]
    [   70.826823] udevd[474]: '/sbin/blkid -o udev -p /dev/md0' [866] terminated by signal 9 (Killed)
    [   70.829290] udevd[474]: timeout 'udisks-part-id /dev/md0'
    [   74.942625] udevd[475]: timeout '/sbin/blkid -o udev -p /dev/md2'
    [   75.947158] udevd[475]: timeout: killing '/sbin/blkid -o udev -p /dev/md2' [874]
    [   75.949734] udevd[475]: '/sbin/blkid -o udev -p /dev/md2' [874] terminated by signal 9 (Killed)
    [   75.951945] udevd[475]: timeout 'udisks-part-id /dev/md2'
    [   79.023741] rmmod[886]: ERROR: Module scsi_wait_scan does not exist in /proc/modules
    [   96.005919] systemd[1]: dev-md3.swap activation timed out. Stopping.
    [  127.292002] Adding 9434108k swap on /dev/md3.  Priority:0 extents:1 across:9434108k

During another boot I saw this additional message:

    [   11.988732] scsi_verify_blk_ioctl: 930 callbacks suppressed

The strange thing is that when I run the command
'/sbin/blkid -o udev -p /dev/md0' by hand after boot, it works without any
problems (it takes 0m0.001s). I also tried booting without the discard
option, but the result is the same.

Otherwise I observed no problems, even when running some more extensive
benchmark tests. Only the performance drop is dramatic: in my own benchmark,
where thousands of files get copied around via FTP, throughput drops from
4000 files per second to 520 when the filesystem is mounted with the discard
option. Also, during the benchmark any access to the disk can take a very
long time if discard is enabled.

Next, I will try this patch on a system without SATA/SCSI disks.

Regards,
Holger

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14 10:25             ` Holger Kiehl
@ 2012-03-14 11:14               ` Shaohua Li
  2012-03-14 11:32                 ` Shaohua Li
  2012-03-14 21:13                 ` Holger Kiehl
  0 siblings, 2 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-14 11:14 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

2012/3/14 Holger Kiehl <Holger.Kiehl@dwd.de>:
> On Wed, 14 Mar 2012, Shaohua Li wrote:
>
>> 2012/3/13 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>
>>> On Tue, 13 Mar 2012, Shaohua Li wrote:
>>>
>>>> Thanks for testing. This is very weird; the req->__data_len is wrong.
>>>> Is this a clean build?
>>>>
>>> I just downloaded linux-3.3-rc7.tar.bz2 from kernel.org and applied
>>> your patches again. The result is the same.
>>>
>>> Am I the only one experiencing these problems?
>>
>> Martin Petersen pointed out scsi layer doesn't support discard merge,
>> which
>> might be the reason you see the error message (my drive isn't a scsi
>> device).
>> can you please try attached patch?
>>
> Thanks! After this patch the error messages are away.
>
> However, when the system boots it takes a very long time to boot:
>
>   [   16.527389] EXT4-fs (md0): mounted filesystem with ordered data mode.
> Opts: discard,commit=2400,journal_async_commit
>   [   24.218410] EXT4-fs (md2): mounted filesystem with ordered data mode.
> Opts: discard,commit=600,journal_async_commit
>   [   69.823138] udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
>   [   70.824197] udevd[474]: timeout: killing '/sbin/blkid -o udev -p
> /dev/md0' [866]
>   [   70.826823] udevd[474]: '/sbin/blkid -o udev -p /dev/md0' [866]
> terminated by signal 9 (Killed)
>   [   70.829290] udevd[474]: timeout 'udisks-part-id /dev/md0'
>   [   74.942625] udevd[475]: timeout '/sbin/blkid -o udev -p /dev/md2'
>   [   75.947158] udevd[475]: timeout: killing '/sbin/blkid -o udev -p
> /dev/md2' [874]
>   [   75.949734] udevd[475]: '/sbin/blkid -o udev -p /dev/md2' [874]
> terminated by signal 9 (Killed)
>   [   75.951945] udevd[475]: timeout 'udisks-part-id /dev/md2'
>   [   79.023741] rmmod[886]: ERROR: Module scsi_wait_scan does not exist in
> /proc/modules
>   [   96.005919] systemd[1]: dev-md3.swap activation timed out. Stopping.
>   [  127.292002] Adding 9434108k swap on /dev/md3.  Priority:0 extents:1
> across:9434108k
>
> During another boot I saw this additional message:
>
>   [   11.988732] scsi_verify_blk_ioctl: 930 callbacks suppressed
>
> The strange thing is that after boot I tried to enter the command
> '/sbin/blkid -o udev -p /dev/md0' and it works without any problems
> (time for this is 0m0.001s). So I also tried booting without discard
> option, but the result is the same.
>
> Otherwise I observed no problems. Even running some more extensive
> benchmark testing showed no problems. Only the performance drop is
> dramatic. In my own benchmark where thousand of files get copied around
> via FTP the performance drops from 4000 files per second to 520 when
> mounting the filesystem with discard option. Also during the benchmark
> any access to the disk can take very very long if discard is enabled.
>
> Next, I will try this patch on a system without SATA/SCSI disks.
Maybe the disk handles small discard requests slowly.
Please drop the patch "blk: add plug for blkdev_issue_discard" and try again.
Since we can't merge, the plug just introduces latency.
If that doesn't help, please capture a blktrace while you run the benchmark
and send it to me.

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14 11:14               ` Shaohua Li
@ 2012-03-14 11:32                 ` Shaohua Li
  2012-03-14 21:01                   ` Holger Kiehl
  2012-03-14 21:13                 ` Holger Kiehl
  1 sibling, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-14 11:32 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

2012/3/14 Shaohua Li <shli@kernel.org>:
> 2012/3/14 Holger Kiehl <Holger.Kiehl@dwd.de>:
>> On Wed, 14 Mar 2012, Shaohua Li wrote:
>>
>>> 2012/3/13 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>>
>>>> On Tue, 13 Mar 2012, Shaohua Li wrote:
>>>>
>>>>> Thanks for testing. This is very weird; the req->__data_len is wrong.
>>>>> Is this a clean build?
>>>>>
>>>> I just downloaded linux-3.3-rc7.tar.bz2 from kernel.org and applied
>>>> your patches again. The result is the same.
>>>>
>>>> Am I the only one experiencing these problems?
>>>
>>> Martin Petersen pointed out scsi layer doesn't support discard merge,
>>> which
>>> might be the reason you see the error message (my drive isn't a scsi
>>> device).
>>> can you please try attached patch?
>>>
>> Thanks! After this patch the error messages are away.
>>
>> However, when the system boots it takes a very long time to boot:
>>
>>   [   16.527389] EXT4-fs (md0): mounted filesystem with ordered data mode.
>> Opts: discard,commit=2400,journal_async_commit
>>   [   24.218410] EXT4-fs (md2): mounted filesystem with ordered data mode.
>> Opts: discard,commit=600,journal_async_commit
>>   [   69.823138] udevd[474]: timeout '/sbin/blkid -o udev -p /dev/md0'
>>   [   70.824197] udevd[474]: timeout: killing '/sbin/blkid -o udev -p
>> /dev/md0' [866]
>>   [   70.826823] udevd[474]: '/sbin/blkid -o udev -p /dev/md0' [866]
>> terminated by signal 9 (Killed)
>>   [   70.829290] udevd[474]: timeout 'udisks-part-id /dev/md0'
>>   [   74.942625] udevd[475]: timeout '/sbin/blkid -o udev -p /dev/md2'
>>   [   75.947158] udevd[475]: timeout: killing '/sbin/blkid -o udev -p
>> /dev/md2' [874]
>>   [   75.949734] udevd[475]: '/sbin/blkid -o udev -p /dev/md2' [874]
>> terminated by signal 9 (Killed)
>>   [   75.951945] udevd[475]: timeout 'udisks-part-id /dev/md2'
>>   [   79.023741] rmmod[886]: ERROR: Module scsi_wait_scan does not exist in
>> /proc/modules
>>   [   96.005919] systemd[1]: dev-md3.swap activation timed out. Stopping.
>>   [  127.292002] Adding 9434108k swap on /dev/md3.  Priority:0 extents:1
>> across:9434108k
>>
>> During another boot I saw this additional message:
>>
>>   [   11.988732] scsi_verify_blk_ioctl: 930 callbacks suppressed
>>
>> The strange thing is that after boot I tried to enter the command
>> '/sbin/blkid -o udev -p /dev/md0' and it works without any problems
>> (time for this is 0m0.001s). So I also tried booting without discard
>> option, but the result is the same.
>>
>> Otherwise I observed no problems. Even running some more extensive
>> benchmark testing showed no problems. Only the performance drop is
>> dramatic. In my own benchmark where thousand of files get copied around
>> via FTP the performance drops from 4000 files per second to 520 when
>> mounting the filesystem with discard option. Also during the benchmark
>> any access to the disk can take very very long if discard is enabled.
>>
>> Next, I will try this patch on a system without SATA/SCSI disks.
> Maybe the discard runs slow with small size request in the disk.
> please drop patch "blk: add plug for blkdev_issue_discard" and try again. Since
> we can't do merge, the plug just introduces latency.
> if it doesn't help, please capture a blktrace when you do the benchmark and
> send it to me.
Could you build a filesystem directly on one of these SSDs (that is, without
using raid) and check the performance of discard? I'd like to make sure the
problem isn't in the SSD itself.
The raid chunk size is 512k, which is already big even without merging.
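
To illustrate the chunk-size splitting discussed in this thread, here is a
standalone sketch (not the md code) that counts how many per-chunk pieces a
discard range breaks into when it may not cross a chunk boundary. The
function name is hypothetical, and a 512k chunk is taken to be 1024 sectors
of 512 bytes.

```c
#include <assert.h>

/*
 * Count the pieces produced when the sector range
 * [start, start + nr_sectors) is cut at every chunk boundary.
 * chunk_sectors is the chunk size in sectors and must be non-zero.
 */
static unsigned long long
sketch_count_chunk_pieces(unsigned long long start,
			  unsigned long long nr_sectors,
			  unsigned int chunk_sectors)
{
	unsigned long long pieces = 0;

	assert(chunk_sectors > 0);

	while (nr_sectors > 0) {
		/* Sectors left before the next chunk boundary. */
		unsigned long long room = chunk_sectors - (start % chunk_sectors);
		unsigned long long len = nr_sectors < room ? nr_sectors : room;

		start += len;
		nr_sectors -= len;
		pieces++;
	}
	return pieces;
}
```

Under these assumptions a chunk-aligned 2 MiB discard (4096 sectors) becomes
four pieces, while a single 1024-sector discard that starts 512 sectors into
a chunk becomes two.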

Thanks,
Shaohua

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14 11:32                 ` Shaohua Li
@ 2012-03-14 21:01                   ` Holger Kiehl
  0 siblings, 0 replies; 33+ messages in thread
From: Holger Kiehl @ 2012-03-14 21:01 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

On Wed, 14 Mar 2012, Shaohua Li wrote:

> Is it possible if you can directly build a fs in such ssd (that is not
> using raid), and check
> the performance of discard? I'd like to make sure it's not the problem
> of the ssd itself.
>
Yes, I just ran the test on a partition without raid on the same SSD.
There is hardly any difference between using discard and not using it, so
the SSD itself is not the problem.

It must be hanging somewhere in the kernel. I once had strace running on an
rm command that was deleting thousands of small files, and even the output
of the strace command was stuck:

   18:31:19.776588 unlinkat(7, "2K-3-4-16--207", 0) = 0
   18:31:19.776699 unlinkat(7, "2K-3-4-16--44", 0) = 0
   18:31:19.776808 unlinkat(7, "2K-3-4-16--1476", 0) = 0
   18:31:19.776919 unlinkat(7, "2K-3-4-16--1277", 0

And when it continued after being frozen for several minutes, the time
reported by strace was still 18:31:19.xxxx, as if it had not hung.

These hangs occur when deleting lots of small files. They affect not only
the process doing the deleting; other processes are affected as well.
When I tried to open a file on a tmpfs filesystem in another shell, it too
was stuck for several minutes.

Regards,
Holger

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14 11:14               ` Shaohua Li
  2012-03-14 11:32                 ` Shaohua Li
@ 2012-03-14 21:13                 ` Holger Kiehl
  2012-03-15  2:39                   ` Shaohua Li
  1 sibling, 1 reply; 33+ messages in thread
From: Holger Kiehl @ 2012-03-14 21:13 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

On Wed, 14 Mar 2012, Shaohua Li wrote:

> Maybe the discard runs slow with small size request in the disk.
> please drop patch "blk: add plug for blkdev_issue_discard" and try again. Since
> we can't do merge, the plug just introduces latency.
>
Tried again without the patch applied, but there is only a very small
performance increase (520->600 against 4000 fps without discard).

The benchmark creates lots of small files (2 KiB) and deletes them again.
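
As a rough illustration only (this is not the benchmark used for the numbers
above), a create-and-delete test of this shape could be sketched in Python as
follows. The function name, the default file count and the reading of "fps"
as files per second are assumptions made for the sketch.

```python
import os
import tempfile
import time


def small_file_benchmark(directory, count=1000, size=2048):
    """Create `count` files of `size` bytes in `directory`, delete them again,
    and return the rate in files per second (hypothetical helper)."""
    payload = b"\0" * size
    paths = [os.path.join(directory, "f%d" % i) for i in range(count)]

    start = time.monotonic()
    for path in paths:
        with open(path, "wb") as fh:
            fh.write(payload)
    for path in paths:
        os.unlink(path)
    # Flush pending work so the deletions (and any discards they trigger)
    # are included in the measurement. os.sync() is only available on Unix.
    if hasattr(os, "sync"):
        os.sync()
    elapsed = time.monotonic() - start

    return count / elapsed if elapsed > 0 else float("inf")


if __name__ == "__main__":
    # Runs in a temporary directory; point `dir=` at the filesystem under test.
    with tempfile.TemporaryDirectory() as workdir:
        print("%.0f files per second" % small_file_benchmark(workdir))
```

Running it once on a filesystem mounted with the discard option and once
without would give two rates to compare.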

> if it doesn't help, please capture a blktrace when you do the benchmark and
> send it to me.
>
Ok, I will do this tomorrow. Need some sleep :-)

Thanks for your work on supporting discard in MD!

Regards,
Holger

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14 21:13                 ` Holger Kiehl
@ 2012-03-15  2:39                   ` Shaohua Li
  2012-03-15  9:08                     ` Holger Kiehl
  0 siblings, 1 reply; 33+ messages in thread
From: Shaohua Li @ 2012-03-15  2:39 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

2012/3/15 Holger Kiehl <Holger.Kiehl@dwd.de>:
> On Wed, 14 Mar 2012, Shaohua Li wrote:
>
>> Maybe the discard runs slow with small size request in the disk.
>> please drop patch "blk: add plug for blkdev_issue_discard" and try again.
>> Since
>> we can't do merge, the plug just introduces latency.
>>
> Tried again without the patch applied, but there is only a very small
> performance increase (520->600 against 4000 fps without discard).
>
> The benchmark creates lots of small files (2 KiB) and deletes them again.
>
>
>> if it doesn't help, please capture a blktrace when you do the benchmark
>> and
>> send it to me.
>>
> Ok, I will do this tomorrow. Need some sleep :-)
>
> Thanks for your work on supporting discard in MD!
I tried your benchmark: create 2000k 2k files, delete them, and then run a
sync. Discard runs pretty fast on both raid 0 and raid 1, so I can't
reproduce the issue. I'm using a fusionio card, though. I'm afraid there is
nothing I can do until I get your blktrace.

Thanks,
Shaohua

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-15  2:39                   ` Shaohua Li
@ 2012-03-15  9:08                     ` Holger Kiehl
  2012-03-16  2:19                       ` Shaohua Li
  0 siblings, 1 reply; 33+ messages in thread
From: Holger Kiehl @ 2012-03-15  9:08 UTC (permalink / raw)
  To: Shaohua Li; +Cc: linux-kernel, linux-raid, neilb, axboe

On Thu, 15 Mar 2012, Shaohua Li wrote:

> 2012/3/15 Holger Kiehl <Holger.Kiehl@dwd.de>:
>> On Wed, 14 Mar 2012, Shaohua Li wrote:
>>
>>> Maybe the discard runs slow with small size request in the disk.
>>> please drop patch "blk: add plug for blkdev_issue_discard" and try again.
>>> Since
>>> we can't do merge, the plug just introduces latency.
>>>
>> Tried again without the patch applied, but there is only a very small
>> performance increase (520->600 against 4000 fps without discard).
>>
>> The benchmark creates lots of small files (2 KiB) and deletes them again.
>>
>>
>>> if it doesn't help, please capture a blktrace when you do the benchmark
>>> and
>>> send it to me.
>>>
>> Ok, I will do this tomorrow. Need some sleep :-)
>>
>> Thanks for your work on supporting discard in MD!
> I tried your benchmark: create 2000k 2k files, delete them, and then run a
> sync. Discard runs pretty fast on both raid 0 and raid 1, so I can't
> reproduce the issue. I'm using a fusionio card, though. I'm afraid there is
> nothing I can do until I get your blktrace.
>
The blktrace is a bit large so I have uploaded it to:

    ftp://ftp.dwd.de/pub/afd/test/trim/trace

This was taken while the benchmark was running. Just a reminder: md2 is
/home, under which the benchmark was running, and it is a raid0 of
sda3, sdb3 and sdc3. md1 is / and is also a raid0, of sda2, sdb2 and
sdc2.

There is also another blktrace, taken while all the files were being
deleted. Note that it covers only part of the deletion (10 min); deleting
everything takes about 30 minutes. You can find it here:

    ftp://ftp.dwd.de/pub/afd/test/trim/trace2

Please tell me if you need more information or what else I can do to
help find the problem.

Thanks,
Holger

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-15  9:08                     ` Holger Kiehl
@ 2012-03-16  2:19                       ` Shaohua Li
  0 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-16  2:19 UTC (permalink / raw)
  To: Holger Kiehl; +Cc: linux-kernel, linux-raid, neilb, axboe

2012/3/15 Holger Kiehl <Holger.Kiehl@dwd.de>:
> On Thu, 15 Mar 2012, Shaohua Li wrote:
>
>> 2012/3/15 Holger Kiehl <Holger.Kiehl@dwd.de>:
>>>
>>> On Wed, 14 Mar 2012, Shaohua Li wrote:
>>>
>>>> Maybe the discard runs slow with small size request in the disk.
>>>> please drop patch "blk: add plug for blkdev_issue_discard" and try
>>>> again.
>>>> Since
>>>> we can't do merge, the plug just introduces latency.
>>>>
>>> Tried again without the patch applied, but there is only a very small
>>> performance increase (520->600 against 4000 fps without discard).
>>>
>>> The benchmark creates lots of small files (2 KiB) and deletes them again.
>>>
>>>
>>>> if it doesn't help, please capture a blktrace when you do the benchmark
>>>> and
>>>> send it to me.
>>>>
>>> Ok, I will do this tomorrow. Need some sleep :-)
>>>
>>> Thanks for your work on supporting discard in MD!
>>
>> I tried your benchmark: create 2000k 2k files, delete them, and then run a
>> sync. Discard runs pretty fast on both raid 0 and raid 1, so I can't
>> reproduce the issue. I'm using a fusionio card, though. I'm afraid there is
>> nothing I can do until I get your blktrace.
>>
> The blktrace is a bit large so I have uploaded it to:
>
>   ftp://ftp.dwd.de/pub/afd/test/trim/trace
>
> This was taken while the benchmark was running. Just a reminder: md2 is
> /home, under which the benchmark was running, and it is a raid0 of
> sda3, sdb3 and sdc3. md1 is / and is also a raid0, of sda2, sdb2 and
> sdc2.
>
> There is also another blktrace, taken while all the files were being
> deleted. Note that it covers only part of the deletion (10 min); deleting
> everything takes about 30 minutes. You can find it here:
>
>   ftp://ftp.dwd.de/pub/afd/test/trim/trace2
>
> Please tell me if you need more information or what else I can do to
> help find the problem.
Looking at the blktrace:
  8,0    1    47871   116.769583185   870  A   D 46042912 + 96 <- (8,3) 30869280
  8,3    1    47872   116.769583560   870  Q   D 46042912 + 96 [jbd2/md2-8]
  8,3    1    47873   116.769584613   870  G   D 46042912 + 96 [jbd2/md2-8]
  8,3    1    47874   116.769585255   870  I   D 46042912 + 96 [jbd2/md2-8]
  8,3    1    47875   116.769585693   870  D   D 46042912 + 96 [jbd2/md2-8]
  8,3    1    47876   116.771985862     0  C   D 46042912 + 1 [0]
  8,0    1    47877   116.799571098   870  A   D 46040696 + 32 <- (8,3) 30867064
  8,3    1    47878   116.799571462   870  Q   D 46040696 + 32 [jbd2/md2-8]
  8,3    1    47879   116.799572459   870  G   D 46040696 + 32 [jbd2/md2-8]
  8,3    1    47880   116.799573176   870  I   D 46040696 + 32 [jbd2/md2-8]
  8,3    1    47881   116.799573637   870  D   D 46040696 + 32 [jbd2/md2-8]
  8,3    1    47882   116.801970911     0  C   D 46040696 + 1 [0]
  8,0    1    47883   116.801980623   870  A   D 46046568 + 88 <- (8,3) 30872936
  8,3    1    47884   116.801980957   870  Q   D 46046568 + 88 [jbd2/md2-8]
  8,3    1    47885   116.801981894   870  G   D 46046568 + 88 [jbd2/md2-8]
  8,3    1    47886   116.801982539   870  I   D 46046568 + 88 [jbd2/md2-8]
  8,3    1    47887   116.801982974   870  D   D 46046568 + 88 [jbd2/md2-8]
  8,3    1    47888   116.811997203     0  C   D 46046568 + 1 [0]
  8,0    1    47889   116.829566908   870  A   D 46040032 + 32 <- (8,3) 30866400
  8,3    1    47890   116.829567261   870  Q   D 46040032 + 32 [jbd2/md2-8]
  8,3    1    47891   116.829569154   870  G   D 46040032 + 32 [jbd2/md2-8]
  8,3    1    47892   116.829569901   870  I   D 46040032 + 32 [jbd2/md2-8]
  8,3    1    47893   116.829570366   870  D   D 46040032 + 32 [jbd2/md2-8]
  8,3    1    47894   116.831972370     0  C   D 46040032 + 1 [0]
  8,0    1    47895   116.846461610   870  A   D 46039728 + 8 <- (8,3) 30866096
  8,3    1    47896   116.846462008   870  Q   D 46039728 + 8 [jbd2/md2-8]
  8,3    1    47897   116.846462911   870  G   D 46039728 + 8 [jbd2/md2-8]
  8,3    1    47898   116.846463530   870  I   D 46039728 + 8 [jbd2/md2-8]
  8,3    1    47899   116.846463984   870  D   D 46039728 + 8 [jbd2/md2-8]
  8,3    1    47900   116.851970109     0  C   D 46039728 + 1 [0]

There are 5 discard requests here, and they take about 2ms, 2ms, 10ms, 2ms
and 5ms (from dispatch to completion). That is definitely not fast. And
since discard runs in jbd, slow discards will impact other file operations,
for example when the journal is full.
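
For reference, the dispatch-to-completion times can be recomputed from the
trace excerpt with a short script. This is only a sketch in Python: the
function name is made up for illustration, and the parsing assumes the
default blkparse column layout seen in the excerpt (device, cpu, sequence,
timestamp, pid, action, RWBS, sector).

```python
from decimal import Decimal

# Dispatch (D) and completion (C) lines copied from the trace excerpt.
TRACE = """\
  8,3    1    47875   116.769585693   870  D   D 46042912 + 96 [jbd2/md2-8]
  8,3    1    47876   116.771985862     0  C   D 46042912 + 1 [0]
  8,3    1    47881   116.799573637   870  D   D 46040696 + 32 [jbd2/md2-8]
  8,3    1    47882   116.801970911     0  C   D 46040696 + 1 [0]
  8,3    1    47887   116.801982974   870  D   D 46046568 + 88 [jbd2/md2-8]
  8,3    1    47888   116.811997203     0  C   D 46046568 + 1 [0]
  8,3    1    47893   116.829570366   870  D   D 46040032 + 32 [jbd2/md2-8]
  8,3    1    47894   116.831972370     0  C   D 46040032 + 1 [0]
  8,3    1    47899   116.846463984   870  D   D 46039728 + 8 [jbd2/md2-8]
  8,3    1    47900   116.851970109     0  C   D 46039728 + 1 [0]
"""


def discard_latencies(text):
    """Return (sector, latency_ms) for each discard that was both
    dispatched and completed in `text` (hypothetical helper)."""
    dispatched = {}  # start sector -> dispatch timestamp in seconds
    latencies = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only lines whose RWBS column marks a discard.
        if len(fields) < 8 or not fields[6].startswith("D"):
            continue
        timestamp = Decimal(fields[3])
        action = fields[5]
        sector = int(fields[7])
        if action == "D":
            dispatched[sector] = timestamp
        elif action == "C" and sector in dispatched:
            latencies.append((sector, (timestamp - dispatched.pop(sector)) * 1000))
    return latencies


if __name__ == "__main__":
    for sector, ms in discard_latencies(TRACE):
        print("sector %d: %.1f ms" % (sector, ms))
```

On this excerpt it reports roughly 2.4, 2.4, 10.0, 2.4 and 5.5 ms, in line
with the figures above.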

So it looks like this isn't the fault of my patch. What is curious is why
discard is fast without md in your test. Maybe the reason is that the files
in a newly formatted filesystem aren't fragmented, so each discard is big
and the number of discard requests is small, and the total discard time is
small too. Your md filesystem might be fragmented, so each discard is small
and the number of requests is big.

Thanks,
Shaohua

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-14  2:47   ` Shaohua Li
@ 2012-03-17 18:14     ` Mark Lord
  2012-03-18  2:03       ` Shaohua Li
  0 siblings, 1 reply; 33+ messages in thread
From: Mark Lord @ 2012-03-17 18:14 UTC (permalink / raw)
  To: Shaohua Li; +Cc: NeilBrown, linux-kernel, linux-raid, axboe

On 12-03-13 10:47 PM, Shaohua Li wrote:
> On 3/14/12 10:24 AM, NeilBrown wrote:
>>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli@fusionio.com> wrote:
..
>>  Also I wonder about handling failure in RAID1.
>>  I think the code will currently treat it like a write error, and
>>  maybe record a bad block (then fail the device is writing the badblock
>>  record fails). Is that what were want?
> Mainly to simplify the code. And I thought a normal discard should not fail.
> If it fails, something is wrong, marked it as badblock maybe not bad.

That sounds like a VERY bad idea.
Failures happen for lots of reasons,
but generic comm errors are not an excuse to suddenly mark sectors as bad.

* Re: [patch 0/7] Add TRIM support for raid linear/0/1/10
  2012-03-17 18:14     ` Mark Lord
@ 2012-03-18  2:03       ` Shaohua Li
  0 siblings, 0 replies; 33+ messages in thread
From: Shaohua Li @ 2012-03-18  2:03 UTC (permalink / raw)
  To: Mark Lord; +Cc: NeilBrown, linux-kernel, linux-raid, axboe

2012/3/18 Mark Lord <kernel@teksavvy.com>:
> On 12-03-13 10:47 PM, Shaohua Li wrote:
>> On 3/14/12 10:24 AM, NeilBrown wrote:
>>>  On Mon, 12 Mar 2012 11:04:12 +0800 Shaohua Li <shli@fusionio.com> wrote:
> ..
>>>  Also I wonder about handling failure in RAID1.
>>>  I think the code will currently treat it like a write error, and
>>>  maybe record a bad block (then fail the device is writing the badblock
>>>  record fails). Is that what were want?
>> Mainly to simplify the code. And I thought a normal discard should not fail.
>> If it fails, something is wrong, marked it as badblock maybe not bad.
>
> That sounds like a VERY bad idea.
> Failures happen for lots of reasons,
> but generic comm errors are not an excuse to suddenly mark sectors as bad.
Not sure I understand, but we treat discard much like a write. Did you
mean that discard request errors are common?

Thread overview: 33+ messages
2012-03-12  3:04 [patch 0/7] Add TRIM support for raid linear/0/1/10 Shaohua Li
2012-03-12  3:04 ` [patch 1/7] block: makes bio_split support bio without data Shaohua Li
2012-03-12  3:04 ` [patch 2/7] md: linear supports TRIM Shaohua Li
2012-03-12  3:04 ` [patch 3/7] md: raid 0 " Shaohua Li
2012-03-12  3:04 ` [patch 4/7] md: raid 1 " Shaohua Li
2012-03-12  3:04 ` [patch 5/7] md: raid 10 " Shaohua Li
2012-03-12  3:04 ` [patch 6/7] blk: add plug for blkdev_issue_discard Shaohua Li
2012-03-13 15:51   ` Vivek Goyal
2012-03-13 17:04     ` Martin K. Petersen
2012-03-13 17:14       ` Vivek Goyal
2012-03-13 17:19         ` Martin K. Petersen
2012-03-12  3:04 ` [patch 7/7] blk: use correct sectors limitation for discard request Shaohua Li
2012-03-13 16:00   ` Vivek Goyal
2012-03-12  3:18 ` [patch 0/7] Add TRIM support for raid linear/0/1/10 Roberto Spadim
2012-03-12 18:22 ` Holger Kiehl
     [not found]   ` <4F5EFEB6.4060402@kernel.org>
2012-03-13 12:22     ` Holger Kiehl
2012-03-13 14:15       ` Shaohua Li
2012-03-13 14:58         ` Roberto Spadim
2012-03-13 15:44         ` Holger Kiehl
2012-03-14  1:30           ` Shaohua Li
2012-03-14 10:25             ` Holger Kiehl
2012-03-14 11:14               ` Shaohua Li
2012-03-14 11:32                 ` Shaohua Li
2012-03-14 21:01                   ` Holger Kiehl
2012-03-14 21:13                 ` Holger Kiehl
2012-03-15  2:39                   ` Shaohua Li
2012-03-15  9:08                     ` Holger Kiehl
2012-03-16  2:19                       ` Shaohua Li
     [not found]   ` <4F5EA8E9.5010502@fusionio.com>
2012-03-14  2:25     ` NeilBrown
2012-03-14  2:24 ` NeilBrown
2012-03-14  2:47   ` Shaohua Li
2012-03-17 18:14     ` Mark Lord
2012-03-18  2:03       ` Shaohua Li
