All of lore.kernel.org
 help / color / mirror / Atom feed
* Write same support
@ 2012-01-31  0:31 Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
                   ` (7 more replies)
  0 siblings, 8 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe


Here's my stab at write same support.

I have mainly tested using blkdev_issue_zeroout which I have modified to
use write same if available. It will fall back to writing zeroes the
old-fashioned way if write same fails. I also wired up an ioctl so I
could trigger zeroout from userland.

I have gone back and forth over whether blkdev_issue_write_same() should
take a struct page or a virtual address. It's not such a big deal for
devices with a 512-byte logical block size. But for devices with 4K it's
more of an issue. Comments welcome.

The discovery heuristics is what stopped this patchkit in its tracks
last year. So I've spent the last few days testing with every exotic
(i.e. ancient) SCSI device I could find. I'm pretty happy with how it
works now but it would be great if more people would put it to the test.

I have a 'writesame' branch available at:

	git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git

for those that want to try it out...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [PATCH 1/5] block: Implement support for WRITE SAME
  2012-01-31  0:31 Write same support Martin K. Petersen
@ 2012-01-31  0:31 ` Martin K. Petersen
  2012-02-07 21:40   ` Vivek Goyal
                     ` (2 more replies)
  2012-01-31  0:31 ` [PATCH 2/5] block: Make blkdev_issue_zeroout use " Martin K. Petersen
                   ` (6 subsequent siblings)
  7 siblings, 3 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe, Martin K. Petersen

From: "Martin K. Petersen" <martin.petersen@oracle.com>

The WRITE SAME command supported on some SCSI devices allows the same
block to be efficiently replicated throughout a block range. Only a
single logical block is transferred from the host and the storage device
writes the same data to all blocks described by the I/O.

This patch implements support for WRITE SAME in the block layer. The
blkdev_issue_write_same() function can be used by filesystems and block
drivers to replicate a buffer across a block range. This can be used to
efficiently initialize software RAID devices, etc.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 Documentation/ABI/testing/sysfs-block |   13 +++++
 block/blk-core.c                      |   10 +++-
 block/blk-lib.c                       |   79 +++++++++++++++++++++++++++++++++
 block/blk-settings.c                  |   16 +++++++
 block/blk-sysfs.c                     |   13 +++++
 block/elevator.c                      |    9 +++-
 include/linux/blk_types.h             |    5 ++-
 include/linux/blkdev.h                |   21 ++++++++-
 8 files changed, 160 insertions(+), 6 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
index c1eb41c..e5f4194 100644
--- a/Documentation/ABI/testing/sysfs-block
+++ b/Documentation/ABI/testing/sysfs-block
@@ -206,3 +206,16 @@ Description:
 		when a discarded area is read the discard_zeroes_data
 		parameter will be set to one. Otherwise it will be 0 and
 		the result of reading a discarded area is undefined.
+
+What:		/sys/block/<disk>/queue/max_write_same_bytes
+Date:		January 2012
+Contact:	Martin K. Petersen <martin.petersen@oracle.com>
+Description:
+		Some devices support a write same operation in which a
+		single data block can be written to a range of several
+		contiguous blocks on storage. This can be used to wipe
+		areas on disk or to initialize drives in a RAID
+		configuration. max_write_same_bytes indicates how many
+		bytes can be written in a single write same command. If
+		max_write_same_bytes is 0, write same is not supported
+		by the device.
diff --git a/block/blk-core.c b/block/blk-core.c
index e6c05a9..373ed52 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1592,6 +1592,11 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
+	if (bio->bi_rw & REQ_WRITE_SAME && !bdev_write_same(bio->bi_bdev)) {
+		err = -EOPNOTSUPP;
+		goto end_io;
+	}
+
 	if (blk_throtl_bio(q, bio))
 		return false;	/* throttled, will be resubmitted later */
 
@@ -1743,7 +1748,7 @@ EXPORT_SYMBOL(submit_bio);
  */
 int blk_rq_check_limits(struct request_queue *q, struct request *rq)
 {
-	if (rq->cmd_flags & REQ_DISCARD)
+	if ((rq->cmd_flags & REQ_DISCARD) || (rq->cmd_flags & REQ_WRITE_SAME))
 		return 0;
 
 	if (blk_rq_sectors(rq) > queue_max_sectors(q) ||
@@ -2217,7 +2222,8 @@ bool blk_update_request(struct request *req, int error, unsigned int nr_bytes)
 	req->buffer = bio_data(req->bio);
 
 	/* update sector only for requests with clear definition of sector */
-	if (req->cmd_type == REQ_TYPE_FS || (req->cmd_flags & REQ_DISCARD))
+	if (req->cmd_type == REQ_TYPE_FS || (req->cmd_flags & REQ_DISCARD) ||
+	    (req->cmd_flags & REQ_WRITE_SAME))
 		req->__sector += total_bytes >> 9;
 
 	/* mixed attributes always follow the first bio */
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 2b461b4..19f4754 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -115,6 +115,85 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 EXPORT_SYMBOL(blkdev_issue_discard);
 
 /**
+ * blkdev_issue_write_same - queue a write same operation
+ * @bdev:	target blockdev
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to write
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ * @buffer:	pointer to buffer containing data to write
+ * @length:	buffer length. Must be equal to bdev logical block size.
+ *
+ * Description:
+ *    Issue a write same request for the sectors in question.
+ */
+int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+			    sector_t nr_sects, gfp_t gfp_mask, void *buffer,
+			    unsigned int length)
+{
+	DECLARE_COMPLETION_ONSTACK(wait);
+	struct request_queue *q = bdev_get_queue(bdev);
+	unsigned int max_write_same_sectors;
+	struct bio_batch bb;
+	struct bio *bio;
+	int ret = 0;
+
+	if (!q)
+		return -ENXIO;
+
+	max_write_same_sectors = q->limits.max_write_same_sectors;
+
+	if (max_write_same_sectors == 0)
+		return -EOPNOTSUPP;
+
+	BUG_ON(length != bdev_logical_block_size(bdev));
+	BUG_ON(offset_in_page(buffer) + length > PAGE_CACHE_SIZE);
+
+	atomic_set(&bb.done, 1);
+	bb.flags = 1 << BIO_UPTODATE;
+	bb.wait = &wait;
+
+	while (nr_sects) {
+		bio = bio_alloc(gfp_mask, 1);
+		if (!bio) {
+			ret = -ENOMEM;
+			break;
+		}
+
+		bio->bi_sector = sector;
+		bio->bi_end_io = bio_batch_end_io;
+		bio->bi_bdev = bdev;
+		bio->bi_private = &bb;
+		bio->bi_vcnt = 1;
+		bio->bi_phys_segments = 1;
+		bio->bi_io_vec->bv_page = virt_to_page(buffer);
+		bio->bi_io_vec->bv_offset = offset_in_page(buffer);
+		bio->bi_io_vec->bv_len = length;
+
+		if (nr_sects > max_write_same_sectors) {
+			bio->bi_size = max_write_same_sectors << 9;
+			nr_sects -= max_write_same_sectors;
+			sector += max_write_same_sectors;
+		} else {
+			bio->bi_size = nr_sects << 9;
+			nr_sects = 0;
+		}
+
+		atomic_inc(&bb.done);
+		submit_bio(REQ_WRITE | REQ_WRITE_SAME, bio);
+	}
+
+	/* Wait for bios in-flight */
+	if (!atomic_dec_and_test(&bb.done))
+		wait_for_completion(&wait);
+
+	if (!test_bit(BIO_UPTODATE, &bb.flags))
+		ret = -ENOTSUPP;
+
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_write_same);
+
+/**
  * blkdev_issue_zeroout - generate number of zero filed write bios
  * @bdev:	blockdev to issue
  * @sector:	start sector
diff --git a/block/blk-settings.c b/block/blk-settings.c
index d3234fc..5680b91 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -113,6 +113,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->seg_boundary_mask = BLK_SEG_BOUNDARY_MASK;
 	lim->max_segment_size = BLK_MAX_SEGMENT_SIZE;
 	lim->max_sectors = lim->max_hw_sectors = BLK_SAFE_MAX_SECTORS;
+	lim->max_write_same_sectors = 0;
 	lim->max_discard_sectors = 0;
 	lim->discard_granularity = 0;
 	lim->discard_alignment = 0;
@@ -143,6 +144,7 @@ void blk_set_stacking_limits(struct queue_limits *lim)
 	lim->discard_zeroes_data = 1;
 	lim->max_segments = USHRT_MAX;
 	lim->max_hw_sectors = UINT_MAX;
+	lim->max_write_same_sectors = UINT_MAX;
 
 	lim->max_sectors = BLK_DEF_MAX_SECTORS;
 }
@@ -287,6 +289,18 @@ void blk_queue_max_discard_sectors(struct request_queue *q,
 EXPORT_SYMBOL(blk_queue_max_discard_sectors);
 
 /**
+ * blk_queue_max_write_same_sectors - set max sectors for a single write same
+ * @q:  the request queue for the device
+ * @max_write_same_sectors: maximum number of sectors to write per command
+ **/
+void blk_queue_max_write_same_sectors(struct request_queue *q,
+				      unsigned int max_write_same_sectors)
+{
+	q->limits.max_write_same_sectors = max_write_same_sectors;
+}
+EXPORT_SYMBOL(blk_queue_max_write_same_sectors);
+
+/**
  * blk_queue_max_segments - set max hw segments for a request for this queue
  * @q:  the request queue for the device
  * @max_segments:  max number of segments
@@ -511,6 +525,8 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 
 	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
 	t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
+	t->max_write_same_sectors = min(t->max_write_same_sectors,
+					b->max_write_same_sectors);
 	t->bounce_pfn = min_not_zero(t->bounce_pfn, b->bounce_pfn);
 
 	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask,
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index cf15001..e0b38d2 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -161,6 +161,13 @@ static ssize_t queue_discard_zeroes_data_show(struct request_queue *q, char *pag
 	return queue_var_show(queue_discard_zeroes_data(q), page);
 }
 
+static ssize_t queue_max_write_same_show(struct request_queue *q, char *page)
+{
+	return sprintf(page, "%llu\n",
+		(unsigned long long)q->limits.max_write_same_sectors << 9);
+}
+
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -357,6 +364,11 @@ static struct queue_sysfs_entry queue_discard_zeroes_data_entry = {
 	.show = queue_discard_zeroes_data_show,
 };
 
+static struct queue_sysfs_entry queue_max_write_same_entry = {
+	.attr = {.name = "max_write_same_bytes", .mode = S_IRUGO },
+	.show = queue_max_write_same_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_show_nonrot,
@@ -404,6 +416,7 @@ static struct attribute *default_attrs[] = {
 	&queue_discard_granularity_entry.attr,
 	&queue_discard_max_entry.attr,
 	&queue_discard_zeroes_data_entry.attr,
+	&queue_max_write_same_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
diff --git a/block/elevator.c b/block/elevator.c
index 91e18f8..d6f7703 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -82,6 +82,12 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
 		return 0;
 
 	/*
+	 * Don't merge write same requests
+	 */
+	if ((bio->bi_rw & REQ_WRITE_SAME) || (rq->bio->bi_rw & REQ_WRITE_SAME))
+		return 0;
+
+	/*
 	 * Don't merge discard requests and secure discard requests
 	 */
 	if ((bio->bi_rw & REQ_SECURE) != (rq->bio->bi_rw & REQ_SECURE))
@@ -639,7 +645,8 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 	if (rq->cmd_flags & REQ_SOFTBARRIER) {
 		/* barriers are scheduling boundary, update end_sector */
 		if (rq->cmd_type == REQ_TYPE_FS ||
-		    (rq->cmd_flags & REQ_DISCARD)) {
+		    (rq->cmd_flags & REQ_DISCARD) ||
+		    (rq->cmd_flags & REQ_WRITE_SAME)) {
 			q->end_sector = rq_end_sector(rq);
 			q->boundary_rq = rq;
 		}
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 4053cbd..b312ac8 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -124,6 +124,7 @@ enum rq_flag_bits {
 	__REQ_PRIO,		/* boost priority in cfq */
 	__REQ_DISCARD,		/* request to discard sectors */
 	__REQ_SECURE,		/* secure discard (used with __REQ_DISCARD) */
+	__REQ_WRITE_SAME,	/* write same block many times */
 
 	__REQ_NOIDLE,		/* don't anticipate more IO after this one */
 	__REQ_FUA,		/* forced unit access */
@@ -162,12 +163,14 @@ enum rq_flag_bits {
 #define REQ_PRIO		(1 << __REQ_PRIO)
 #define REQ_DISCARD		(1 << __REQ_DISCARD)
 #define REQ_NOIDLE		(1 << __REQ_NOIDLE)
+#define REQ_WRITE_SAME		(1 << __REQ_WRITE_SAME)
 
 #define REQ_FAILFAST_MASK \
 	(REQ_FAILFAST_DEV | REQ_FAILFAST_TRANSPORT | REQ_FAILFAST_DRIVER)
 #define REQ_COMMON_MASK \
 	(REQ_WRITE | REQ_FAILFAST_MASK | REQ_SYNC | REQ_META | REQ_PRIO | \
-	 REQ_DISCARD | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | REQ_SECURE)
+	 REQ_DISCARD | REQ_WRITE_SAME | REQ_NOIDLE | REQ_FLUSH | REQ_FUA | \
+	 REQ_SECURE)
 #define REQ_CLONE_MASK		REQ_COMMON_MASK
 
 #define REQ_RAHEAD		(1 << __REQ_RAHEAD)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6c6a1f0..4e2f049 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -254,6 +254,7 @@ struct queue_limits {
 	unsigned int		io_min;
 	unsigned int		io_opt;
 	unsigned int		max_discard_sectors;
+	unsigned int		max_write_same_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -581,8 +582,9 @@ static inline void blk_clear_queue_full(struct request_queue *q, int sync)
 	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
 #define rq_mergeable(rq)	\
 	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
-	 (((rq)->cmd_flags & REQ_DISCARD) || \
-	  (rq)->cmd_type == REQ_TYPE_FS))
+	 (((rq)->cmd_flags & REQ_DISCARD || \
+	   (rq)->cmd_flags & REQ_WRITE_SAME ||	\
+	   (rq)->cmd_type == REQ_TYPE_FS)))
 
 /*
  * q->prep_rq_fn return values
@@ -835,6 +837,8 @@ extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_discard_sectors(struct request_queue *q,
 		unsigned int max_discard_sectors);
+extern void blk_queue_max_write_same_sectors(struct request_queue *q,
+		unsigned int max_write_same_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned int);
 extern void blk_queue_alignment_offset(struct request_queue *q,
@@ -959,6 +963,9 @@ static inline struct request *blk_map_queue_find_tag(struct blk_queue_tag *bqt,
 extern int blkdev_issue_flush(struct block_device *, gfp_t, sector_t *);
 extern int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 		sector_t nr_sects, gfp_t gfp_mask, unsigned long flags);
+extern int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
+		sector_t nr_sects, gfp_t gfp_mask, void *buffer,
+		unsigned int length);
 extern int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			sector_t nr_sects, gfp_t gfp_mask);
 static inline int sb_issue_discard(struct super_block *sb, sector_t block,
@@ -1126,6 +1133,16 @@ static inline unsigned int bdev_discard_zeroes_data(struct block_device *bdev)
 	return queue_discard_zeroes_data(bdev_get_queue(bdev));
 }
 
+static inline unsigned int bdev_write_same(struct block_device *bdev)
+{
+	struct request_queue *q = bdev_get_queue(bdev);
+
+	if (q)
+		return q->limits.max_write_same_sectors;
+
+	return 0;
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
-- 
1.7.8.3.21.gab8a7


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 2/5] block: Make blkdev_issue_zeroout use WRITE SAME
  2012-01-31  0:31 Write same support Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
@ 2012-01-31  0:31 ` Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 3/5] block: ioctl to zero block ranges Martin K. Petersen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe, Martin K. Petersen

From: "Martin K. Petersen" <martin.petersen@oracle.com>

If the device supports WRITE SAME, use that to optimize zeroing of
blocks. If the device does not support WRITE SAME or if the operation
fails, fall back to writing zeroes the old-fashioned way.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c |   32 +++++++++++++++++++++++++++++++-
 1 files changed, 31 insertions(+), 1 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index 19f4754..458ffff 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -204,7 +204,7 @@ EXPORT_SYMBOL(blkdev_issue_write_same);
  *  Generate and issue number of bios with zerofiled pages.
  */
 
-int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+int __blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 			sector_t nr_sects, gfp_t gfp_mask)
 {
 	int ret;
@@ -254,4 +254,34 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 
 	return ret;
 }
+
+/**
+ * blkdev_issue_zeroout - zero-fill a block range
+ * @bdev:	blockdev to write
+ * @sector:	start sector
+ * @nr_sects:	number of sectors to write
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *  Generate and issue number of bios with zerofiled pages.
+ */
+
+int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
+			 sector_t nr_sects, gfp_t gfp_mask)
+{
+	if (bdev_write_same(bdev)) {
+		unsigned char bdn[BDEVNAME_SIZE];
+
+		if (!blkdev_issue_write_same(bdev, sector, nr_sects, gfp_mask,
+					     empty_zero_page,
+					     bdev_logical_block_size(bdev)))
+			return 0;
+
+		bdevname(bdev, bdn);
+		printk(KERN_ERR "%s: WRITE SAME failed. Manually zeroing.\n",
+		       bdn);
+	}
+
+	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
+}
 EXPORT_SYMBOL(blkdev_issue_zeroout);
-- 
1.7.8.3.21.gab8a7


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 3/5] block: ioctl to zero block ranges
  2012-01-31  0:31 Write same support Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 2/5] block: Make blkdev_issue_zeroout use " Martin K. Petersen
@ 2012-01-31  0:31 ` Martin K. Petersen
  2012-01-31  0:31 ` [PATCH 4/5] scsi: Add a report opcode helper Martin K. Petersen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe, Martin K. Petersen

From: "Martin K. Petersen" <martin.petersen@oracle.com>

Introduce a BLKZEROOUT ioctl which can be used to clear block ranges by
way of blkdev_issue_zeroout().

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/ioctl.c      |   27 +++++++++++++++++++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 28 insertions(+), 0 deletions(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index ba15b2d..0c6e633 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -132,6 +132,22 @@ static int blk_ioctl_discard(struct block_device *bdev, uint64_t start,
 	return blkdev_issue_discard(bdev, start, len, GFP_KERNEL, flags);
 }
 
+static int blk_ioctl_zeroout(struct block_device *bdev, uint64_t start,
+			     uint64_t len)
+{
+	if (start & 511)
+		return -EINVAL;
+	if (len & 511)
+		return -EINVAL;
+	start >>= 9;
+	len >>= 9;
+
+	if (start + len > (i_size_read(bdev->bd_inode) >> 9))
+		return -EINVAL;
+
+	return blkdev_issue_zeroout(bdev, start, len, GFP_KERNEL);
+}
+
 static int put_ushort(unsigned long arg, unsigned short val)
 {
 	return put_user(val, (unsigned short __user *)arg);
@@ -247,6 +263,17 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 		return blk_ioctl_discard(bdev, range[0], range[1],
 					 cmd == BLKSECDISCARD);
 	}
+	case BLKZEROOUT: {
+		uint64_t range[2];
+
+		if (!(mode & FMODE_WRITE))
+			return -EBADF;
+
+		if (copy_from_user(range, (void __user *)arg, sizeof(range)))
+			return -EFAULT;
+
+		return blk_ioctl_zeroout(bdev, range[0], range[1]);
+	}
 
 	case HDIO_GETGEO: {
 		struct hd_geometry geo;
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 386da09..e918ef7 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -320,6 +320,7 @@ struct inodes_stat_t {
 #define BLKDISCARDZEROES _IO(0x12,124)
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
+#define BLKZEROOUT _IO(0x12,127)
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
1.7.8.3.21.gab8a7


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 4/5] scsi: Add a report opcode helper
  2012-01-31  0:31 Write same support Martin K. Petersen
                   ` (2 preceding siblings ...)
  2012-01-31  0:31 ` [PATCH 3/5] block: ioctl to zero block ranges Martin K. Petersen
@ 2012-01-31  0:31 ` Martin K. Petersen
  2012-01-31 19:53   ` Jeff Garzik
  2012-01-31  0:31 ` [PATCH 5/5] sd: Implement support for WRITE SAME Martin K. Petersen
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe, Martin K. Petersen

From: "Martin K. Petersen" <martin.petersen@oracle.com>

The REPORT SUPPORTED OPERATION CODES command can be used to query
whether a given opcode is supported by a device. Add a helper function
that allows us to look up commands.

We only issue RSOC if the device reports compliance with SPC-3 or
later. But to err on the side of caution we disable the command for ATA,
FireWire and USB.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-scsi.c      |    1 +
 drivers/firewire/sbp2.c        |    1 +
 drivers/scsi/scsi.c            |   45 ++++++++++++++++++++++++++++++++++++++++
 drivers/usb/storage/scsiglue.c |    3 ++
 include/scsi/scsi_device.h     |    3 ++
 5 files changed, 53 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 508a60b..b6db28f 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1052,6 +1052,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
 {
 	sdev->use_10_for_rw = 1;
 	sdev->use_10_for_ms = 1;
+	sdev->no_report_opcodes = 1;
 
 	/* Schedule policy is determined by ->qc_defer() callback and
 	 * it needs to see every deferred qc.  Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index 80e95aa..d956dd38 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1517,6 +1517,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
 	struct sbp2_logical_unit *lu = sdev->hostdata;
 
 	sdev->use_10_for_rw = 1;
+	sdev->no_report_opcodes = 1;
 
 	if (sbp2_param_exclusive_login)
 		sdev->manage_start_stop = 1;
diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index 2aeb2e9..8d00214 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -54,6 +54,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/mutex.h>
+#include <asm/unaligned.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -1063,6 +1064,50 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
 EXPORT_SYMBOL_GPL(scsi_get_vpd_page);
 
 /**
+ * scsi_report_opcode - Find out if a given command opcode is supported
+ * @sdp:	scsi device to query
+ * @buffer:	scratch buffer (must be at least 20 bytes long)
+ * @len:	length of buffer
+ * @opcode:	opcode for command to look up
+ *
+ * Uses the REPORT SUPPORTED OPERATION CODES to look up the given
+ * opcode. Returns 0 if RSOC fails or if the command opcode is
+ * unsupported. Returns 1 if the device claims to support the command.
+ */
+int scsi_report_opcode(struct scsi_device *sdp, unsigned char *buffer,
+		       unsigned int len, unsigned char opcode)
+{
+	unsigned char cmd[16];
+	struct scsi_sense_hdr sshdr;
+	int result;
+
+	if (sdp->no_report_opcodes || sdp->scsi_level < SCSI_SPC_3)
+		return 0;
+
+	memset(cmd, 0, 16);
+	cmd[0] = MAINTENANCE_IN;
+	cmd[1] = MI_REPORT_SUPPORTED_OPERATION_CODES;
+	cmd[2] = 1;		/* One command format */
+	cmd[3] = opcode;
+	put_unaligned_be32(len, &cmd[6]);
+	memset(buffer, 0, len);
+
+	result = scsi_execute_req(sdp, cmd, DMA_FROM_DEVICE, buffer, len,
+				  &sshdr, 30 * HZ, 3, NULL);
+
+	if (result && scsi_sense_valid(&sshdr) &&
+	    sshdr.sense_key == ILLEGAL_REQUEST &&
+	    (sshdr.asc == 0x20 || sshdr.asc == 0x24) && sshdr.ascq == 0x00)
+		return 0;
+
+	if ((buffer[1] & 3) == 3) /* Command supported */
+		return 1;
+
+	return 0;
+}
+EXPORT_SYMBOL(scsi_report_opcode);
+
+/**
  * scsi_device_get  -  get an additional reference to a scsi_device
  * @sdev:	device to get a reference to
  *
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 13b8bcd..61178b8 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -197,6 +197,9 @@ static int slave_configure(struct scsi_device *sdev)
 		 * page x08, so we will skip it. */
 		sdev->skip_ms_page_8 = 1;
 
+		/* Do not attempt to use REPORT SUPPORTED OPERATION CODES */
+		sdev->no_report_opcodes = 1;
+
 		/* Some disks return the total number of blocks in response
 		 * to READ CAPACITY rather than the highest block number.
 		 * If this device makes that mistake, tell the sd driver. */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 77273f2..59f31dc 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -134,6 +134,7 @@ struct scsi_device {
 				     * because we did a bus reset. */
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
+	unsigned no_report_opcodes:1;	/* no REPORT SUPPORTED OPERATION CODES */
 	unsigned skip_ms_page_8:1;	/* do not use MODE SENSE page 0x08 */
 	unsigned skip_ms_page_3f:1;	/* do not use MODE SENSE page 0x3f */
 	unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page 0x3f */
@@ -354,6 +355,8 @@ extern int scsi_test_unit_ready(struct scsi_device *sdev, int timeout,
 				int retries, struct scsi_sense_hdr *sshdr);
 extern int scsi_get_vpd_page(struct scsi_device *, u8 page, unsigned char *buf,
 			     int buf_len);
+extern int scsi_report_opcode(struct scsi_device *sdp, unsigned char *buffer,
+			      unsigned int len, unsigned char opcode);
 extern int scsi_device_set_state(struct scsi_device *sdev,
 				 enum scsi_device_state state);
 extern struct scsi_event *sdev_evt_alloc(enum scsi_device_event evt_type,
-- 
1.7.8.3.21.gab8a7


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* [PATCH 5/5] sd: Implement support for WRITE SAME
  2012-01-31  0:31 Write same support Martin K. Petersen
                   ` (3 preceding siblings ...)
  2012-01-31  0:31 ` [PATCH 4/5] scsi: Add a report opcode helper Martin K. Petersen
@ 2012-01-31  0:31 ` Martin K. Petersen
  2012-02-20 16:16   ` Mike Snitzer
  2012-02-03 19:15 ` Write same support Mike Snitzer
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31  0:31 UTC (permalink / raw)
  To: linux-scsi; +Cc: James.Bottomley, snitzer, jaxboe, Martin K. Petersen

From: "Martin K. Petersen" <martin.petersen@oracle.com>

Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
driver.

 - We set the default maximum to 0xFFFF because there are several
   devices out there that only support two-byte block counts even with
   WRITE SAME(16). We only enable transfers bigger than 0xFFFF if the
   device explicitly reports MAXIMUM WRITE SAME LENGTH in the BLOCK
   LIMITS VPD.

 - max_write_same_blocks can be overriden per-device basis in sysfs.

 - The UNMAP discovery heuristics remain unchanged but the discard
   limits are tweaked to match the "real" WRITE SAME commands.

 - In the error handling logic we now distinguish between WRITE SAME
   with and without UNMAP set.


The discovery process heuristics are:

 - If the device reports a SCSI level of SPC-3 or greater we'll issue
   READ SUPPORTED OPERATION CODES to find out whether WRITE SAME(16) is
   supported. If that's the case we will use it.

 - If the device supports the block limits VPD and reports a MAXIMUM
   WRITE SAME LENGTH bigger than 0xFFFF we will use WRITE SAME(16).

 - Otherwise we will use WRITE SAME(10) unless the target LBA is beyond
   0xFFFFFFFF or the block count exceeds 0xFFFF.

 - no_write_same is set for ATA, FireWire and USB.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-scsi.c      |    1 +
 drivers/firewire/sbp2.c        |    1 +
 drivers/scsi/scsi_lib.c        |   22 ++++-
 drivers/scsi/sd.c              |  173 +++++++++++++++++++++++++++++++++++++---
 drivers/scsi/sd.h              |    7 ++
 drivers/usb/storage/scsiglue.c |    3 +
 include/scsi/scsi_device.h     |    1 +
 7 files changed, 193 insertions(+), 15 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index b6db28f..dde907b 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1053,6 +1053,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
 	sdev->use_10_for_rw = 1;
 	sdev->use_10_for_ms = 1;
 	sdev->no_report_opcodes = 1;
+	sdev->no_write_same = 1;
 
 	/* Schedule policy is determined by ->qc_defer() callback and
 	 * it needs to see every deferred qc.  Set dev_blocked to 1 to
diff --git a/drivers/firewire/sbp2.c b/drivers/firewire/sbp2.c
index d956dd38..52b2bac 100644
--- a/drivers/firewire/sbp2.c
+++ b/drivers/firewire/sbp2.c
@@ -1518,6 +1518,7 @@ static int sbp2_scsi_slave_configure(struct scsi_device *sdev)
 
 	sdev->use_10_for_rw = 1;
 	sdev->no_report_opcodes = 1;
+	sdev->no_write_same = 1;
 
 	if (sbp2_param_exclusive_login)
 		sdev->manage_start_stop = 1;
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b2c95db..560b63f 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -874,11 +874,23 @@ void scsi_io_completion(struct scsi_cmnd *cmd, unsigned int good_bytes)
 				action = ACTION_FAIL;
 				error = -EILSEQ;
 			/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
-			} else if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-				   (cmd->cmnd[0] == UNMAP ||
-				    cmd->cmnd[0] == WRITE_SAME_16 ||
-				    cmd->cmnd[0] == WRITE_SAME)) {
-				description = "Discard failure";
+			} else if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+				switch(cmd->cmnd[0]) {
+				case UNMAP:
+					description = "Discard failure";
+					break;
+				case WRITE_SAME:
+				case WRITE_SAME_16:
+					if (cmd->cmnd[1] & 0x8)
+						description = "Discard failure";
+					else
+						description =
+							"Write same failure";
+					break;
+				default:
+					description = "Invalid command failure";
+					break;
+				}
 				action = ACTION_FAIL;
 			} else
 				action = ACTION_FAIL;
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index c691fb5..40946a2 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -98,6 +98,7 @@ MODULE_ALIAS_SCSI_DEVICE(TYPE_RBC);
 #endif
 
 static void sd_config_discard(struct scsi_disk *, unsigned int);
+static void sd_config_write_same(struct scsi_disk *);
 static int  sd_revalidate_disk(struct gendisk *);
 static void sd_unlock_native_capacity(struct gendisk *disk);
 static int  sd_probe(struct device *);
@@ -346,6 +347,41 @@ sd_store_provisioning_mode(struct device *dev, struct device_attribute *attr,
 	return count;
 }
 
+static ssize_t
+sd_show_write_same_blocks(struct device *dev, struct device_attribute *attr,
+			  char *buf)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+
+	return snprintf(buf, 20, "%u\n", sdkp->max_ws_blocks);
+}
+
+static ssize_t
+sd_store_write_same_blocks(struct device *dev, struct device_attribute *attr,
+			   const char *buf, size_t count)
+{
+	struct scsi_disk *sdkp = to_scsi_disk(dev);
+	struct scsi_device *sdp = sdkp->device;
+	unsigned long max;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (sdp->type != TYPE_DISK)
+		return -EINVAL;
+
+	max = simple_strtoul(buf, NULL, 10);
+
+	if (max == 0)
+		sdp->no_write_same = 1;
+	else if (max <= SD_MAX_WS16_BLOCKS)
+		sdkp->max_ws_blocks = max;
+
+	sd_config_write_same(sdkp);
+
+	return count;
+}
+
 static struct device_attribute sd_disk_attrs[] = {
 	__ATTR(cache_type, S_IRUGO|S_IWUSR, sd_show_cache_type,
 	       sd_store_cache_type),
@@ -360,6 +396,8 @@ static struct device_attribute sd_disk_attrs[] = {
 	__ATTR(thin_provisioning, S_IRUGO, sd_show_thin_provisioning, NULL),
 	__ATTR(provisioning_mode, S_IRUGO|S_IWUSR, sd_show_provisioning_mode,
 	       sd_store_provisioning_mode),
+	__ATTR(max_write_same_blocks, S_IRUGO|S_IWUSR,
+	       sd_show_write_same_blocks, sd_store_write_same_blocks),
 	__ATTR_NULL,
 };
 
@@ -505,19 +543,23 @@ static void sd_config_discard(struct scsi_disk *sdkp, unsigned int mode)
 		return;
 
 	case SD_LBP_UNMAP:
-		max_blocks = min_not_zero(sdkp->max_unmap_blocks, 0xffffffff);
+		max_blocks = min_not_zero(sdkp->max_unmap_blocks,
+					  (u32)SD_MAX_WS16_BLOCKS);
 		break;
 
 	case SD_LBP_WS16:
-		max_blocks = min_not_zero(sdkp->max_ws_blocks, 0xffffffff);
+		max_blocks = min_not_zero(sdkp->max_ws_blocks,
+					  (u32)SD_MAX_WS16_BLOCKS);
 		break;
 
 	case SD_LBP_WS10:
-		max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)0xffff);
+		max_blocks = min_not_zero(sdkp->max_ws_blocks,
+					  (u32)SD_MAX_WS10_BLOCKS);
 		break;
 
 	case SD_LBP_ZERO:
-		max_blocks = min_not_zero(sdkp->max_ws_blocks, (u32)0xffff);
+		max_blocks = min_not_zero(sdkp->max_ws_blocks,
+					  (u32)SD_MAX_WS10_BLOCKS);
 		q->limits.discard_zeroes_data = 1;
 		break;
 	}
@@ -615,6 +657,83 @@ out:
 	return ret;
 }
 
+static void sd_config_write_same(struct scsi_disk *sdkp)
+{
+	struct request_queue *q = sdkp->disk->queue;
+	unsigned int logical_block_size = sdkp->device->sector_size;
+	unsigned int blocks = 0;
+
+	if (sdkp->device->no_write_same) {
+		sdkp->max_ws_blocks = 0;
+		goto out;
+	}
+
+	/* Some devices can not handle block counts above 0xffff despite
+	 * supporting WRITE SAME(16). Consequently we default to 64k
+	 * blocks per I/O unless the device explicitly advertises a
+	 * bigger limit.
+	 */
+	if (sdkp->max_ws_blocks == 0)
+		sdkp->max_ws_blocks = SD_MAX_WS10_BLOCKS;
+
+	if (sdkp->ws16 || sdkp->max_ws_blocks > SD_MAX_WS10_BLOCKS)
+		blocks = min_not_zero(sdkp->max_ws_blocks,
+				      (u32)SD_MAX_WS16_BLOCKS);
+	else
+		blocks = min_not_zero(sdkp->max_ws_blocks,
+				      (u32)SD_MAX_WS10_BLOCKS);
+
+out:
+	blk_queue_max_write_same_sectors(q, blocks * (logical_block_size >> 9));
+}
+
+/**
+ * sd_setup_write_same_cmnd - write the same data to multiple blocks
+ * @sdp: scsi device to operate one
+ * @rq: Request to prepare
+ *
+ * Will issue either WRITE SAME(10) or WRITE SAME(16) depending on
+ * preference indicated by target device.
+ **/
+static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
+{
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct bio *bio = rq->bio;
+	sector_t sector = bio->bi_sector;
+	unsigned int nr_sectors = bio_sectors(bio);
+
+	if (sdkp->device->no_write_same)
+		return BLKPREP_KILL;
+
+	if (sdkp->device->sector_size == 4096) {
+		sector >>= 3;
+		nr_sectors >>= 3;
+	}
+
+	rq->timeout = SD_WRITE_SAME_TIMEOUT;
+	rq->__data_len = rq->resid_len = sdp->sector_size;
+	memset(rq->cmd, 0, rq->cmd_len);
+
+	if (rq->nr_phys_segments == 0) {
+		printk(KERN_ERR "%s: Setting phys_segs!\n", __func__);
+		rq->nr_phys_segments = bio->bi_phys_segments;
+	}
+
+	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff) {
+		rq->cmd_len = 16;
+		rq->cmd[0] = WRITE_SAME_16;
+		put_unaligned_be64(sector, &rq->cmd[2]);
+		put_unaligned_be32(nr_sectors, &rq->cmd[10]);
+	} else {
+		rq->cmd_len = 10;
+		rq->cmd[0] = WRITE_SAME;
+		put_unaligned_be32(sector, &rq->cmd[2]);
+		put_unaligned_be16(nr_sectors, &rq->cmd[7]);
+	}
+
+	return scsi_setup_blk_pc_cmnd(sdp, rq);
+}
+
 static int scsi_setup_flush_cmnd(struct scsi_device *sdp, struct request *rq)
 {
 	rq->timeout = SD_FLUSH_TIMEOUT;
@@ -660,6 +779,9 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
 	if (rq->cmd_flags & REQ_DISCARD) {
 		ret = scsi_setup_discard_cmnd(sdp, rq);
 		goto out;
+	} else if (rq->cmd_flags & REQ_WRITE_SAME) {
+		ret = sd_setup_write_same_cmnd(sdp, rq);
+		goto out;
 	} else if (rq->cmd_flags & REQ_FLUSH) {
 		ret = scsi_setup_flush_cmnd(sdp, rq);
 		goto out;
@@ -1375,13 +1497,21 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 	unsigned int good_bytes = result ? 0 : scsi_bufflen(SCpnt);
 	struct scsi_sense_hdr sshdr;
 	struct scsi_disk *sdkp = scsi_disk(SCpnt->request->rq_disk);
+	struct request *req = SCpnt->request;
 	int sense_valid = 0;
 	int sense_deferred = 0;
 	unsigned char op = SCpnt->cmnd[0];
+	unsigned char unmap = SCpnt->cmnd[1] & 8;
 
 	if ((SCpnt->request->cmd_flags & REQ_DISCARD) && !result)
 		scsi_set_resid(SCpnt, 0);
 
+	if ((SCpnt->request->cmd_flags & REQ_WRITE_SAME) && !result) {
+		/* Complete entire bio, not just the single block transferred */
+		good_bytes = req->bio->bi_size;
+		scsi_set_resid(SCpnt, 0);
+	}
+
 	if (result) {
 		sense_valid = scsi_command_normalize_sense(SCpnt, &sshdr);
 		if (sense_valid)
@@ -1427,9 +1557,25 @@ static int sd_done(struct scsi_cmnd *SCpnt)
 		if (sshdr.asc == 0x10)  /* DIX: Host detected corruption */
 			good_bytes = sd_completed_bytes(SCpnt);
 		/* INVALID COMMAND OPCODE or INVALID FIELD IN CDB */
-		if ((sshdr.asc == 0x20 || sshdr.asc == 0x24) &&
-		    (op == UNMAP || op == WRITE_SAME_16 || op == WRITE_SAME))
-			sd_config_discard(sdkp, SD_LBP_DISABLE);
+		if (sshdr.asc == 0x20 || sshdr.asc == 0x24) {
+			switch (op) {
+			case UNMAP:
+				sd_config_discard(sdkp, SD_LBP_DISABLE);
+				break;
+			case WRITE_SAME_16:
+			case WRITE_SAME:
+				if (unmap)
+					sd_config_discard(sdkp, SD_LBP_DISABLE);
+				else {
+					sdkp->device->no_write_same = 1;
+					sd_config_write_same(sdkp);
+
+					good_bytes = 0;
+					req->__data_len = req->bio->bi_size;
+					req->cmd_flags |= REQ_QUIET;
+				}
+			}
+		}
 		break;
 	default:
 		break;
@@ -2247,9 +2393,7 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
 	if (buffer[3] == 0x3c) {
 		unsigned int lba_count, desc_count;
 
-		sdkp->max_ws_blocks =
-			(u32) min_not_zero(get_unaligned_be64(&buffer[36]),
-					   (u64)0xffffffff);
+		sdkp->max_ws_blocks = (u32)get_unaligned_be64(&buffer[36]);
 
 		if (!sdkp->lbpme)
 			goto out;
@@ -2342,6 +2486,13 @@ static void sd_read_block_provisioning(struct scsi_disk *sdkp)
 	kfree(buffer);
 }
 
+static void sd_read_write_same(struct scsi_disk *sdkp, unsigned char *buffer)
+{
+	if (scsi_report_opcode(sdkp->device, buffer, SD_BUF_SIZE,
+			       WRITE_SAME_16))
+		sdkp->ws16 = 1;
+}
+
 static int sd_try_extended_inquiry(struct scsi_device *sdp)
 {
 	/*
@@ -2401,6 +2552,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
+		sd_read_write_same(sdkp, buffer);
 	}
 
 	sdkp->first_scan = 0;
@@ -2418,6 +2570,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 	blk_queue_flush(sdkp->disk->queue, flush);
 
 	set_capacity(disk, sdkp->capacity);
+	sd_config_write_same(sdkp);
 	kfree(buffer);
 
  out:
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 4163f29..5923e42 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -14,6 +14,7 @@
 #define SD_TIMEOUT		(30 * HZ)
 #define SD_MOD_TIMEOUT		(75 * HZ)
 #define SD_FLUSH_TIMEOUT	(60 * HZ)
+#define SD_WRITE_SAME_TIMEOUT	(120 * HZ)
 
 /*
  * Number of allowed retries
@@ -38,6 +39,11 @@ enum {
 };
 
 enum {
+	SD_MAX_WS10_BLOCKS = 0xffff,
+	SD_MAX_WS16_BLOCKS = 0x7fffff,
+};
+
+enum {
 	SD_LBP_FULL = 0,	/* Full logical block provisioning */
 	SD_LBP_UNMAP,		/* Use UNMAP command */
 	SD_LBP_WS16,		/* Use WRITE SAME(16) with UNMAP bit */
@@ -74,6 +80,7 @@ struct scsi_disk {
 	unsigned	lbpws : 1;
 	unsigned	lbpws10 : 1;
 	unsigned	lbpvpd : 1;
+	unsigned	ws16 : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/drivers/usb/storage/scsiglue.c b/drivers/usb/storage/scsiglue.c
index 61178b8..7aaf5d4 100644
--- a/drivers/usb/storage/scsiglue.c
+++ b/drivers/usb/storage/scsiglue.c
@@ -200,6 +200,9 @@ static int slave_configure(struct scsi_device *sdev)
 		/* Do not attempt to use REPORT SUPPORTED OPERATION CODES */
 		sdev->no_report_opcodes = 1;
 
+		/* Do not attempt to use WRITE SAME */
+		sdev->no_write_same = 1;
+
 		/* Some disks return the total number of blocks in response
 		 * to READ CAPACITY rather than the highest block number.
 		 * If this device makes that mistake, tell the sd driver. */
diff --git a/include/scsi/scsi_device.h b/include/scsi/scsi_device.h
index 59f31dc..c1d8c2f 100644
--- a/include/scsi/scsi_device.h
+++ b/include/scsi/scsi_device.h
@@ -135,6 +135,7 @@ struct scsi_device {
 	unsigned use_10_for_rw:1; /* first try 10-byte read / write */
 	unsigned use_10_for_ms:1; /* first try 10-byte mode sense/select */
 	unsigned no_report_opcodes:1;	/* no REPORT SUPPORTED OPERATION CODES */
+	unsigned no_write_same:1;	/* no WRITE SAME command */
 	unsigned skip_ms_page_8:1;	/* do not use MODE SENSE page 0x08 */
 	unsigned skip_ms_page_3f:1;	/* do not use MODE SENSE page 0x3f */
 	unsigned use_192_bytes_for_3f:1; /* ask for 192 bytes from page 0x3f */
-- 
1.7.8.3.21.gab8a7


^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/5] scsi: Add a report opcode helper
  2012-01-31  0:31 ` [PATCH 4/5] scsi: Add a report opcode helper Martin K. Petersen
@ 2012-01-31 19:53   ` Jeff Garzik
  2012-01-31 20:16     ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff Garzik @ 2012-01-31 19:53 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, snitzer, jaxboe

On 01/30/2012 07:31 PM, Martin K. Petersen wrote:
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1052,6 +1052,7 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
>   {
>   	sdev->use_10_for_rw = 1;
>   	sdev->use_10_for_ms = 1;
> +	sdev->no_report_opcodes = 1;
>
>   	/* Schedule policy is determined by ->qc_defer() callback and
>   	 * it needs to see every deferred qc.  Set dev_blocked to 1 to


While this is bare minimum acceptable, remember this covers ATA and 
ATAPI devices both.

A preferred patch would simulate the RSOC op for ATA devices, and only 
apply the above condition to ATAPI devices.

	Jeff



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 4/5] scsi: Add a report opcode helper
  2012-01-31 19:53   ` Jeff Garzik
@ 2012-01-31 20:16     ` Martin K. Petersen
  0 siblings, 0 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-01-31 20:16 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, snitzer, jaxboe

>>>>> "Jeff" == Jeff Garzik <jeff@garzik.org> writes:

Hey Jeff,

Jeff> While this is bare minimum acceptable, remember this covers ATA
Jeff> and ATAPI devices both.

Proper write same support is making its way through T13. It is my intent
to wire it up in libata once I find a SATA device that implements it.


Jeff> A preferred patch would simulate the RSOC op for ATA devices, and
Jeff> only apply the above condition to ATAPI devices.

I can do that if you insist. It just adds a superfluous heuristic to the
SCSI pile since I still need the existing flag to short out USB and
FireWire. I also can't rely exclusively on RSOC since only very recent
devices support it (and not even all of them)...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Write same support
  2012-01-31  0:31 Write same support Martin K. Petersen
                   ` (4 preceding siblings ...)
  2012-01-31  0:31 ` [PATCH 5/5] sd: Implement support for WRITE SAME Martin K. Petersen
@ 2012-02-03 19:15 ` Mike Snitzer
  2012-02-03 19:20 ` Roland Dreier
  2012-02-16 20:02 ` Mike Snitzer
  7 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-03 19:15 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel

Hey Martin,

On Mon, Jan 30 2012 at  7:31pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> 
> Here's my stab at write same support.
> 
> I have mainly tested using blkdev_issue_zeroout which I have modified to
> use write same if available. It will fall back to writing zeroes the
> old-fashioned way if write same fails. I also wired up an ioctl so I
> could trigger zeroout from userland.
> 
> I have gone back and forth over whether blkdev_issue_write_same() should
> take a struct page or a virtual address. It's not such a big deal for
> devices with a 512-byte logical block size. But for devices with 4K it's
> more of an issue. Comments welcome.
> 
> The discovery heuristics is what stopped this patchkit in its tracks
> last year. So I've spent the last few days testing with every exotic
> (i.e. ancient) SCSI device I could find. I'm pretty happy with how it
> works now but it would be great if more people would put it to the test.
> 
> I have a 'writesame' branch available at:
> 
> 	git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux.git
> 
> for those that want to try it out...

Thanks for getting this together.

Just wanted to let you know I've pulled your branch and will be looking
have dm-thinp make use of write same for the zeroing that it does.

In the process I'll review your code in more detail (so far it looks
good).  So please expect to hear back from me next week.

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Write same support
  2012-01-31  0:31 Write same support Martin K. Petersen
                   ` (5 preceding siblings ...)
  2012-02-03 19:15 ` Write same support Mike Snitzer
@ 2012-02-03 19:20 ` Roland Dreier
  2012-02-16 20:02 ` Mike Snitzer
  7 siblings, 0 replies; 44+ messages in thread
From: Roland Dreier @ 2012-02-03 19:20 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, snitzer, jaxboe

On Mon, Jan 30, 2012 at 4:31 PM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
> The discovery heuristics is what stopped this patchkit in its tracks
> last year. So I've spent the last few days testing with every exotic
> (i.e. ancient) SCSI device I could find. I'm pretty happy with how it
> works now but it would be great if more people would put it to the test.

I'll give it a try too ... probably mostly from the POV of testing our array's
WRITE SAME implementation ;)

 - R.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
@ 2012-02-07 21:40   ` Vivek Goyal
  2012-02-13 22:19     ` Martin K. Petersen
  2012-02-08 22:50   ` Mike Snitzer
  2012-02-09  3:40   ` Mike Snitzer
  2 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2012-02-07 21:40 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, snitzer, jaxboe

On Mon, Jan 30, 2012 at 07:31:28PM -0500, Martin K. Petersen wrote:

[..]
> --- a/block/elevator.c
> +++ b/block/elevator.c
> @@ -82,6 +82,12 @@ int elv_rq_merge_ok(struct request *rq, struct bio *bio)
>  		return 0;
>  
>  	/*
> +	 * Don't merge write same requests
> +	 */
> +	if ((bio->bi_rw & REQ_WRITE_SAME) || (rq->bio->bi_rw & REQ_WRITE_SAME))
> +		return 0;
> +


[..]
>  #define rq_mergeable(rq)	\
>  	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
> -	 (((rq)->cmd_flags & REQ_DISCARD) || \
> -	  (rq)->cmd_type == REQ_TYPE_FS))
> +	 (((rq)->cmd_flags & REQ_DISCARD || \
> +	   (rq)->cmd_flags & REQ_WRITE_SAME ||	\
> +	   (rq)->cmd_type == REQ_TYPE_FS)))

In elv_rq_merge_ok() we don't allow merge of WRITE_SAME requests but
in rq_mergeable() we do say that requests having WRITE_SAME can be merged.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
  2012-02-07 21:40   ` Vivek Goyal
@ 2012-02-08 22:50   ` Mike Snitzer
  2012-02-08 23:12     ` Martin K. Petersen
  2012-02-09  3:40   ` Mike Snitzer
  2 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-08 22:50 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

On Mon, Jan 30 2012 at  7:31pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> From: "Martin K. Petersen" <martin.petersen@oracle.com>
> 
> The WRITE SAME command supported on some SCSI devices allows the same
> block to be efficiently replicated throughout a block range. Only a
> single logical block is transferred from the host and the storage device
> writes the same data to all blocks described by the I/O.
> 
> This patch implements support for WRITE SAME in the block layer. The
> blkdev_issue_write_same() function can be used by filesystems and block
> drivers to replicate a buffer across a block range. This can be used to
> efficiently initialize software RAID devices, etc.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

...

> diff --git a/block/blk-lib.c b/block/blk-lib.c
> index 2b461b4..19f4754 100644
> --- a/block/blk-lib.c
> +++ b/block/blk-lib.c
> @@ -115,6 +115,85 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
>  EXPORT_SYMBOL(blkdev_issue_discard);
>  
>  /**
> + * blkdev_issue_write_same - queue a write same operation
> + * @bdev:	target blockdev
> + * @sector:	start sector
> + * @nr_sects:	number of sectors to write
> + * @gfp_mask:	memory allocation flags (for bio_alloc)
> + * @buffer:	pointer to buffer containing data to write
> + * @length:	buffer length. Must be equal to bdev logical block size.
> + *
> + * Description:
> + *    Issue a write same request for the sectors in question.
> + */
> +int blkdev_issue_write_same(struct block_device *bdev, sector_t sector,
> +			    sector_t nr_sects, gfp_t gfp_mask, void *buffer,
> +			    unsigned int length)
> +{
> +	DECLARE_COMPLETION_ONSTACK(wait);
> +	struct request_queue *q = bdev_get_queue(bdev);
> +	unsigned int max_write_same_sectors;
> +	struct bio_batch bb;
> +	struct bio *bio;
> +	int ret = 0;
> +
> +	if (!q)
> +		return -ENXIO;
> +
> +	max_write_same_sectors = q->limits.max_write_same_sectors;
> +
> +	if (max_write_same_sectors == 0)
> +		return -EOPNOTSUPP;
> +
> +	BUG_ON(length != bdev_logical_block_size(bdev));

Why require @length to be passed in if we can easily determine the only
correct length, via bdev_logical_block_size, locally?

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-08 22:50   ` Mike Snitzer
@ 2012-02-08 23:12     ` Martin K. Petersen
  2012-02-09  3:33       ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-08 23:12 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> + BUG_ON(length != bdev_logical_block_size(bdev));

Mike> Why require @length to be passed in if we can easily determine the
Mike> only correct length, via bdev_logical_block_size, locally?

Just sanity checking since the caller is passing in a buffer pointer.
See my comment in the patch series introduction about a page vs a
buffer. I don't really care whether we pick one or the other. Whatever
makes the most sense to dm, md and the filesystems...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-08 23:12     ` Martin K. Petersen
@ 2012-02-09  3:33       ` Mike Snitzer
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-09  3:33 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel

On Wed, Feb 08 2012 at  6:12pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> >> + BUG_ON(length != bdev_logical_block_size(bdev));
> 
> Mike> Why require @length to be passed in if we can easily determine the
> Mike> only correct length, via bdev_logical_block_size, locally?
> 
> Just sanity checking since the caller is passing in a buffer pointer.
> See my comment in the patch series introduction about a page vs a
> buffer. I don't really care whether we pick one or the other. Whatever
> makes the most sense to dm, md and the filesystems...

OK, I see -- might make sense to use a page.  But I don't have a
strong opinion (and not sure I will have one in the near-term).

I wrote a patch for dm-kcopyd (and dm-io) to add support for WRITE_SAME,
but it doesn't use the new blkdev_* interfaces you've provided -- only
because I needed to wire it up to dm-thinp which uses dm_kcopyd_zero()
to provide async zeroing.

So the patch open codes a small portion of bio prep code from
blkdev_issue_write_same() in dm-io's do_region().  I'll post it once I
verify it works.

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
  2012-02-07 21:40   ` Vivek Goyal
  2012-02-08 22:50   ` Mike Snitzer
@ 2012-02-09  3:40   ` Mike Snitzer
  2 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-09  3:40 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

On Mon, Jan 30 2012 at  7:31pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> diff --git a/Documentation/ABI/testing/sysfs-block b/Documentation/ABI/testing/sysfs-block
> index c1eb41c..e5f4194 100644
> --- a/Documentation/ABI/testing/sysfs-block
> +++ b/Documentation/ABI/testing/sysfs-block
> @@ -206,3 +206,16 @@ Description:
>  		when a discarded area is read the discard_zeroes_data
>  		parameter will be set to one. Otherwise it will be 0 and
>  		the result of reading a discarded area is undefined.
> +
> +What:		/sys/block/<disk>/queue/max_write_same_bytes
> +Date:		January 2012
> +Contact:	Martin K. Petersen <martin.petersen@oracle.com>
> +Description:
> +		Some devices support a write same operation in which a
> +		single data block can be written to a range of several
> +		contiguous blocks on storage. This can be used to wipe
> +		areas on disk or to initialize drives in a RAID
> +		configuration. max_write_same_bytes indicates how many
> +		bytes can be written in a single write same command. If
> +		max_write_same_bytes is 0, write same is not supported
> +		by the device.

How about write_same_max_bytes to follow the pattern established with
/sys/block/<disk>/queue/discard_max_bytes?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-07 21:40   ` Vivek Goyal
@ 2012-02-13 22:19     ` Martin K. Petersen
  2012-02-14  8:05       ` Rolf Eike Beer
  2012-02-15 15:33       ` Vivek Goyal
  0 siblings, 2 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-13 22:19 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, snitzer, jaxboe

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

Vivek> In elv_rq_merge_ok() we don't allow merge of WRITE_SAME requests
Vivek> but in rq_mergeable() we do say that requests having WRITE_SAME
Vivek> can be merged.

Thanks for spotting this! I was blindly hooking into all the places
where REQ_DISCARD was present. However, the way we special-case discard
merging all over is broken.

I propose we do the following. I'll update the write same patch kit
accordingly.


block: Mark discard requests unmergeable

Discards were globally marked as mergeable and as a result we had
several code paths that explicitly disabled merging when REQ_DISCARD was
set.

Mark discard requests as unmergable and remove special-casing of
REQ_DISCARD.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

diff --git a/block/blk-core.c b/block/blk-core.c
index 3a78b00..32bbdb1 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -1545,8 +1545,8 @@ generic_make_request_checks(struct bio *bio)
 		goto end_io;
 	}
 
-	if (unlikely(!(bio->bi_rw & REQ_DISCARD) &&
-		     nr_sectors > queue_max_hw_sectors(q))) {
+	if (likely(bio_has_data(bio) &&
+		   nr_sectors > queue_max_hw_sectors(q))) {
 		printk(KERN_ERR "bio too big device %s (%u > %u)\n",
 		       bdevname(bio->bi_bdev, b),
 		       bio_sectors(bio),
@@ -1698,7 +1698,7 @@ void submit_bio(int rw, struct bio *bio)
 	 * If it's a regular read/write or a barrier with data attached,
 	 * go through the normal accounting stuff before submission.
 	 */
-	if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
+	if (bio_has_data(bio)) {
 		if (rw & WRITE) {
 			count_vm_events(PGPGOUT, count);
 		} else {
diff --git a/block/blk-merge.c b/block/blk-merge.c
index 160035f..3f73823 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -371,12 +371,6 @@ static int attempt_merge(struct request_queue *q, struct request *req,
 		return 0;
 
 	/*
-	 * Don't merge file system requests and discard requests
-	 */
-	if ((req->cmd_flags & REQ_DISCARD) != (next->cmd_flags & REQ_DISCARD))
-		return 0;
-
-	/*
 	 * Don't merge discard requests and secure discard requests
 	 */
 	if ((req->cmd_flags & REQ_SECURE) != (next->cmd_flags & REQ_SECURE))
@@ -477,10 +471,6 @@ bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 	if (!rq_mergeable(rq))
 		return false;
 
-	/* don't merge file system requests and discard requests */
-	if ((bio->bi_rw & REQ_DISCARD) != (rq->bio->bi_rw & REQ_DISCARD))
-		return false;
-
 	/* don't merge discard requests and secure discard requests */
 	if ((bio->bi_rw & REQ_SECURE) != (rq->bio->bi_rw & REQ_SECURE))
 		return false;
diff --git a/block/elevator.c b/block/elevator.c
index f016855..cccfdbf 100644
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -591,8 +591,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 
 	if (rq->cmd_flags & REQ_SOFTBARRIER) {
 		/* barriers are scheduling boundary, update end_sector */
-		if (rq->cmd_type == REQ_TYPE_FS ||
-		    (rq->cmd_flags & REQ_DISCARD)) {
+		if (rq->cmd_type == REQ_TYPE_FS) {
 			q->end_sector = rq_end_sector(rq);
 			q->boundary_rq = rq;
 		}
@@ -634,8 +633,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
 		if (elv_attempt_insert_merge(q, rq))
 			break;
 	case ELEVATOR_INSERT_SORT:
-		BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
-		       !(rq->cmd_flags & REQ_DISCARD));
+		BUG_ON(rq->cmd_type != REQ_TYPE_FS);
 		rq->cmd_flags |= REQ_SORTED;
 		q->nr_sorted++;
 		if (rq_mergeable(rq)) {
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 129a9c0..b0d25ea 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -358,9 +358,15 @@ static inline char *__bio_kmap_irq(struct bio *bio, unsigned short idx,
 /*
  * Check whether this bio carries any data or not. A NULL bio is allowed.
  */
-static inline int bio_has_data(struct bio *bio)
+static inline unsigned int bio_has_data(struct bio *bio)
 {
-	return bio && bio->bi_io_vec != NULL;
+	if (!bio)
+		return 0;
+
+	if (bio->bi_rw & REQ_DISCARD)
+		return 0;
+
+	return bio->bi_io_vec != NULL;
 }
 
 /*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 606cf33..4bfa699 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -575,11 +575,12 @@ static inline void blk_clear_queue_full(struct request_queue *q, int sync)
  * it already be started by driver.
  */
 #define RQ_NOMERGE_FLAGS	\
-	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA)
+	(REQ_NOMERGE | REQ_STARTED | REQ_SOFTBARRIER | REQ_FLUSH | REQ_FUA | \
+	 REQ_DISCARD)
 #define rq_mergeable(rq)	\
-	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && \
-	 (((rq)->cmd_flags & REQ_DISCARD) || \
-	  (rq)->cmd_type == REQ_TYPE_FS))
+	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && (rq)->cmd_type == REQ_TYPE_FS)
+
+
 
 /*
  * q->prep_rq_fn return values

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-13 22:19     ` Martin K. Petersen
@ 2012-02-14  8:05       ` Rolf Eike Beer
  2012-02-15 15:33       ` Vivek Goyal
  1 sibling, 0 replies; 44+ messages in thread
From: Rolf Eike Beer @ 2012-02-14  8:05 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Vivek Goyal, linux-scsi, James.Bottomley, snitzer, jaxboe

[-- Attachment #1: Type: text/plain, Size: 677 bytes --]

Martin K. Petersen wrote:

> diff --git a/include/linux/bio.h b/include/linux/bio.h
> index 129a9c0..b0d25ea 100644
> --- a/include/linux/bio.h
> +++ b/include/linux/bio.h
> @@ -358,9 +358,15 @@ static inline char *__bio_kmap_irq(struct bio *bio,
> unsigned short idx, /*
>   * Check whether this bio carries any data or not. A NULL bio is allowed.
>   */
> -static inline int bio_has_data(struct bio *bio)
> +static inline unsigned int bio_has_data(struct bio *bio)
>  {
> -	return bio && bio->bi_io_vec != NULL;
> +	if (!bio)
> +		return 0;
> +
> +	if (bio->bi_rw & REQ_DISCARD)
> +		return 0;
> +
> +	return bio->bi_io_vec != NULL;
>  }

Shouldn't this be just "bool"?

Eike

[-- Attachment #2: This is a digitally signed message part. --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-13 22:19     ` Martin K. Petersen
  2012-02-14  8:05       ` Rolf Eike Beer
@ 2012-02-15 15:33       ` Vivek Goyal
  2012-02-16  3:29         ` Martin K. Petersen
  1 sibling, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2012-02-15 15:33 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, snitzer, jaxboe

On Mon, Feb 13, 2012 at 05:19:43PM -0500, Martin K. Petersen wrote:

[..]
> 
> block: Mark discard requests unmergeable
> 
> Discards were globally marked as mergeable and as a result we had
> several code paths that explicitly disabled merging when REQ_DISCARD was
> set.
> 
> Mark discard requests as unmergable and remove special-casing of
> REQ_DISCARD.

Mike Snitzer mentioned that we do allow merging of one discard request
with another except following two cases.

- Don't allow merging of secure discard.
- Don't allow merging of discard request with another non-discard request.

I guess that's why DISCARD requests are mergeable globally and later
we deny merge in selected cases. Do we want to disable that behavior?

[..]
> @@ -591,8 +591,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>  
>  	if (rq->cmd_flags & REQ_SOFTBARRIER) {
>  		/* barriers are scheduling boundary, update end_sector */
> -		if (rq->cmd_type == REQ_TYPE_FS ||
> -		    (rq->cmd_flags & REQ_DISCARD)) {
> +		if (rq->cmd_type == REQ_TYPE_FS) {

This is orthogonal to merging?


[..]
> @@ -634,8 +633,7 @@ void __elv_add_request(struct request_queue *q, struct request *rq, int where)
>  		if (elv_attempt_insert_merge(q, rq))
>  			break;
>  	case ELEVATOR_INSERT_SORT:
> -		BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
> -		       !(rq->cmd_flags & REQ_DISCARD));
> +		BUG_ON(rq->cmd_type != REQ_TYPE_FS);

This change also looks orthogonal to merging. Will it not trigger BUG_ON()
when DISCARD request is being inserted into the elevator?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-15 15:33       ` Vivek Goyal
@ 2012-02-16  3:29         ` Martin K. Petersen
  2012-02-16 17:16           ` Vivek Goyal
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-16  3:29 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, snitzer, jaxboe

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

Vivek,

Vivek> Mike Snitzer mentioned that we do allow merging of one discard
Vivek> request with another except following two cases.

It's allowed by virtue of being marked mergeable. This goes back to the
very first attempt we had a discard support. But we don't actually
support merging of discard requests. The merge checks were just never
updated to reflect this.

I'm also trying to get rid of all the REQ_DISCARD special cases. Soon
we'll have REQ_WRITE_SAME and REQ_COPY to worry about as well and the
same rules apply to them. So I'm trying to leverage Tejun's recent
cleanup to consolidate the merge checks.


>> - if (rq->cmd_type == REQ_TYPE_FS ||
>> - (rq->cmd_flags & REQ_DISCARD)) {
>> + if (rq->cmd_type == REQ_TYPE_FS) {

Vivek> This is orthogonal to merging?

REQ_DISCARD is REQ_TYPE_FS so the check is redundant.


>> break; case ELEVATOR_INSERT_SORT:
>> - BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
>> - !(rq->cmd_flags & REQ_DISCARD));
>> + BUG_ON(rq->cmd_type != REQ_TYPE_FS);

Vivek> This change also looks orthogonal to merging. Will it not trigger
Vivek> BUG_ON() when DISCARD request is being inserted into the
Vivek> elevator?

I thought the same thing. But I was unable to trigger it.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-16  3:29         ` Martin K. Petersen
@ 2012-02-16 17:16           ` Vivek Goyal
  2012-02-16 19:12             ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2012-02-16 17:16 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, snitzer, jaxboe

On Wed, Feb 15, 2012 at 10:29:17PM -0500, Martin K. Petersen wrote:
> >>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:
> 
> Vivek,
> 
> Vivek> Mike Snitzer mentioned that we do allow merging of one discard
> Vivek> request with another except following two cases.
> 
> It's allowed by virtue of being marked mergeable. This goes back to the
> very first attempt we had a discard support. But we don't actually
> support merging of discard requests. The merge checks were just never
> updated to reflect this.
> 
> I'm also trying to get rid of all the REQ_DISCARD special cases. Soon
> we'll have REQ_WRITE_SAME and REQ_COPY to worry about as well and the
> same rules apply to them. So I'm trying to leverage Tejun's recent
> cleanup to consolidate the merge checks.

Ok.

> >> - if (rq->cmd_type == REQ_TYPE_FS ||
> >> - (rq->cmd_flags & REQ_DISCARD)) {
> >> + if (rq->cmd_type == REQ_TYPE_FS) {
> 
> Vivek> This is orthogonal to merging?
> 
> REQ_DISCARD is REQ_TYPE_FS so the check is redundant.

Ok. So init_request_from_bio() will mark discard request also as
REQ_TYPE_FS. So this change is fine.

> 
> 
> >> break; case ELEVATOR_INSERT_SORT:
> >> - BUG_ON(rq->cmd_type != REQ_TYPE_FS &&
> >> - !(rq->cmd_flags & REQ_DISCARD));
> >> + BUG_ON(rq->cmd_type != REQ_TYPE_FS);
> 
> Vivek> This change also looks orthogonal to merging. Will it not trigger
> Vivek> BUG_ON() when DISCARD request is being inserted into the
> Vivek> elevator?
> 
> I thought the same thing. But I was unable to trigger it.

Given the fact that REQ_DISCARD will have REQ_TYPE_FS set, then second
condition is anyway redundant and that explains why bug is not triggered.

Thanks for the explanation.

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 1/5] block: Implement support for WRITE SAME
  2012-02-16 17:16           ` Vivek Goyal
@ 2012-02-16 19:12             ` Martin K. Petersen
  0 siblings, 0 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-16 19:12 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, snitzer, jaxboe

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

>> 
>> I thought the same thing. But I was unable to trigger it.

Vivek> Given the fact that REQ_DISCARD will have REQ_TYPE_FS set, then
Vivek> second condition is anyway redundant and that explains why bug is
Vivek> not triggered.

You're right! Missed that first time around.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Write same support
  2012-01-31  0:31 Write same support Martin K. Petersen
                   ` (6 preceding siblings ...)
  2012-02-03 19:20 ` Roland Dreier
@ 2012-02-16 20:02 ` Mike Snitzer
  2012-02-16 20:46   ` Martin K. Petersen
  2012-02-16 21:03   ` dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support] Mike Snitzer
  7 siblings, 2 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-16 20:02 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

Hi Martin,

Given the feedback you've gotten so far I'd imagine you're well on your
way with a new patchset (building on your discard merge cleanup, etc).

But one thing that seemed odd about the previous set was the order of
the patches.

I think it'd be more appropriate to put patch 4 and 5 (scsi support) at
the beginning of the patchset (block interfaces aren't functional
without underlying scsi support).

FYI, I'll bounce a message detailing the iSCSI scatter-gather NULL
pointer I _always_ hit with dm-io issuing async WRITE_SAME.

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Write same support
  2012-02-16 20:02 ` Mike Snitzer
@ 2012-02-16 20:46   ` Martin K. Petersen
  2012-02-16 21:09     ` Mike Snitzer
  2012-02-16 21:03   ` dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support] Mike Snitzer
  1 sibling, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-16 20:46 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Given the feedback you've gotten so far I'd imagine you're well on
Mike> your way with a new patchset (building on your discard merge
Mike> cleanup, etc).

Yeah, tweaked a few things based on your comments.


Mike> I think it'd be more appropriate to put patch 4 and 5 (scsi
Mike> support) at the beginning of the patchset (block interfaces aren't
Mike> functional without underlying scsi support).

The SCSI patches depend on the block layer ditto (topology parameters
need to be present, etc.).

Also, we always do it this way so we can get the block layer changes in
during the merge window and then add the SCSI bits to the post-merge
tree.


Mike> FYI, I'll bounce a message detailing the iSCSI scatter-gather NULL
Mike> pointer I _always_ hit with dm-io issuing async WRITE_SAME.

Interesting, please do. I've only tested on SAS and FC...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-16 20:02 ` Mike Snitzer
  2012-02-16 20:46   ` Martin K. Petersen
@ 2012-02-16 21:03   ` Mike Snitzer
  2012-02-16 21:25     ` Mike Christie
  2012-02-20 17:44     ` Martin K. Petersen
  1 sibling, 2 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-16 21:03 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc

On Thu, Feb 16 2012 at  3:02pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> FYI, I'll bounce a message detailing the iSCSI scatter-gather NULL
> pointer I _always_ hit with dm-io issuing async WRITE_SAME.

I developed a patch for dm-io so that the new dm-thinp target can
leverage your new WRITE SAME functionality for, hopefully, more
efficient zeroing of the disk (see: dm-io-WRITE_SAME.patch at the end of
the following patchset).

Here is the patchset I'm using ontop of Linux 3.2:
http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/series.html

All works great on FC (tested against NetApp 3040 LUN)... I'm using the
thinp-test-suite to test dm-thinp's use of dm_kcopyd_zero().

But testing with iSCSI, I get a NULL pointer _every_ time in the iSCSI
scatter-gather code, see:
http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/async-WRITE_SAME-makes-iscsi-sg-die.txt
-- in the middle of that file you'll see my 'crash' analysis of the
issue -- but that is just the NULL pointer.. no idea what the smoking
gun is that caused the iscsi_segment to become NULL.

Anyway, taking a step back... WRITE SAME is all about transfering a
single logical block, backed by a single empty_zero_page in this test
case, so I'm wondering if for some reason iSCSI's sg code is getting
confused and thinking that more pages need to be transferred than were
in the original bio's payload (but iSCSI is way beneath the bio -> SCSI
command translation... grr)

It should be noted that using your synchronous blkdev_issue_write_same()
interface works fine on the same iSCSI LUN (again NetApp 3040 LUN):
http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/dm-thin-use-WRITE_SAME-for-zeroing.patch
(but an dm-thinp really wants an async interface for zeroing).

Mike

p.s.
dm-io-WRITE_SAME.patch's use of dp->get_page()+bio_add_page()
accomplishes the same bio init as you do in blkdev_issue_write_same():

      bio->bi_vcnt = 1;
      bio->bi_phys_segments = 1;
      bio->bi_io_vec->bv_page = virt_to_page(empty_zero_page);
      bio->bi_io_vec->bv_offset = offset_in_page(empty_zero_page);
      bio->bi_io_vec->bv_len = logical_block_size;

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Write same support
  2012-02-16 20:46   ` Martin K. Petersen
@ 2012-02-16 21:09     ` Mike Snitzer
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-16 21:09 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

On Thu, Feb 16 2012 at  3:46pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> Given the feedback you've gotten so far I'd imagine you're well on
> Mike> your way with a new patchset (building on your discard merge
> Mike> cleanup, etc).
> 
> Yeah, tweaked a few things based on your comments.
> 
> 
> Mike> I think it'd be more appropriate to put patch 4 and 5 (scsi
> Mike> support) at the beginning of the patchset (block interfaces aren't
> Mike> functional without underlying scsi support).
> 
> The SCSI patches depend on the block layer ditto (topology parameters
> need to be present, etc.).

Ah ok, I missed the call to blk_queue_max_write_same_sectors().

> Also, we always do it this way so we can get the block layer changes in
> during the merge window and then add the SCSI bits to the post-merge
> tree.

Got it, thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-16 21:03   ` dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support] Mike Snitzer
@ 2012-02-16 21:25     ` Mike Christie
  2012-02-16 21:35       ` Mike Snitzer
  2012-02-20 17:44     ` Martin K. Petersen
  1 sibling, 1 reply; 44+ messages in thread
From: Mike Christie @ 2012-02-16 21:25 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe, dm-devel

On 02/16/2012 03:03 PM, Mike Snitzer wrote:
> On Thu, Feb 16 2012 at  3:02pm -0500,
> Mike Snitzer <snitzer@redhat.com> wrote:
> 
>> FYI, I'll bounce a message detailing the iSCSI scatter-gather NULL
>> pointer I _always_ hit with dm-io issuing async WRITE_SAME.
> 
> I developed a patch for dm-io so that the new dm-thinp target can
> leverage your new WRITE SAME functionality for, hopefully, more
> efficient zeroing of the disk (see: dm-io-WRITE_SAME.patch at the end of
> the following patchset).
> 
> Here is the patchset I'm using ontop of Linux 3.2:
> http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/series.html
> 
> All works great on FC (tested against NetApp 3040 LUN)... I'm using the
> thinp-test-suite to test dm-thinp's use of dm_kcopyd_zero().
> 
> But testing with iSCSI, I get a NULL pointer _every_ time in the iSCSI
> scatter-gather code, see:
> http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/async-WRITE_SAME-makes-iscsi-sg-die.txt
> -- in the middle of that file you'll see my 'crash' analysis of the
> issue -- but that is just the NULL pointer.. no idea what the smoking
> gun is that caused the iscsi_segment to become NULL.
> 
> Anyway, taking a step back... WRITE SAME is all about transfering a
> single logical block, backed by a single empty_zero_page in this test
> case, so I'm wondering if for some reason iSCSI's sg code is getting
> confused and thinking that more pages need to be transferred than were
> in the original bio's payload (but iSCSI is way beneath the bio -> SCSI
> command translation... grr)

Yeah, probably a request/scsi_cmnd/sg sector/length/offset value is off
or iscsi is making a bad assumption.

Do:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
echo 1 > /sys/module/libiscsi_tcp/parameters/iscsi_tcp

then rerun your test.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-16 21:25     ` Mike Christie
@ 2012-02-16 21:35       ` Mike Snitzer
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-16 21:35 UTC (permalink / raw)
  To: Mike Christie
  Cc: jaxboe, linux-scsi, Martin K. Petersen, device-mapper development

On Thu, Feb 16 2012 at  4:25pm -0500,
Mike Christie <michaelc@cs.wisc.edu> wrote:

> On 02/16/2012 03:03 PM, Mike Snitzer wrote:
> > On Thu, Feb 16 2012 at  3:02pm -0500,
> > Mike Snitzer <snitzer@redhat.com> wrote:
> > 
> >> FYI, I'll bounce a message detailing the iSCSI scatter-gather NULL
> >> pointer I _always_ hit with dm-io issuing async WRITE_SAME.
> > 
> > I developed a patch for dm-io so that the new dm-thinp target can
> > leverage your new WRITE SAME functionality for, hopefully, more
> > efficient zeroing of the disk (see: dm-io-WRITE_SAME.patch at the end of
> > the following patchset).
> > 
> > Here is the patchset I'm using ontop of Linux 3.2:
> > http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/series.html
> > 
> > All works great on FC (tested against NetApp 3040 LUN)... I'm using the
> > thinp-test-suite to test dm-thinp's use of dm_kcopyd_zero().
> > 
> > But testing with iSCSI, I get a NULL pointer _every_ time in the iSCSI
> > scatter-gather code, see:
> > http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/async-WRITE_SAME-makes-iscsi-sg-die.txt
> > -- in the middle of that file you'll see my 'crash' analysis of the
> > issue -- but that is just the NULL pointer.. no idea what the smoking
> > gun is that caused the iscsi_segment to become NULL.
> > 
> > Anyway, taking a step back... WRITE SAME is all about transfering a
> > single logical block, backed by a single empty_zero_page in this test
> > case, so I'm wondering if for some reason iSCSI's sg code is getting
> > confused and thinking that more pages need to be transferred than were
> > in the original bio's payload (but iSCSI is way beneath the bio -> SCSI
> > command translation... grr)
> 
> Yeah, probably a request/scsi_cmnd/sg sector/length/offset value is off
> or iscsi is making a bad assumption.
> 
> Do:
> 
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
> echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_session
> echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp
> echo 1 > /sys/module/libiscsi_tcp/parameters/iscsi_tcp
> 
> then rerun your test.

OK, will retry with all 4.. but just this caused the system to crap
itself:

echo 1 > /sys/module/libiscsi_tcp/parameters/debug_libiscsi_tcp

(I did this to turn on the ISCSI_DBG_TCP messages I noticed while
reviewing the code).

I saw a bunch of opcode 0x25 (READ CAPACITY) but never did see 0x93
(WRITE_SAME_16) come through.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/5] sd: Implement support for WRITE SAME
  2012-01-31  0:31 ` [PATCH 5/5] sd: Implement support for WRITE SAME Martin K. Petersen
@ 2012-02-20 16:16   ` Mike Snitzer
  2012-02-20 17:36     ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-20 16:16 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

On Mon, Jan 30 2012 at  7:31pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> From: "Martin K. Petersen" <martin.petersen@oracle.com>
> 
> Implement support for WRITE SAME(10) and WRITE SAME(16) in the SCSI disk
> driver.

...

> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> index c691fb5..40946a2 100644
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c

...

> +/**
> + * sd_setup_write_same_cmnd - write the same data to multiple blocks
> + * @sdp: scsi device to operate one
> + * @rq: Request to prepare
> + *
> + * Will issue either WRITE SAME(10) or WRITE SAME(16) depending on
> + * preference indicated by target device.
> + **/
> +static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)

Just a small nit, this should be scsi_setup_write_same_cmnd given we
already have: scsi_setup_{discard,flush}_cmnd

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/5] sd: Implement support for WRITE SAME
  2012-02-20 16:16   ` Mike Snitzer
@ 2012-02-20 17:36     ` Martin K. Petersen
  2012-02-20 18:28       ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-20 17:36 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> Just a small nit, this should be scsi_setup_write_same_cmnd given
Mike> we already have: scsi_setup_{discard,flush}_cmnd

This was deliberate. It's the two other functions that are
misnamed. They clearly belong in the sd_ name space.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-16 21:03   ` dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support] Mike Snitzer
  2012-02-16 21:25     ` Mike Christie
@ 2012-02-20 17:44     ` Martin K. Petersen
  2012-02-20 18:46       ` Mike Snitzer
  1 sibling, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-20 17:44 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> But testing with iSCSI, I get a NULL pointer _every_ time in the
Mike> iSCSI scatter-gather code,

I finally managed to get a WRITE SAME-capable iSCSI target set up.

Can you beam me your test case?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH 5/5] sd: Implement support for WRITE SAME
  2012-02-20 17:36     ` Martin K. Petersen
@ 2012-02-20 18:28       ` Mike Snitzer
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-20 18:28 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-scsi, James.Bottomley, jaxboe

On Mon, Feb 20 2012 at 12:36pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> Just a small nit, this should be scsi_setup_write_same_cmnd given
> Mike> we already have: scsi_setup_{discard,flush}_cmnd
> 
> This was deliberate. It's the two other functions that are
> misnamed. They clearly belong in the sd_ name space.

OK, sure, might make the most sense to have an earlier patch in the
series rename the others.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-20 17:44     ` Martin K. Petersen
@ 2012-02-20 18:46       ` Mike Snitzer
  2012-02-20 23:44         ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-20 18:46 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc

[-- Attachment #1: Type: text/plain, Size: 3577 bytes --]

On Mon, Feb 20 2012 at 12:44pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> But testing with iSCSI, I get a NULL pointer _every_ time in the
> Mike> iSCSI scatter-gather code,
> 
> I finally managed to get a WRITE SAME-capable iSCSI target set up.
> 
> Can you beam me your test case?

I've been using the thinp-test-suite, here:
https://github.com/jthornber/thinp-test-suite

After establishing a minimal config in config.rb, e.g.:
    'hostname' =>
    { :metadata_dev => '/dev/thin_iscsi/metadata',
      :metadata_size => 2097152,
      :data_dev => '/dev/thin_iscsi/data',
      :data_size => 17825792
    }

I run a single test:
export THIN_TESTS=EXECUTE
./run_tests -n /test_two_pools_can_create_thins/

Anyway, looking at the code, the sg setup code is common for all SCSI,
so I have no reason to believe that FC is different than iSCSI -- FC's
sg code may just be much more forgiving?:

sd_setup_write_same_cmnd -> sd_setup_write_same_cmnd -> scsi_setup_blk_pc_cmnd ->
 scsi_init_io -> scsi_init_sgtable -> blk_rq_map_sg

I'm not seeing any code that accounts for the possibility of a
WRITE_SAME bio having bv_len=512 but bi_size=65536 in blk_rq_map_sg().
I'm definitely seeing blk_rq_map_sg() take a WRITE SAME request and
create a sg with multiple segments.  That in and of itself seems like a
bug.  But I'm no scatter-gather expert!

I'm adding more debug printks to blk_rq_map_sg() to try to understand
what is going on... will share more once I have it.

But the iSCSI traces I got, with some debugging code that Mike Christie
provided -- attached to this mail, showed that the sg was setup
improperly (so I think) because there are multiple segments, e.g.:

iscsi prep [write cid 0 sc ffff880117ce5080 cdb 0x41 itt 0x15 len 512 sg count 8 bidi_len 0 cmdsn 883 win 128]
0 sg ffff88011a471800 0 512
1 sg ffff88011a471828 0 512
2 sg ffff88011a471850 0 512
3 sg ffff88011a471878 0 512
4 sg ffff88011a4718a0 0 512
5 sg ffff88011a4718c8 0 512
6 sg ffff88011a4718f0 0 512
7 sg ffff88011a471918 0 512

When the test is executed using your synchronous blkdev_issue_write_same
interface each WRITE SAME is confined to an sg with a single segment,
e.g.:

iscsi prep [write cid 0 sc ffff880117ce1c80 cdb 0x41 itt 0x5e len 512 sg count 1 bidi_len 0 cmdsn 902 win 128]
0 sg ffff880118d72e80 0 512
iscsi prep [write cid 0 sc ffff880116fc0b80 cdb 0x41 itt 0x5f len 512 sg count 1 bidi_len 0 cmdsn 903 win 128]
0 sg ffff88011a471940 0 512
iscsi prep [write cid 0 sc ffff880117ce1c80 cdb 0x41 itt 0x60 len 512 sg count 1 bidi_len 0 cmdsn 904 win 128]
0 sg ffff880118d72e80 0 512

And here is the sg that ultimately caused the libiscsi_tcp code to NULL
pointer (len is much too large, and it _seems_ like WRITE SAME has been
merged on a WRITE cdb's sg):

iscsi prep [write cid 0 sc ffff8800cfc953c0 cdb 0x2a itt 0x27 len 512000 sg count 20 bidi_len 0 cmdsn 1146 win 128]
0 sg ffff88011a476040 0 4096
1 sg ffff88011a476068 0 4096
2 sg ffff88011a476090 0 4096
3 sg ffff88011a4760b8 0 4096
4 sg ffff88011a4760e0 0 4096
5 sg ffff88011a476108 0 4096
6 sg ffff88011a476130 0 4096
7 sg ffff88011a476158 0 4096
8 sg ffff88011a476180 0 4096
9 sg ffff88011a4761a8 0 4096
10 sg ffff88011a4761d0 0 4096
11 sg ffff88011a4761f8 0 4096
12 sg ffff88011a476220 0 4096
13 sg ffff88011a476248 0 512
14 sg ffff88011a476270 0 512
15 sg ffff88011a476298 0 512
16 sg ffff88011a4762c0 0 512
17 sg ffff88011a4762e8 0 512
18 sg ffff88011a476310 0 512
19 sg ffff88011a476338 0 512
Bad! offset 0 size 65536 copied 56832

[-- Attachment #2: cmd-sgl-check.patch --]
[-- Type: text/plain, Size: 1864 bytes --]

diff --git a/drivers/scsi/libiscsi.c b/drivers/scsi/libiscsi.c
index 9075cfa..50a1d79 100644
--- a/drivers/scsi/libiscsi.c
+++ b/drivers/scsi/libiscsi.c
@@ -362,7 +362,8 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	struct iscsi_cmd *hdr;
 	unsigned hdrlength, cmd_len;
 	itt_t itt;
-	int rc;
+	int rc, i;
+	struct scatterlist *sg;
 
 	rc = iscsi_check_tmf_restrictions(task, ISCSI_OP_SCSI_CMD);
 	if (rc)
@@ -479,15 +480,19 @@ static int iscsi_prep_scsi_cmd_pdu(struct iscsi_task *task)
 	session->cmdsn++;
 
 	conn->scsicmd_pdus_cnt++;
-	ISCSI_DBG_SESSION(session, "iscsi prep [%s cid %d sc %p cdb 0x%x "
-			  "itt 0x%x len %d bidi_len %d cmdsn %d win %d]\n",
+	printk(KERN_ERR "iscsi prep [%s cid %d sc %p cdb 0x%x "
+			  "itt 0x%x len %d sg count %d bidi_len %d cmdsn %d win %d]\n",
 			  scsi_bidi_cmnd(sc) ? "bidirectional" :
 			  sc->sc_data_direction == DMA_TO_DEVICE ?
 			  "write" : "read", conn->id, sc, sc->cmnd[0],
-			  task->itt, scsi_bufflen(sc),
+			  task->itt, scsi_bufflen(sc), scsi_sg_count(sc),
 			  scsi_bidi_cmnd(sc) ? scsi_in(sc)->length : 0,
 			  session->cmdsn,
 			  session->max_cmdsn - session->exp_cmdsn + 1);
+
+	scsi_for_each_sg(sc, sg, scsi_sg_count(sc), i)
+		printk(KERN_ERR "%d sg %p %u %u\n", i, sg, sg->offset, sg->length);
+
 	return 0;
 }
 
diff --git a/drivers/scsi/libiscsi_tcp.c b/drivers/scsi/libiscsi_tcp.c
index e0ea92c..240f2d2 100644
--- a/drivers/scsi/libiscsi_tcp.c
+++ b/drivers/scsi/libiscsi_tcp.c
@@ -95,6 +95,12 @@ static inline void
 iscsi_tcp_segment_init_sg(struct iscsi_segment *segment,
 			  struct scatterlist *sg, unsigned int offset)
 {
+	if (!sg) {
+		printk(KERN_ERR "Bad! offset %u size %u copied %u\n",
+			offset, segment->total_size, segment->total_copied);
+		return;
+	}
+
 	segment->sg = sg;
 	segment->sg_offset = offset;
 	segment->size = min(sg->length - offset,

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-20 18:46       ` Mike Snitzer
@ 2012-02-20 23:44         ` Mike Snitzer
  2012-02-21  0:07           ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-20 23:44 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Mon, Feb 20 2012 at  1:46pm -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> I'm adding more debug printks to blk_rq_map_sg() to try to understand
> what is going on... will share more once I have it.

The REQ_WRITE_SAME request, that SCSI is processing on behalf of the
dm_kcopyd_zero() generated bio, has multiple bios (as if merging
occurred).

Curiously, using the dm_kcopyd_zero() interface, I'll see repeat calls
to elv_rq_merge_ok() for a bio with a given bi_sector:

           <...>-10    [000]  6047.137941: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137942: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137942: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137943: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137943: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137944: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137944: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336
           <...>-10    [000]  6047.137945: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214336

           <...>-10    [000]  6047.137958: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137959: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137959: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137960: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137960: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137961: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137961: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137962: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464
           <...>-10    [000]  6047.137963: elv_rq_merge_ok: WRITE_SAME bio bi_sector=5214464

So something in the dm-kcopyd and dm-io bio submission path is causing
multiple calls to elv_rq_merge_ok() for the _same_ bio, really quite
bizarre!

If I use the bdev_write_same() interface I only get one elv_rq_merge_ok
for a given bi_sector:

           <...>-2088  [001] 10430.160868: elv_rq_merge_ok: WRITE_SAME bio bi_sector=4652928
           <...>-1990  [001] 10432.565862: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6659456
           <...>-1990  [001] 10434.050269: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6667904
           <...>-1990  [001] 10434.238763: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668032
           <...>-1990  [001] 10435.852311: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668160
           <...>-1990  [001] 10437.730371: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668288
           <...>-1990  [001] 10438.275437: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668416
           <...>-1990  [001] 10439.737049: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668544
           <...>-1990  [001] 10440.100221: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668672
           <...>-1990  [001] 10441.987857: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668800
           <...>-1990  [001] 10443.875244: elv_rq_merge_ok: WRITE_SAME bio bi_sector=6668928

And as if the above wasn't weird enough, I can avoid the scatter-gather
NULL pointer (in libiscsi_tcp when using the dm_kcopyd_zero() interface)
if I switch elv_rq_merge_ok() to checking the rq->cmd_flags for
REQ_WRITE_SAME, rather than checking the request's first bio's bi_rw:

        /*
         * Don't merge write same requests
         */
-       if ((bio->bi_rw & REQ_WRITE_SAME) || (rq->bio->bi_rw & REQ_WRITE_SAME))
+       if ((bio->bi_rw & REQ_WRITE_SAME) || (rq->cmd_flags & REQ_WRITE_SAME))
                return 0;

That would seem to imply to me that some WRITE SAME bios are losing the
REQ_WRITE_SAME flag in bio->bi_rw!? (or there is some other un-guarded
merge point -- also associated with the issue above).

(It at least starts to explain why I was seeing 512b sg segments at the
end of a WRITE's sg list with 4096b segments... but I still have
unanswered questions I need to sort out).

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-20 23:44         ` Mike Snitzer
@ 2012-02-21  0:07           ` Martin K. Petersen
  2012-02-21  3:18             ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-21  0:07 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc, Vivek Goyal

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> The REQ_WRITE_SAME request, that SCSI is processing on behalf of
Mike> the dm_kcopyd_zero() generated bio, has multiple bios (as if
Mike> merging occurred).

Did you add a fix for the issue Vivek pointed out wrt. merging?

PS. I pushed an updated 'writesame2' branch to kernel.org.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21  0:07           ` Martin K. Petersen
@ 2012-02-21  3:18             ` Mike Snitzer
  2012-02-21  3:57               ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-21  3:18 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Mon, Feb 20 2012 at  7:07pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> The REQ_WRITE_SAME request, that SCSI is processing on behalf of
> Mike> the dm_kcopyd_zero() generated bio, has multiple bios (as if
> Mike> merging occurred).
> 
> Did you add a fix for the issue Vivek pointed out wrt. merging?

Nope, probably the problem (rq_mergeable called in multiple places).
 
> PS. I pushed an updated 'writesame2' branch to kernel.org.

OK, thanks.

One, thing I noticed: bio_has_data returns false for REQ_WRITE_SAME.
But REQ_WRITE_SAME does have data, and it really should be accounted
no?:

@@ -1682,7 +1682,7 @@ void submit_bio(int rw, struct bio *bio)
         * If it's a regular read/write or a barrier with data attached,
         * go through the normal accounting stuff before submission.
         */
-       if (bio_has_data(bio) && !(rw & REQ_DISCARD)) {
+       if (bio_has_data(bio)) {
                if (rw & WRITE) {
                        count_vm_events(PGPGOUT, count);
                } else {

That aside, I tried your updated code and hit this BUG when I use the
patch that has always worked (my dm-thin patch that uses the
blkdev_issue_write_same() interface):

------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1116!
invalid opcode: 0000 [#1] SMP 
CPU 1 
Modules linked in: dm_thin_pool dm_persistent_data dm_bufio libcrc32c dm_mod sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_net virtio_balloon i2c_piix4 i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded : dm_thin_pool]

Pid: 33, comm: kworker/1:2 Tainted: G        W    3.2.0-snitm+ #177 Red Hat KVM
RIP: 0010:[<ffffffff81293c06>]  [<ffffffff81293c06>] scsi_setup_blk_pc_cmnd+0x91/0x11f
RSP: 0000:ffff880117d99b00  EFLAGS: 00010046
RAX: ffff8801191db838 RBX: ffff880119315a80 RCX: 8c6318c6318c6320
RDX: ffff88011f40dabc RSI: ffffffff81395722 RDI: ffffffff8107254a
RBP: ffff880117d99b20 R08: ffff880117f96880 R09: ffffffff815509b7
R10: ffff880117f96800 R11: 0000000000000046 R12: ffff8801191db740
R13: 0000000000000000 R14: ffff880117f96800 R15: ffff880117f96800
FS:  0000000000000000(0000) GS:ffff88011f400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000410fb0 CR3: 0000000111b93000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process kworker/1:2 (pid: 33, threadinfo ffff880117d98000, task ffff880117d94880)
Stack:
 0000000000008000 ffff8801191db740 0000000000420800 0000000000000001
 ffff880117d99ba0 ffffffff8129e271 ffff8801191db740 0000000000000001
 ffff880117d99b70 ffff8801191db740 ffff8801175c4f48 ffff8801175c4f48
Call Trace:
 [<ffffffff8129e271>] sd_prep_fn+0x3da/0xce2
 [<ffffffff811c35be>] ? elv_dispatch_add_tail+0x6f/0x71
 [<ffffffff811c9ee0>] blk_peek_request+0xee/0x1d8
 [<ffffffff812930b7>] scsi_request_fn+0x7d/0x48d
 [<ffffffff811c4480>] __blk_run_queue+0x1e/0x20
 [<ffffffff811c88e3>] queue_unplugged+0x8a/0xa2
 [<ffffffff811c9034>] blk_flush_plug_list+0x1a9/0x1dd
 [<ffffffffa00e3e32>] ? process_jobs+0xe1/0xe1 [dm_mod]
 [<ffffffff811c9080>] blk_finish_plug+0x18/0x39
 [<ffffffffa00e3ea4>] do_work+0x72/0x7d [dm_mod]
 [<ffffffff81049ccd>] process_one_work+0x213/0x37b
 [<ffffffff81049c3e>] ? process_one_work+0x184/0x37b
 [<ffffffff8104a16a>] worker_thread+0x138/0x21c
 [<ffffffff8104a032>] ? rescuer_thread+0x1fd/0x1fd
 [<ffffffff8104de3a>] kthread+0xa7/0xaf
 [<ffffffff810744f4>] ? trace_hardirqs_on_caller+0x16/0x166
 [<ffffffff8139d6f4>] kernel_thread_helper+0x4/0x10
 [<ffffffff81395974>] ? retint_restore_args+0x13/0x13
 [<ffffffff8104dd93>] ? __init_kthread_worker+0x5b/0x5b
 [<ffffffff8139d6f0>] ? gs_change+0x13/0x13
Code: 00 88 83 e4 00 00 00 49 8b 84 24 08 01 00 00 c6 43 48 00 48 89 43 50 49 83 7c 24 60 00 74 26 66 41 83 bc 24 d0 00 00 00 00 75 04 <0f> 0b eb fe be 20 00 00 00 48 89 df e8 55 fd ff ff 85 c0 74 2d 
RIP  [<ffffffff81293c06>] scsi_setup_blk_pc_cmnd+0x91/0x11f
 RSP <ffff880117d99b00>
---[ end trace a7919e7f17c0a727 ]---

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21  3:18             ` Mike Snitzer
@ 2012-02-21  3:57               ` Martin K. Petersen
  2012-02-21  6:55                 ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-21  3:57 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc, Vivek Goyal

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike,

Mike> One, thing I noticed: bio_has_data returns false for
Mike> REQ_WRITE_SAME.  But REQ_WRITE_SAME does have data, and it really
Mike> should be accounted no?:

I decided against it. We don't count discards either and write sames are
not really page-out types of activity. Happy to change it if people
think this is something we should handle. But what do we actually count?
A single logical block or the number of sectors written by the target
device?


Mike> That aside, I tried your updated code and hit this BUG when I use
Mike> the patch that has always worked (my dm-thin patch that uses the
Mike> blkdev_issue_write_same() interface):

Mike> ------------[ cut here ]------------ kernel BUG at
Mike> drivers/scsi/scsi_lib.c:1116!

That's the

                BUG_ON(!req->nr_phys_segments);

in scsi_setup_blk_pc_cmnd(). We set nr_phys_segments to bi_phys_segments
just before calling that function. So how did you end up with a
zero-segment bio?

PS. I was unsuccessful in getting the thinp test suite working. If you
can come up with a simpler way for me to get DM to issue a write same
then please share...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21  3:57               ` Martin K. Petersen
@ 2012-02-21  6:55                 ` Mike Snitzer
  2012-02-21 12:31                   ` Martin K. Petersen
  2012-02-21 19:47                   ` Vivek Goyal
  0 siblings, 2 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-21  6:55 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Mon, Feb 20 2012 at 10:57pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike,
> 
> Mike> One, thing I noticed: bio_has_data returns false for
> Mike> REQ_WRITE_SAME.  But REQ_WRITE_SAME does have data, and it really
> Mike> should be accounted no?:
> 
> I decided against it. We don't count discards either and write sames are
> not really page-out types of activity. Happy to change it if people
> think this is something we should handle. But what do we actually count?
> A single logical block or the number of sectors written by the target
> device?

I'd say the single logical block.

> Mike> That aside, I tried your updated code and hit this BUG when I use
> Mike> the patch that has always worked (my dm-thin patch that uses the
> Mike> blkdev_issue_write_same() interface):
> 
> Mike> ------------[ cut here ]------------ kernel BUG at
> Mike> drivers/scsi/scsi_lib.c:1116!
> 
> That's the
> 
>                 BUG_ON(!req->nr_phys_segments);
> 
> in scsi_setup_blk_pc_cmnd(). We set nr_phys_segments to bi_phys_segments
> just before calling that function. So how did you end up with a
> zero-segment bio?

All I did was apply this patch to your writesame2 branch:
http://people.redhat.com/msnitzer/patches/upstream/dm-io-WRITE_SAME/dm-thin-use-WRITE_SAME-for-zeroing.patch
(and run the thinp-test-suite test I referenced in the other mail).
-- I'm just using the blkdev_issue_write_same() interface.. nothing special

I think the bio_has_data() change is at the heart of the BUG_ON().

Now this branch in blk_rq_bio_prep() is no longer taken:

        if (bio_has_data(bio)) {
                rq->nr_phys_segments = bio_phys_segments(q, bio);
                rq->buffer = bio_data(bio);
        }

This patch fixed the issue for me (though I'm still missing why
bio->bi_phys_segments was 0 given blkdev_issue_write_same() sets it):

Index: linux-2.6/drivers/scsi/sd.c
===================================================================
--- linux-2.6.orig/drivers/scsi/sd.c
+++ linux-2.6/drivers/scsi/sd.c
@@ -697,6 +697,7 @@ out:
 static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
 {
 	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct request_queue *q = sdkp->disk->queue;
 	struct bio *bio = rq->bio;
 	sector_t sector = bio->bi_sector;
 	unsigned int nr_sectors = bio_sectors(bio);
@@ -711,7 +712,8 @@ static int sd_setup_write_same_cmnd(stru
 
 	rq->timeout = SD_WRITE_SAME_TIMEOUT;
 	rq->__data_len = rq->resid_len = sdp->sector_size;
-	rq->nr_phys_segments = bio->bi_phys_segments;
+	rq->nr_phys_segments = bio_phys_segments(q, bio);
+	rq->buffer = bio_data(bio);
 	memset(rq->cmd, 0, rq->cmd_len);
 
 	if (sdkp->ws16 || sector > 0xffffffff || nr_sectors > 0xffff) {
 

> PS. I was unsuccessful in getting the thinp test suite working. If you
> can come up with a simpler way for me to get DM to issue a write same
> then please share...

Any new block that gets provisioned will trigger zeroing (unless the
entire block will be consumed with data -- in that case the zero is
avoided).  If you use the default thinp block size: a random IO
benchmark or a simple dd, of a partial block, should trigger zeroing.

Mike

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21  6:55                 ` Mike Snitzer
@ 2012-02-21 12:31                   ` Martin K. Petersen
  2012-02-21 14:42                     ` Mike Snitzer
  2012-02-21 19:47                   ` Vivek Goyal
  1 sibling, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-21 12:31 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc, Vivek Goyal

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

Mike> This patch fixed the issue for me (though I'm still missing why
Mike> bio->bi_phys_segments was 0 given blkdev_issue_write_same() sets
Mike> it):

Ok, I see what's going on. You have your own dm-specific make request
function. When cloning the original bio phys_segments isn't carried
over. And that's why we see 0 in sd.

For discard this is not a problem because we hardwire things in sd.c
regardless of what was passed down. And besides you have special
handling for mapping discards in DM.

I was trying to avoid perpetuating Christoph's horrible hack (his words,
not mine). But maybe it's better to do it the same way as for discard so
we only have to have to deal with pure evil in one place.

I'll contemplate a bit...


PS. The good news is that your async stuff works when I set phys_segs to
1 in sd.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21 12:31                   ` Martin K. Petersen
@ 2012-02-21 14:42                     ` Mike Snitzer
  2012-02-21 19:33                       ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-21 14:42 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Tue, Feb 21 2012 at  7:31am -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> Mike> This patch fixed the issue for me (though I'm still missing why
> Mike> bio->bi_phys_segments was 0 given blkdev_issue_write_same() sets
> Mike> it):
> 
> Ok, I see what's going on. You have your own dm-specific make request
> function. When cloning the original bio phys_segments isn't carried
> over. And that's why we see 0 in sd.

Ah, yes indeed.  I was just setting out to answer the why on it so
you've saved me some time, thanks!

> For discard this is not a problem because we hardwire things in sd.c
> regardless of what was passed down. And besides you have special
> handling for mapping discards in DM.

sd allocating the page used for discard was what enabled DM to have
discard support; otherwise cloning a discard required allocation of the
page and it all got _really_ ugly.

> I was trying to avoid perpetuating Christoph's horrible hack (his words,
> not mine). But maybe it's better to do it the same way as for discard so
> we only have to have to deal with pure evil in one place.

Which hack are you referring to?  sd allocates the page used for
discard (I had a hand in that work, along with tomo, and don't hold it
to be too big a hack really).

But I'm not immediately seeing a clean way to do so for WRITE SAME
because the user provided buffer would need to get down to sd somehow.

> I'll contemplate a bit...
> 
> 
> PS. The good news is that your async stuff works when I set phys_segs to
> 1 in sd.

Yeah, it worked with the patch I provided in my previous mail too.  But
ultimately the async stuff wasn't working for me due to merging.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21 14:42                     ` Mike Snitzer
@ 2012-02-21 19:33                       ` Mike Snitzer
  2012-02-21 21:31                         ` Martin K. Petersen
  0 siblings, 1 reply; 44+ messages in thread
From: Mike Snitzer @ 2012-02-21 19:33 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Tue, Feb 21 2012 at  9:41am -0500,
Mike Snitzer <snitzer@redhat.com> wrote:

> On Tue, Feb 21 2012 at  7:31am -0500,
> Martin K. Petersen <martin.petersen@oracle.com> wrote:

> > PS. The good news is that your async stuff works when I set phys_segs to
> > 1 in sd.
> 
> Yeah, it worked with the patch I provided in my previous mail too.  But
> ultimately the async stuff wasn't working for me due to merging.

(not related to async interface but...)

After further testing (iSCSI from with a guest) it is clear that we
still have a problem with REQ_WRITE_SAME bios being merged into WRITE
requests (I added a debugging WARN_ON_ONCE to generate the following):

------------[ cut here ]------------
WARNING: at block/blk-merge.c:476 blk_rq_merge_ok+0x4d/0xb4()
Hardware name: KVM
Modules linked in: dm_thin_pool dm_persistent_data dm_bufio libcrc32c dm_mod sunrpc iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi virtio_net virtio_balloon i2c_piix4 i2c_core virtio_blk virtio_pci virtio_ring virtio [last unloaded: dm_thin_pool]
Pid: 5, comm: kworker/u:0 Not tainted 3.2.0-snitm+ #185
Call Trace:
 [<ffffffff81042efa>] warn_slowpath_common+0x85/0x9d
 [<ffffffff81042f2c>] warn_slowpath_null+0x1a/0x1c
 [<ffffffff811cd8bc>] blk_rq_merge_ok+0x4d/0xb4
 [<ffffffff811c4213>] elv_rq_merge_ok+0x17/0x47
 [<ffffffff811c4287>] elv_merge+0x44/0xc2
 [<ffffffff811ca815>] blk_queue_bio+0xf2/0x2d5
 [<ffffffff811c9ca2>] generic_make_request+0xa1/0xe2
 [<ffffffff811c9dc2>] submit_bio+0xdf/0x119
 [<ffffffff8113191d>] ? bio_alloc_bioset+0x4d/0xc2
 [<ffffffff811ceee3>] blkdev_issue_write_same+0x203/0x26c
 [<ffffffff8106511a>] ? local_clock+0x41/0x5a
 [<ffffffffa00f5dbf>] do_worker+0x3b6/0x5f1 [dm_thin_pool]
 [<ffffffff8107084e>] ? trace_hardirqs_off_caller+0x1f/0x9e
 [<ffffffffa00f5a09>] ? process_shared_bio+0x36e/0x36e [dm_thin_pool]
 [<ffffffffa00f5a09>] ? process_shared_bio+0x36e/0x36e [dm_thin_pool]
 [<ffffffff8105b6d0>] process_one_work+0x213/0x37b
 [<ffffffff8105b641>] ? process_one_work+0x184/0x37b
 [<ffffffff8105bb6d>] worker_thread+0x138/0x21c
 [<ffffffff8105ba35>] ? rescuer_thread+0x1fd/0x1fd
 [<ffffffff8105f6b1>] kthread+0xa0/0xa8
 [<ffffffff81073663>] ? trace_hardirqs_on_caller+0x12f/0x166
 [<ffffffff81398384>] kernel_thread_helper+0x4/0x10
 [<ffffffff81390534>] ? retint_restore_args+0x13/0x13
 [<ffffffff8105f611>] ? __init_kthread_worker+0x5b/0x5b
 [<ffffffff81398380>] ? gs_change+0x13/0x13
---[ end trace 07b896d4bdef61b0 ]---

This patch fixes it for me, please feel free to add it to your series:

From: Mike Snitzer <snitzer@redhat.com>
Date:   Tue Feb 21 13:55:42 2012 -0500

    block: disallow certain bios from being merged into a request
    
    Not all WRITE bios are pure WRITEs (there may be other flags set,
    e.g. REQ_WRITE_SAME).  Introduce bio_mergeable() and have
    blk_rq_merge_ok() check that a given bio is mergeable.
    
    Signed-off-by: Mike Snitzer <snitzer@redhat.com>

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 3f73823..81484d3 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -468,7 +468,7 @@ int blk_attempt_req_merge(struct request_queue *q, struct request *rq,
 
 bool blk_rq_merge_ok(struct request *rq, struct bio *bio)
 {
-	if (!rq_mergeable(rq))
+	if (!rq_mergeable(rq) || !bio_mergeable(bio))
 		return false;
 
 	/* don't merge discard requests and secure discard requests */
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5a505d7..7cf2d37 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -568,7 +568,10 @@ static inline void blk_clear_queue_full(struct request_queue *q, int sync)
 #define rq_mergeable(rq)	\
 	(!((rq)->cmd_flags & RQ_NOMERGE_FLAGS) && (rq)->cmd_type == REQ_TYPE_FS)
 
-
+#define BIO_NOMERGE_FLAGS	\
+	(REQ_DISCARD | REQ_WRITE_SAME)
+#define bio_mergeable(bio)	\
+	(!((bio)->bi_rw & BIO_NOMERGE_FLAGS))
 
 /*
  * q->prep_rq_fn return values

^ permalink raw reply related	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21  6:55                 ` Mike Snitzer
  2012-02-21 12:31                   ` Martin K. Petersen
@ 2012-02-21 19:47                   ` Vivek Goyal
  2012-02-21 19:56                     ` Martin K. Petersen
  1 sibling, 1 reply; 44+ messages in thread
From: Vivek Goyal @ 2012-02-21 19:47 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc

On Tue, Feb 21, 2012 at 01:55:04AM -0500, Mike Snitzer wrote:

[..]
> I think the bio_has_data() change is at the heart of the BUG_ON().
> 
> Now this branch in blk_rq_bio_prep() is no longer taken:
> 
>         if (bio_has_data(bio)) {
>                 rq->nr_phys_segments = bio_phys_segments(q, bio);
>                 rq->buffer = bio_data(bio);
>         }
> 
> This patch fixed the issue for me (though I'm still missing why
> bio->bi_phys_segments was 0 given blkdev_issue_write_same() sets it):
> 
> Index: linux-2.6/drivers/scsi/sd.c
> ===================================================================
> --- linux-2.6.orig/drivers/scsi/sd.c
> +++ linux-2.6/drivers/scsi/sd.c
> @@ -697,6 +697,7 @@ out:
>  static int sd_setup_write_same_cmnd(struct scsi_device *sdp, struct request *rq)
>  {
>  	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
> +	struct request_queue *q = sdkp->disk->queue;
>  	struct bio *bio = rq->bio;
>  	sector_t sector = bio->bi_sector;
>  	unsigned int nr_sectors = bio_sectors(bio);
> @@ -711,7 +712,8 @@ static int sd_setup_write_same_cmnd(stru
>  
>  	rq->timeout = SD_WRITE_SAME_TIMEOUT;
>  	rq->__data_len = rq->resid_len = sdp->sector_size;
> -	rq->nr_phys_segments = bio->bi_phys_segments;
> +	rq->nr_phys_segments = bio_phys_segments(q, bio);
> +	rq->buffer = bio_data(bio);

This kind of sounds not so good. We should have got it covered in
blk_rq_bio_prep().

First we returned "false" from bio_has_data() and skipped above assignment,
in blk_rq_bio_prep() and now we are trying to make up for it. 

Maybe returning false from bio_has_data() for WRITE_SAME is not such a
good idea because this bio has one logical block of data.

Martin, thoughts?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21 19:47                   ` Vivek Goyal
@ 2012-02-21 19:56                     ` Martin K. Petersen
  0 siblings, 0 replies; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-21 19:56 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: Mike Snitzer, Martin K. Petersen, linux-scsi, James.Bottomley,
	jaxboe, dm-devel, michaelc

>>>>> "Vivek" == Vivek Goyal <vgoyal@redhat.com> writes:

Vivek> First we returned "false" from bio_has_data() and skipped above
Vivek> assignment, in blk_rq_bio_prep() and now we are trying to make up
Vivek> for it.

Vivek> Maybe returning false from bio_has_data() for WRITE_SAME is not
Vivek> such a good idea because this bio has one logical block of data.

bio_has_data() is used for two different things. In my current tree I
split it up into bio_has_data() and bio_is_rw().

But given Mike's merge observation I think I'll take a step back and see
if I can come up with a better way to handle these
REQ_I_CANT_BELIEVE_ITS_NOT_FS requests in general. I really don't want
to have all the special cases scattered all over the place.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21 19:33                       ` Mike Snitzer
@ 2012-02-21 21:31                         ` Martin K. Petersen
  2012-02-21 23:36                           ` Mike Snitzer
  0 siblings, 1 reply; 44+ messages in thread
From: Martin K. Petersen @ 2012-02-21 21:31 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, linux-scsi, James.Bottomley, jaxboe,
	dm-devel, michaelc, Vivek Goyal

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

>> Yeah, it worked with the patch I provided in my previous mail too.
>> But ultimately the async stuff wasn't working for me due to merging.

New writesame3 branch on kernel.org. Please give it a whirl...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support]
  2012-02-21 21:31                         ` Martin K. Petersen
@ 2012-02-21 23:36                           ` Mike Snitzer
  0 siblings, 0 replies; 44+ messages in thread
From: Mike Snitzer @ 2012-02-21 23:36 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: linux-scsi, James.Bottomley, jaxboe, dm-devel, michaelc, Vivek Goyal

On Tue, Feb 21 2012 at  4:31pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> >>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:
> 
> >> Yeah, it worked with the patch I provided in my previous mail too.
> >> But ultimately the async stuff wasn't working for me due to merging.
> 
> New writesame3 branch on kernel.org. Please give it a whirl...

Looks good.  And it works great (with both your improved
blkdev_issue_write_same interface and the dm_kcopyd_zero interface).

Please feel free to add my Acked-by: Mike Snitzer <snitzer@redhat.com>
to the entire set.

(and thanks for working with me on chasing down and/or explaining the
various issues I was seeing!)

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2012-02-21 23:36 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-01-31  0:31 Write same support Martin K. Petersen
2012-01-31  0:31 ` [PATCH 1/5] block: Implement support for WRITE SAME Martin K. Petersen
2012-02-07 21:40   ` Vivek Goyal
2012-02-13 22:19     ` Martin K. Petersen
2012-02-14  8:05       ` Rolf Eike Beer
2012-02-15 15:33       ` Vivek Goyal
2012-02-16  3:29         ` Martin K. Petersen
2012-02-16 17:16           ` Vivek Goyal
2012-02-16 19:12             ` Martin K. Petersen
2012-02-08 22:50   ` Mike Snitzer
2012-02-08 23:12     ` Martin K. Petersen
2012-02-09  3:33       ` Mike Snitzer
2012-02-09  3:40   ` Mike Snitzer
2012-01-31  0:31 ` [PATCH 2/5] block: Make blkdev_issue_zeroout use " Martin K. Petersen
2012-01-31  0:31 ` [PATCH 3/5] block: ioctl to zero block ranges Martin K. Petersen
2012-01-31  0:31 ` [PATCH 4/5] scsi: Add a report opcode helper Martin K. Petersen
2012-01-31 19:53   ` Jeff Garzik
2012-01-31 20:16     ` Martin K. Petersen
2012-01-31  0:31 ` [PATCH 5/5] sd: Implement support for WRITE SAME Martin K. Petersen
2012-02-20 16:16   ` Mike Snitzer
2012-02-20 17:36     ` Martin K. Petersen
2012-02-20 18:28       ` Mike Snitzer
2012-02-03 19:15 ` Write same support Mike Snitzer
2012-02-03 19:20 ` Roland Dreier
2012-02-16 20:02 ` Mike Snitzer
2012-02-16 20:46   ` Martin K. Petersen
2012-02-16 21:09     ` Mike Snitzer
2012-02-16 21:03   ` dm-io async WRITE_SAME results in iSCSI NULL pointer [was: Re: Write same support] Mike Snitzer
2012-02-16 21:25     ` Mike Christie
2012-02-16 21:35       ` Mike Snitzer
2012-02-20 17:44     ` Martin K. Petersen
2012-02-20 18:46       ` Mike Snitzer
2012-02-20 23:44         ` Mike Snitzer
2012-02-21  0:07           ` Martin K. Petersen
2012-02-21  3:18             ` Mike Snitzer
2012-02-21  3:57               ` Martin K. Petersen
2012-02-21  6:55                 ` Mike Snitzer
2012-02-21 12:31                   ` Martin K. Petersen
2012-02-21 14:42                     ` Mike Snitzer
2012-02-21 19:33                       ` Mike Snitzer
2012-02-21 21:31                         ` Martin K. Petersen
2012-02-21 23:36                           ` Mike Snitzer
2012-02-21 19:47                   ` Vivek Goyal
2012-02-21 19:56                     ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.