* QUEUE_FLAG_CLUSTER?
@ 2010-11-20  1:24 Ed Lin - PTU
  2010-11-25 15:21 ` [dm-devel] QUEUE_FLAG_CLUSTER? Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Ed Lin - PTU @ 2010-11-20  1:24 UTC (permalink / raw)
  To: linux-scsi; +Cc: dm-devel, Promise_Linux

Hello,

When a dm device is created, it adjusts its queue limits from the
underlying devices. If the no_cluster member is not set,
QUEUE_FLAG_CLUSTER is set on the dm queue. When a bio is constructed,
QUEUE_FLAG_CLUSTER enables segment merging. Later, however, when the
request is mapped to a scatterlist, the merged segments are split apart
again if the underlying (scsi) device does not enable clustering, and
the sg count surges unexpectedly. If the segment count is already near
the limit, the extra entries can push the final sg count past the limit.
The cause is a mismatch in the cluster setting between dm and the
underlying devices.

For the stex driver, this problem was made obvious when code in
blk_set_default_limits() (block/blk-settings.c) was changed from
	lim->max_sectors = lim->max_hw_sectors = SAFE_MAX_SECTORS;
in 2.6.31, to
	lim->max_sectors = BLK_DEF_MAX_SECTORS;
	lim->max_hw_sectors = INT_MAX;
in 2.6.32.

The dm layer already has a function to adjust limits based on the
underlying device limits. So, would it be reasonable to add the
no_cluster setting to the queue limits on the scsi side as well? For
example, with a code change like the following?

Thanks,
Ed Lin


diff -rupN a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
--- a/drivers/scsi/scsi_lib.c	2010-11-19 16:45:56.000000000 -0800
+++ b/drivers/scsi/scsi_lib.c	2010-11-19 16:46:27.000000000 -0800
@@ -1641,8 +1641,10 @@ struct request_queue *__scsi_alloc_queue
 	blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
 
 	/* New queue, no concurrency on queue_flags */
-	if (!shost->use_clustering)
+	if (!shost->use_clustering) {
 		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
+		q->limits.no_cluster = 1;
+	}
 
 	/*
 	 * set a reasonable default alignment on word boundaries: the

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [dm-devel] QUEUE_FLAG_CLUSTER?
  2010-11-20  1:24 QUEUE_FLAG_CLUSTER? Ed Lin - PTU
@ 2010-11-25 15:21 ` Martin K. Petersen
  2010-11-25 16:35   ` block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead Martin K. Petersen
  2010-11-29 19:08   ` [dm-devel] QUEUE_FLAG_CLUSTER? Ed Lin - PTU
  0 siblings, 2 replies; 12+ messages in thread
From: Martin K. Petersen @ 2010-11-25 15:21 UTC (permalink / raw)
  To: device-mapper development; +Cc: linux-scsi, Promise_Linux

>>>>> "Ed" == Ed Lin - PTU <ed.lin@promise.com> writes:

Ed,

Ed> The dm already has function to adjust limits based on underlying
Ed> device limits. So, is it good to add the no_cluster setting to the
Ed> queue limits at the scsi side? For example, some code change like
Ed> the following?

We should never issue a command that does not adhere to the limits set
by the device driver. Upon inspection I agree this is busted and it's my
fault.

DM does not have a request_queue when stacking and as a result we ended
up with two entities tracking whether a device supports clustering or
not. And as it turns out the queue flag and the queue limits are not
always in agreement.

I've coded up a proper fix for this and will post a patch shortly. I'm
just running a bunch of tests to ensure we do the right thing...

-- 
Martin K. Petersen	Oracle Linux Engineering


* block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-25 15:21 ` [dm-devel] QUEUE_FLAG_CLUSTER? Martin K. Petersen
@ 2010-11-25 16:35   ` Martin K. Petersen
  2010-11-25 18:43     ` Jens Axboe
  2010-11-29 19:08   ` [dm-devel] QUEUE_FLAG_CLUSTER? Ed Lin - PTU
  1 sibling, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2010-11-25 16:35 UTC (permalink / raw)
  To: Jens Axboe; +Cc: device-mapper development, linux-scsi, Promise_Linux


When stacking devices a request_queue is not always available. This
forced us to have a no_cluster flag in the queue_limits that could be
used as a carrier until the request_queue had been set up for a
metadevice.

There were several problems with that approach. First of all it was up
to the stacking device to remember to set the queue flag after stacking
had completed. Also, the queue flag and the queue limits had to be kept in
sync at all times. We got that wrong, which could lead to us issuing
commands that went beyond the max scatterlist limit set by the driver.

The proper fix is to avoid having two flags for tracking the same thing.
We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
block layer merging functions. The queue_limit 'no_cluster' is turned
into 'cluster' to avoid double negatives and to ease stacking.
Clustering defaults to being enabled as before. The queue flag logic is
removed from the stacking function, and explicitly setting the cluster
flag is no longer necessary in DM and MD.

Reported-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---

 block/blk-merge.c       |    6 +++---
 block/blk-settings.c    |   25 ++-----------------------
 block/blk-sysfs.c       |    2 +-
 drivers/md/dm-table.c   |    5 -----
 drivers/md/md.c         |    3 ---
 drivers/scsi/scsi_lib.c |    2 +-
 include/linux/blkdev.h  |    9 ++++++---
 7 files changed, 13 insertions(+), 39 deletions(-)

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 77b7c26..11b9952 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -21,7 +21,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 		return 0;
 
 	fbio = bio;
-	cluster = test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
+	cluster = blk_queue_cluster(q);
 	seg_size = 0;
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
@@ -87,7 +87,7 @@ EXPORT_SYMBOL(blk_recount_segments);
 static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 				   struct bio *nxt)
 {
-	if (!test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags))
+	if (blk_queue_cluster(q))
 		return 0;
 
 	if (bio->bi_seg_back_size + nxt->bi_seg_front_size >
@@ -123,7 +123,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 	int nsegs, cluster;
 
 	nsegs = 0;
-	cluster = test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
+	cluster = blk_queue_cluster(q);
 
 	/*
 	 * for each bio in rq
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 701859f..e55f5fc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -126,7 +126,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->alignment_offset = 0;
 	lim->io_opt = 0;
 	lim->misaligned = 0;
-	lim->no_cluster = 0;
+	lim->cluster = 1;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -464,15 +464,6 @@ EXPORT_SYMBOL(blk_queue_io_opt);
 void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b)
 {
 	blk_stack_limits(&t->limits, &b->limits, 0);
-
-	if (!t->queue_lock)
-		WARN_ON_ONCE(1);
-	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
-		unsigned long flags;
-		spin_lock_irqsave(t->queue_lock, flags);
-		queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
-		spin_unlock_irqrestore(t->queue_lock, flags);
-	}
 }
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
@@ -545,7 +536,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm(t->io_opt, b->io_opt);
 
-	t->no_cluster |= b->no_cluster;
+	t->cluster &= b->cluster;
 	t->discard_zeroes_data &= b->discard_zeroes_data;
 
 	/* Physical block size a multiple of the logical block size? */
@@ -641,7 +632,6 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
 		       sector_t offset)
 {
 	struct request_queue *t = disk->queue;
-	struct request_queue *b = bdev_get_queue(bdev);
 
 	if (bdev_stack_limits(&t->limits, bdev, offset >> 9) < 0) {
 		char top[BDEVNAME_SIZE], bottom[BDEVNAME_SIZE];
@@ -652,17 +642,6 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
 		printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
 		       top, bottom);
 	}
-
-	if (!t->queue_lock)
-		WARN_ON_ONCE(1);
-	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(t->queue_lock, flags);
-		if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
-			queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
-		spin_unlock_irqrestore(t->queue_lock, flags);
-	}
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 013457f..41fb691 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -119,7 +119,7 @@ static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *
 
 static ssize_t queue_max_segment_size_show(struct request_queue *q, char *page)
 {
-	if (test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags))
+	if (blk_queue_cluster(q))
 		return queue_var_show(queue_max_segment_size(q), (page));
 
 	return queue_var_show(PAGE_CACHE_SIZE, (page));
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 90267f8..e2da191 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1131,11 +1131,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	 */
 	q->limits = *limits;
 
-	if (limits->no_cluster)
-		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
-	else
-		queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, q);
-
 	if (!dm_table_supports_discards(t))
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 	else
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 84c46a1..52694d2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4296,9 +4296,6 @@ static int md_alloc(dev_t dev, char *name)
 		goto abort;
 	mddev->queue->queuedata = mddev;
 
-	/* Can be unlocked because the queue is new: no concurrency */
-	queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, mddev->queue);
-
 	blk_queue_make_request(mddev->queue, md_make_request);
 
 	disk = alloc_disk(1 << shift);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b55b0ec..493f599 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1645,7 +1645,7 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 
 	/* New queue, no concurrency on queue_flags */
 	if (!shost->use_clustering)
-		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
+		q->limits.cluster = 0;
 
 	/*
 	 * set a reasonable default alignment on word boundaries: the
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aae86fd..95aeeeb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -250,7 +250,7 @@ struct queue_limits {
 
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
-	unsigned char		no_cluster;
+	unsigned char		cluster;
 	signed char		discard_zeroes_data;
 };
 
@@ -380,7 +380,6 @@ struct request_queue
 #endif
 };
 
-#define QUEUE_FLAG_CLUSTER	0	/* cluster several segments into 1 */
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
 #define QUEUE_FLAG_STOPPED	2	/* queue is stopped */
 #define	QUEUE_FLAG_SYNCFULL	3	/* read queue has been filled */
@@ -403,7 +402,6 @@ struct request_queue
 #define QUEUE_FLAG_SECDISCARD  19	/* supports SECDISCARD */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
-				 (1 << QUEUE_FLAG_CLUSTER) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
 				 (1 << QUEUE_FLAG_ADD_RANDOM))
@@ -510,6 +508,11 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define rq_data_dir(rq)		((rq)->cmd_flags & 1)
 
+static inline unsigned int blk_queue_cluster(struct request_queue *q)
+{
+	return q->limits.cluster;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */


* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-25 16:35   ` block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead Martin K. Petersen
@ 2010-11-25 18:43     ` Jens Axboe
  2010-11-26  0:37       ` Martin K. Petersen
  0 siblings, 1 reply; 12+ messages in thread
From: Jens Axboe @ 2010-11-25 18:43 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: device-mapper development, linux-scsi, Promise_Linux

On 2010-11-25 17:35, Martin K. Petersen wrote:
> 
> When stacking devices a request_queue is not always available. This
> forced us to have a no_cluster flag in the queue_limits that could be
> used as a carrier until the request_queue had been set up for a
> metadevice.
> 
> There were several problems with that approach. First of all it was up
> to the stacking device to remember to set the queue flag after stacking
> had completed. Also, the queue flag and the queue limits had to be kept in
> sync at all times. We got that wrong, which could lead to us issuing
> commands that went beyond the max scatterlist limit set by the driver.
> 
> The proper fix is to avoid having two flags for tracking the same thing.
> We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
> block layer merging functions. The queue_limit 'no_cluster' is turned
> into 'cluster' to avoid double negatives and to ease stacking.
> Clustering defaults to being enabled as before. The queue flag logic is
> removed from the stacking function, and explicitly setting the cluster
> flag is no longer necessary in DM and MD.

Great, the two different values and needing to sync them was horrible.
What kind of testing did you do? Have to be a little extra careful at
this point.

> @@ -87,7 +87,7 @@ EXPORT_SYMBOL(blk_recount_segments);
>  static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
>  				   struct bio *nxt)
>  {
> -	if (!test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags))
> +	if (blk_queue_cluster(q))
>  		return 0;

Oops, inverted this one.

-- 
Jens Axboe



* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-25 18:43     ` Jens Axboe
@ 2010-11-26  0:37       ` Martin K. Petersen
  2010-11-26  2:28         ` Mike Snitzer
  2010-12-01 18:43         ` Jens Axboe
  0 siblings, 2 replies; 12+ messages in thread
From: Martin K. Petersen @ 2010-11-26  0:37 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, device-mapper development, linux-scsi, Promise_Linux

>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:

Jens> Great, the two different values and needing to sync them was
Jens> horrible.  What kind of testing did you do? Have to be a little
Jens> extra careful at this point.

Yeah, we should probably let it soak a bit in -next just to make sure.

There really aren't many devices from this millennium that don't support
clustering. Which I guess is why we haven't seen any problems.

I ended up disabling clustering in one of the FC drivers to test with a
real workload. Threw in a BUG_ON(nsegs > queue_max_segments(q)) for good
measure.

I also tested mixing and matching clustered and non-clustered bottom
devices while stacking with DM.

New version below, fixing the things you and Matthew pointed out...



block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead

When stacking devices, a request_queue is not always available. This
forced us to have a no_cluster flag in the queue_limits that could be
used as a carrier until the request_queue had been set up for a
metadevice.

There were several problems with that approach. First of all it was up
to the stacking device to remember to set the queue flag after stacking
had completed. Also, the queue flag and the queue limits had to be kept in
sync at all times. We got that wrong, which could lead to us issuing
commands that went beyond the max scatterlist limit set by the driver.

The proper fix is to avoid having two flags for tracking the same thing.
We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
block layer merging functions. The queue_limit 'no_cluster' is turned
into 'cluster' to avoid double negatives and to ease stacking.
Clustering defaults to being enabled as before. The queue flag logic is
removed from the stacking function, and explicitly setting the cluster
flag is no longer necessary in DM and MD.

Reported-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

diff --git a/block/blk-merge.c b/block/blk-merge.c
index 77b7c26..74bc4a7 100644
--- a/block/blk-merge.c
+++ b/block/blk-merge.c
@@ -21,7 +21,7 @@ static unsigned int __blk_recalc_rq_segments(struct request_queue *q,
 		return 0;
 
 	fbio = bio;
-	cluster = test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
+	cluster = blk_queue_cluster(q);
 	seg_size = 0;
 	nr_phys_segs = 0;
 	for_each_bio(bio) {
@@ -87,7 +87,7 @@ EXPORT_SYMBOL(blk_recount_segments);
 static int blk_phys_contig_segment(struct request_queue *q, struct bio *bio,
 				   struct bio *nxt)
 {
-	if (!test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags))
+	if (!blk_queue_cluster(q))
 		return 0;
 
 	if (bio->bi_seg_back_size + nxt->bi_seg_front_size >
@@ -123,7 +123,7 @@ int blk_rq_map_sg(struct request_queue *q, struct request *rq,
 	int nsegs, cluster;
 
 	nsegs = 0;
-	cluster = test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags);
+	cluster = blk_queue_cluster(q);
 
 	/*
 	 * for each bio in rq
diff --git a/block/blk-settings.c b/block/blk-settings.c
index 701859f..e55f5fc 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -126,7 +126,7 @@ void blk_set_default_limits(struct queue_limits *lim)
 	lim->alignment_offset = 0;
 	lim->io_opt = 0;
 	lim->misaligned = 0;
-	lim->no_cluster = 0;
+	lim->cluster = 1;
 }
 EXPORT_SYMBOL(blk_set_default_limits);
 
@@ -464,15 +464,6 @@ EXPORT_SYMBOL(blk_queue_io_opt);
 void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b)
 {
 	blk_stack_limits(&t->limits, &b->limits, 0);
-
-	if (!t->queue_lock)
-		WARN_ON_ONCE(1);
-	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
-		unsigned long flags;
-		spin_lock_irqsave(t->queue_lock, flags);
-		queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
-		spin_unlock_irqrestore(t->queue_lock, flags);
-	}
 }
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
@@ -545,7 +536,7 @@ int blk_stack_limits(struct queue_limits *t, struct queue_limits *b,
 	t->io_min = max(t->io_min, b->io_min);
 	t->io_opt = lcm(t->io_opt, b->io_opt);
 
-	t->no_cluster |= b->no_cluster;
+	t->cluster &= b->cluster;
 	t->discard_zeroes_data &= b->discard_zeroes_data;
 
 	/* Physical block size a multiple of the logical block size? */
@@ -641,7 +632,6 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
 		       sector_t offset)
 {
 	struct request_queue *t = disk->queue;
-	struct request_queue *b = bdev_get_queue(bdev);
 
 	if (bdev_stack_limits(&t->limits, bdev, offset >> 9) < 0) {
 		char top[BDEVNAME_SIZE], bottom[BDEVNAME_SIZE];
@@ -652,17 +642,6 @@ void disk_stack_limits(struct gendisk *disk, struct block_device *bdev,
 		printk(KERN_NOTICE "%s: Warning: Device %s is misaligned\n",
 		       top, bottom);
 	}
-
-	if (!t->queue_lock)
-		WARN_ON_ONCE(1);
-	else if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
-		unsigned long flags;
-
-		spin_lock_irqsave(t->queue_lock, flags);
-		if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
-			queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
-		spin_unlock_irqrestore(t->queue_lock, flags);
-	}
 }
 EXPORT_SYMBOL(disk_stack_limits);
 
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
index 013457f..41fb691 100644
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -119,7 +119,7 @@ static ssize_t queue_max_integrity_segments_show(struct request_queue *q, char *
 
 static ssize_t queue_max_segment_size_show(struct request_queue *q, char *page)
 {
-	if (test_bit(QUEUE_FLAG_CLUSTER, &q->queue_flags))
+	if (blk_queue_cluster(q))
 		return queue_var_show(queue_max_segment_size(q), (page));
 
 	return queue_var_show(PAGE_CACHE_SIZE, (page));
diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index 90267f8..e2da191 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1131,11 +1131,6 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	 */
 	q->limits = *limits;
 
-	if (limits->no_cluster)
-		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
-	else
-		queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, q);
-
 	if (!dm_table_supports_discards(t))
 		queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
 	else
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 84c46a1..52694d2 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -4296,9 +4296,6 @@ static int md_alloc(dev_t dev, char *name)
 		goto abort;
 	mddev->queue->queuedata = mddev;
 
-	/* Can be unlocked because the queue is new: no concurrency */
-	queue_flag_set_unlocked(QUEUE_FLAG_CLUSTER, mddev->queue);
-
 	blk_queue_make_request(mddev->queue, md_make_request);
 
 	disk = alloc_disk(1 << shift);
diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index b55b0ec..3852e51 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1643,9 +1643,8 @@ struct request_queue *__scsi_alloc_queue(struct Scsi_Host *shost,
 
 	blk_queue_max_segment_size(q, dma_get_max_seg_size(dev));
 
-	/* New queue, no concurrency on queue_flags */
 	if (!shost->use_clustering)
-		queue_flag_clear_unlocked(QUEUE_FLAG_CLUSTER, q);
+		q->limits.cluster = 0;
 
 	/*
 	 * set a reasonable default alignment on word boundaries: the
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index aae86fd..95aeeeb 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -250,7 +250,7 @@ struct queue_limits {
 
 	unsigned char		misaligned;
 	unsigned char		discard_misaligned;
-	unsigned char		no_cluster;
+	unsigned char		cluster;
 	signed char		discard_zeroes_data;
 };
 
@@ -380,7 +380,6 @@ struct request_queue
 #endif
 };
 
-#define QUEUE_FLAG_CLUSTER	0	/* cluster several segments into 1 */
 #define QUEUE_FLAG_QUEUED	1	/* uses generic tag queueing */
 #define QUEUE_FLAG_STOPPED	2	/* queue is stopped */
 #define	QUEUE_FLAG_SYNCFULL	3	/* read queue has been filled */
@@ -403,7 +402,6 @@ struct request_queue
 #define QUEUE_FLAG_SECDISCARD  19	/* supports SECDISCARD */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
-				 (1 << QUEUE_FLAG_CLUSTER) |		\
 				 (1 << QUEUE_FLAG_STACKABLE)	|	\
 				 (1 << QUEUE_FLAG_SAME_COMP)	|	\
 				 (1 << QUEUE_FLAG_ADD_RANDOM))
@@ -510,6 +508,11 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q)
 
 #define rq_data_dir(rq)		((rq)->cmd_flags & 1)
 
+static inline unsigned int blk_queue_cluster(struct request_queue *q)
+{
+	return q->limits.cluster;
+}
+
 /*
  * We regard a request as sync, if either a read or a sync write
  */




* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-26  0:37       ` Martin K. Petersen
@ 2010-11-26  2:28         ` Mike Snitzer
  2010-11-26  2:53           ` Martin K. Petersen
  2010-11-26  9:00           ` Jens Axboe
  2010-12-01 18:43         ` Jens Axboe
  1 sibling, 2 replies; 12+ messages in thread
From: Mike Snitzer @ 2010-11-26  2:28 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Jens Axboe, dm-devel, Promise_Linux, linux-scsi

On Thu, Nov 25 2010 at  7:37pm -0500,
Martin K. Petersen <martin.petersen@oracle.com> wrote:

> New version below, fixing the things you and Matthew pointed out...
> 
> 
> 
> block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
> 
> When stacking devices, a request_queue is not always available. This
> forced us to have a no_cluster flag in the queue_limits that could be
> used as a carrier until the request_queue had been set up for a
> metadevice.
> 
> There were several problems with that approach. First of all it was up
> to the stacking device to remember to set queue flag after stacking had
> completed. 

But that was already done properly, so that wasn't a problem that needed
fixing (it just had potential to be overlooked if/when there is a new
stacking driver).

> Also, the queue flag and the queue limits had to be kept in
> sync at all times. We got that wrong, which could lead to us issuing
> commands that went beyond the max scatterlist limit set by the driver.

It took me a bit to see exactly where we got it wrong.  Looks like
__scsi_alloc_queue was only concerned with the queue flag.  So a
minimalist fix would've been to also set no_cluster = 1 in
__scsi_alloc_queue?

(OK, I just reviewed Ed's initial report and that is exactly what his
proposed patch did! :)

This patch's header could stand to be a bit more explicit about where
the real problem was ...

But I agree that this patch cleans things up nicely.

Acked-by: Mike Snitzer <snitzer@redhat.com>

Thanks,
Mike


* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-26  2:28         ` Mike Snitzer
@ 2010-11-26  2:53           ` Martin K. Petersen
  2010-11-26  9:00           ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Martin K. Petersen @ 2010-11-26  2:53 UTC (permalink / raw)
  To: Mike Snitzer
  Cc: Martin K. Petersen, Jens Axboe, dm-devel, Promise_Linux, linux-scsi

>>>>> "Mike" == Mike Snitzer <snitzer@redhat.com> writes:

[Stacking drivers having to set the queue flag]

Mike> But that was already done properly, so that wasn't a problem that
Mike> needed fixing (it just had potential to be overlooked if/when
Mike> there is a new stacking driver).

Yeah, just elaborating that the existing approach is flawed for several
reasons.

I don't know what happened to the wrapper function that set both
flags. I must have botched it when the topology code got merged.


Mike> It took me a bit to see exactly where we got it wrong.  Looks like
Mike> __scsi_alloc_queue was only concerned with the queue flag.  So a
Mike> minimalist fix would've been to also set no_cluster = 1 in
Mike> __scsi_alloc_queue?

Yep. But I'd rather fix this up the right way instead of perpetuating
the ugly two-flag monte.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-26  2:28         ` Mike Snitzer
  2010-11-26  2:53           ` Martin K. Petersen
@ 2010-11-26  9:00           ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2010-11-26  9:00 UTC (permalink / raw)
  To: Mike Snitzer; +Cc: Martin K. Petersen, dm-devel, Promise_Linux, linux-scsi

On 2010-11-26 03:28, Mike Snitzer wrote:
> This patch's header could stand to be a bit more explicit about where
> the real problem was ...

Indeed, it seems to miss that. I'll add that bit.

> But I agree that this patch cleans things up nicely.
> 
> Acked-by: Mike Snitzer <snitzer@redhat.com>

Thanks, added.

-- 
Jens Axboe



* RE: [dm-devel] QUEUE_FLAG_CLUSTER?
  2010-11-25 15:21 ` [dm-devel] QUEUE_FLAG_CLUSTER? Martin K. Petersen
  2010-11-25 16:35   ` block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead Martin K. Petersen
@ 2010-11-29 19:08   ` Ed Lin - PTU
  2010-11-30 18:40     ` Martin K. Petersen
  1 sibling, 1 reply; 12+ messages in thread
From: Ed Lin - PTU @ 2010-11-29 19:08 UTC (permalink / raw)
  To: Martin K. Petersen, device-mapper development; +Cc: linux-scsi, Promise_Linux


>We should never issue a command that does not adhere to the limits set
>by the device driver. Upon inspection I agree this is busted 
>and it's my
>fault.
>
>DM does not have a request_queue when stacking and as a result we ended
>up with two entities tracking whether a device supports clustering or
>not. And as it turns out the queue flag and the queue limits are not
>always in agreement.
>
>I've coded up a proper fix for this and will post a patch shortly. I'm
>just running a bunch of tests to ensure we do the right thing...
>

Hi Martin,

I will do some tests with your patch. It is obviously a thorough fix of
the problem, eliminating all of the confusing dual-flag handling in the
original code.

Is it possible to port the patch to previous kernels?

Thanks, much appreciated.

Ed Lin


* Re: [dm-devel] QUEUE_FLAG_CLUSTER?
  2010-11-29 19:08   ` [dm-devel] QUEUE_FLAG_CLUSTER? Ed Lin - PTU
@ 2010-11-30 18:40     ` Martin K. Petersen
  2010-12-07 21:48       ` [stable] " Greg KH
  0 siblings, 1 reply; 12+ messages in thread
From: Martin K. Petersen @ 2010-11-30 18:40 UTC (permalink / raw)
  To: Ed Lin - PTU
  Cc: Martin K. Petersen, device-mapper development, linux-scsi,
	Promise_Linux, stable

>>>>> "Ed" == Ed Lin - PTU <ed.lin@promise.com> writes:

Ed> Is it possible to port the patch to previous kernels?

There's a bit of fuzz due to some of the discard changes and the sysfs
export of the segment size. But the core of the patch is the same.

http://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux-2.6-mkp.git
git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux-2.6-mkp.git

I've created three branches:

for-stable-2.6.32.y
for-stable-2.6.35.y
for-stable-2.6.36.y

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  2010-11-26  0:37       ` Martin K. Petersen
  2010-11-26  2:28         ` Mike Snitzer
@ 2010-12-01 18:43         ` Jens Axboe
  1 sibling, 0 replies; 12+ messages in thread
From: Jens Axboe @ 2010-12-01 18:43 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: device-mapper development, linux-scsi, Promise_Linux

On 2010-11-26 01:37, Martin K. Petersen wrote:
>>>>>> "Jens" == Jens Axboe <jaxboe@fusionio.com> writes:
> 
> Jens> Great, the two different values and needing to sync them was
> Jens> horrible.  What kind of testing did you do? Have to be a little
> Jens> extra careful at this point.
> 
> Yeah, we should probably let it soak a bit in -next just to make sure.
> 
> There really aren't many devices from this millennium that don't support
> clustering. Which I guess is why we haven't seen any problems.
> 
> I ended up disabling clustering in one of the FC drivers to test with a
> real workload. Threw in a BUG_ON(nsegs > queue_max_segments(q)) for good
> measure.
> 
> I also tested mixing and matching clustered and non-clustered bottom
> devices while stacking with DM.
> 
> New version below, fixing the things you and Matthew pointed out...

Thanks applied. Small plea for the future - please rewrite the subject
line to indicate a v2 (or whatever) of the patch. That will make them
harder to miss.

-- 
Jens Axboe



* Re: [stable] [dm-devel] QUEUE_FLAG_CLUSTER?
  2010-11-30 18:40     ` Martin K. Petersen
@ 2010-12-07 21:48       ` Greg KH
  0 siblings, 0 replies; 12+ messages in thread
From: Greg KH @ 2010-12-07 21:48 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Ed Lin - PTU, device-mapper development, Promise_Linux,
	linux-scsi, stable

On Tue, Nov 30, 2010 at 01:40:56PM -0500, Martin K. Petersen wrote:
> >>>>> "Ed" == Ed Lin - PTU <ed.lin@promise.com> writes:
> 
> Ed> Is it possible to port the patch to previous kernels?
> 
> There's a bit of fuzz due to some of the discard changes and the sysfs
> export of the segment size. But the core of the patch is the same.
> 
> http://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux-2.6-mkp.git
> git://git.kernel.org/pub/scm/linux/kernel/git/mkp/linux-2.6-mkp.git
> 
> I've created three branches:
> 
> for-stable-2.6.32.y
> for-stable-2.6.35.y
> for-stable-2.6.36.y

Care to just send me the patches to stable@kernel.org so that I can
apply them?  git trees don't do me much good when I am handling the
stable patches in quilt :(

thanks,

greg k-h

