All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0 of 8] I/O topology patch kit
@ 2009-04-23  5:29 Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 1 of 8] block: Expose stacked device queues in sysfs Martin K. Petersen
                   ` (10 more replies)
  0 siblings, 11 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide


Here is another take of my I/O topology patches.  There was quite a bit
of discussion at the FS & Storage workshop about the need to describe
regions.  Sanity prevailed, and the region code and associated
complexity is now gone.

There are changes in various subsystems, all depending on the block
layer bits.  To avoid breaking bisection I ended up creating a new block
layer stacking function.  Once all patches are in place we can deprecate
the old one.

Comments and suggestions welcome.

-- 
Martin K. Petersen	Oracle Linux Engineering



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1 of 8] block: Expose stacked device queues in sysfs
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` Martin K. Petersen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Currently stacking devices do not have a queue directory in sysfs.
However, many of the I/O characteristics like sector size, maximum
request size, etc. are queue properties.

This patch enables the queue directory for MD/DM devices.  The elevator
code has been modified to deal with queues that do not have an I/O
scheduler.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 15 insertions(+), 4 deletions(-)
block/blk-sysfs.c |    6 +++---
block/elevator.c  |   13 ++++++++++++-



diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -393,9 +393,6 @@ int blk_register_queue(struct gendisk *d
 	if (WARN_ON(!q))
 		return -ENXIO;
 
-	if (!q->request_fn)
-		return 0;
-
 	ret = kobject_add(&q->kobj, kobject_get(&disk_to_dev(disk)->kobj),
 			  "%s", "queue");
 	if (ret < 0)
@@ -403,6 +400,9 @@ int blk_register_queue(struct gendisk *d
 
 	kobject_uevent(&q->kobj, KOBJ_ADD);
 
+	if (!q->request_fn)
+		return 0;
+
 	ret = elv_register_queue(q);
 	if (ret) {
 		kobject_uevent(&q->kobj, KOBJ_REMOVE);
diff --git a/block/elevator.c b/block/elevator.c
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -592,6 +592,9 @@ void elv_drain_elevator(struct request_q
  */
 void elv_quiesce_start(struct request_queue *q)
 {
+	if (!q->elevator)
+		return;
+
 	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
 
 	/*
@@ -1179,6 +1182,9 @@ ssize_t elv_iosched_store(struct request
 	char elevator_name[ELV_NAME_MAX];
 	struct elevator_type *e;
 
+	if (!q->elevator)
+		return count;
+
 	strlcpy(elevator_name, name, sizeof(elevator_name));
 	strstrip(elevator_name);
 
@@ -1202,10 +1208,15 @@ ssize_t elv_iosched_store(struct request
 ssize_t elv_iosched_show(struct request_queue *q, char *name)
 {
 	struct elevator_queue *e = q->elevator;
-	struct elevator_type *elv = e->elevator_type;
+	struct elevator_type *elv;
 	struct elevator_type *__e;
 	int len = 0;
 
+	if (!q->elevator)
+		return sprintf(name, "none\n");
+
+	elv = e->elevator_type;
+
 	spin_lock(&elv_list_lock);
 	list_for_each_entry(__e, &elv_list, list) {
 		if (!strcmp(elv->elevator_name, __e->elevator_name))



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 1 of 8] block: Expose stacked device queues in sysfs
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 1 of 8] block: Expose stacked device queues in sysfs Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 2 of 8] block: Export I/O topology for block devices and partitions Martin K. Petersen
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Currently stacking devices do not have a queue directory in sysfs.
However, many of the I/O characteristics like sector size, maximum
request size, etc. are queue properties.

This patch enables the queue directory for MD/DM devices.  The elevator
code has been modified to deal with queues that do not have an I/O
scheduler.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 15 insertions(+), 4 deletions(-)
block/blk-sysfs.c |    6 +++---
block/elevator.c  |   13 ++++++++++++-



diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -393,9 +393,6 @@ int blk_register_queue(struct gendisk *d
 	if (WARN_ON(!q))
 		return -ENXIO;
 
-	if (!q->request_fn)
-		return 0;
-
 	ret = kobject_add(&q->kobj, kobject_get(&disk_to_dev(disk)->kobj),
 			  "%s", "queue");
 	if (ret < 0)
@@ -403,6 +400,9 @@ int blk_register_queue(struct gendisk *d
 
 	kobject_uevent(&q->kobj, KOBJ_ADD);
 
+	if (!q->request_fn)
+		return 0;
+
 	ret = elv_register_queue(q);
 	if (ret) {
 		kobject_uevent(&q->kobj, KOBJ_REMOVE);
diff --git a/block/elevator.c b/block/elevator.c
--- a/block/elevator.c
+++ b/block/elevator.c
@@ -592,6 +592,9 @@ void elv_drain_elevator(struct request_q
  */
 void elv_quiesce_start(struct request_queue *q)
 {
+	if (!q->elevator)
+		return;
+
 	queue_flag_set(QUEUE_FLAG_ELVSWITCH, q);
 
 	/*
@@ -1179,6 +1182,9 @@ ssize_t elv_iosched_store(struct request
 	char elevator_name[ELV_NAME_MAX];
 	struct elevator_type *e;
 
+	if (!q->elevator)
+		return count;
+
 	strlcpy(elevator_name, name, sizeof(elevator_name));
 	strstrip(elevator_name);
 
@@ -1202,10 +1208,15 @@ ssize_t elv_iosched_store(struct request
 ssize_t elv_iosched_show(struct request_queue *q, char *name)
 {
 	struct elevator_queue *e = q->elevator;
-	struct elevator_type *elv = e->elevator_type;
+	struct elevator_type *elv;
 	struct elevator_type *__e;
 	int len = 0;
 
+	if (!q->elevator)
+		return sprintf(name, "none\n");
+
+	elv = e->elevator_type;
+
 	spin_lock(&elv_list_lock);
 	list_for_each_entry(__e, &elv_list, list) {
 		if (!strcmp(elv->elevator_name, __e->elevator_name))



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 2 of 8] block: Export I/O topology for block devices and partitions
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 1 of 8] block: Expose stacked device queues in sysfs Martin K. Petersen
  2009-04-23  5:29 ` Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23 10:51   ` Jens Axboe
  2009-04-23  5:29 ` [PATCH 3 of 8] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

To support devices with physical block sizes bigger than 512 bytes we
need to ensure proper alignment.  This patch adds support for exposing
I/O topology characteristics as devices are stacked.

  hardsect_size remains unchanged.  It is the smallest atomic unit the
  device can address (i.e. logical block size).

  granularity indicates the smallest I/O the device can access without
  incurring a read-modify-write penalty.  The granularity is set by
  low-level drivers from then on it is purely internal to the stacking
  logic.

  The min_io parameter is the smallest preferred I/O size reported by
  the device.  In many cases this is the same as granularity.  However,
  the min_io parameter can be scaled up when stacking (RAID5 chunk
  size > physical sector size).  min_io is available in sysfs.

  The opt_io characteristic indicates the preferred I/O size reported by
  the device.  This is usually the stripe width for arrays.  The value
  is in sysfs.

  The alignment parameter indicates the number of bytes the start of the
  device/partition is offset from the device granularity.  Partition
  tools and MD/DM tools can use this to align filesystems to the proper
  boundaries.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
6 files changed, 204 insertions(+), 4 deletions(-)
block/blk-settings.c   |  135 ++++++++++++++++++++++++++++++++++++++++++++++--
block/blk-sysfs.c      |   22 +++++++
block/genhd.c          |   10 +++
fs/partitions/check.c  |   10 +++
include/linux/blkdev.h |   30 ++++++++++
include/linux/genhd.h  |    1 



diff --git a/block/blk-settings.c b/block/blk-settings.c
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -292,22 +292,87 @@ EXPORT_SYMBOL(blk_queue_max_segment_size
  *
  * Description:
  *   This should typically be set to the lowest possible sector size
- *   that the hardware can operate on (possible without reverting to
- *   even internal read-modify-write operations). Usually the default
- *   of 512 covers most hardware.
+ *   (logical block size) that the hardware can operate on.  Usually the
+ *   default of 512 covers most hardware.
  **/
 void blk_queue_hardsect_size(struct request_queue *q, unsigned short size)
 {
-	q->hardsect_size = size;
+	q->hardsect_size = q->granularity = size;
 }
 EXPORT_SYMBOL(blk_queue_hardsect_size);
 
+/**
+ * blk_queue_granularity - set I/O granularity for the queue
+ * @q:  the request queue for the device
+ * @size:  the I/O granularity, in bytes
+ *
+ * Description:
+ *   This should typically be set to the lowest possible sector size
+ *   that the hardware can operate on without reverting to
+ *   read-modify-write operations.
+ **/
+void blk_queue_granularity(struct request_queue *q, unsigned short size)
+{
+	q->granularity = size;
+}
+EXPORT_SYMBOL(blk_queue_granularity);
+
+/**
+ * blk_queue_alignment - set alignment for the queue
+ * @q:  the request queue for the device
+ * @alignment:  alignment offset in bytes
+ *
+ * Description:
+ *   Some devices are naturally misaligned to compensate for things like
+ *   the legacy DOS partition table 63-sector offset.  Low-level drivers
+ *   should call this function for devices whose first sector is not
+ *   naturally aligned.
+ */
+void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
+{
+	q->alignment = alignment & (q->granularity - 1);
+	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
+}
+EXPORT_SYMBOL(blk_queue_alignment);
+
 /*
  * Returns the minimum that is _not_ zero, unless both are zero.
  */
 #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
 
 /**
+ * blk_queue_min_io - set minimum request size for the queue
+ * @q:  the request queue for the device
+ * @min_io:  smallest I/O size in bytes
+ *
+ * Description:
+ *   Some devices have an internal block size bigger than the reported
+ *   hardware sector size.  This function can be used to signal the
+ *   smallest I/O the device can perform without incurring a performance
+ *   penalty.
+ */
+void blk_queue_min_io(struct request_queue *q, unsigned int min_io)
+{
+	q->min_io = min_io;
+}
+EXPORT_SYMBOL(blk_queue_min_io);
+
+/**
+ * blk_queue_opt_io - set optimal request size for the queue
+ * @q:  the request queue for the device
+ * @opt_io:  optimal request size in bytes
+ *
+ * Description:
+ *   Drivers can call this function to set the preferred I/O request
+ *   size for devices that report such a value.
+ */
+void blk_queue_opt_io(struct request_queue *q, unsigned int opt_io)
+{
+	q->opt_io = opt_io;
+}
+EXPORT_SYMBOL(blk_queue_opt_io);
+
+/**
  * blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
  * @t:	the stacking driver (top)
  * @b:  the underlying device (bottom)
@@ -335,6 +400,68 @@ void blk_queue_stack_limits(struct reque
 EXPORT_SYMBOL(blk_queue_stack_limits);
 
 /**
+ * blk_queue_stack_topology - adjust queue limits for stacked drivers
+ * @t:	the stacking driver (top)
+ * @bdev:  the underlying block device (bottom)
+ * @offset:  offset to beginning of data within component device
+ **/
+void blk_queue_stack_topology(struct request_queue *t, struct block_device *bdev,
+			      sector_t offset)
+{
+	struct request_queue *b = bdev_get_queue(bdev);
+	int misaligned;
+
+	/* zero is "infinity" */
+	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
+	t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
+	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, b->seg_boundary_mask);
+
+	t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments);
+	t->max_hw_segments = min_not_zero(t->max_hw_segments, b->max_hw_segments);
+	t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size);
+	t->hardsect_size = max(t->hardsect_size, b->hardsect_size);
+	t->min_io = max(t->min_io, b->min_io);
+	t->granularity = max(t->granularity, b->granularity);
+
+	misaligned = 0;
+	offset += get_start_sect(bdev) << 9;
+
+	/* Bottom device offset aligned? */
+	if (offset && (offset & (b->granularity - 1)) != b->alignment) {
+		misaligned = 1;
+		goto out;
+	}
+
+	/* If top has no alignment, inherit from bottom */
+	if (!t->alignment)
+		t->alignment = b->alignment & (b->granularity - 1);
+
+	/* Top alignment on logical block boundary? */
+	if (t->alignment & (t->hardsect_size - 1)) {
+		misaligned = 1;
+		goto out;
+	}
+
+out:
+	if (!t->queue_lock)
+		WARN_ON_ONCE(1);
+	else if (misaligned || !test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
+		unsigned long flags;
+
+		spin_lock_irqsave(t->queue_lock, flags);
+
+		if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
+			queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
+
+		if (misaligned)
+			queue_flag_set(QUEUE_FLAG_MISALIGNED, t);
+
+		spin_unlock_irqrestore(t->queue_lock, flags);
+	}
+}
+EXPORT_SYMBOL(blk_queue_stack_topology);
+
+/**
  * blk_queue_dma_pad - set pad mask
  * @q:     the request queue for the device
  * @mask:  pad mask
diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
--- a/block/blk-sysfs.c
+++ b/block/blk-sysfs.c
@@ -105,6 +105,16 @@ static ssize_t queue_hw_sector_size_show
 	return queue_var_show(q->hardsect_size, page);
 }
 
+static ssize_t queue_min_io_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->min_io, page);
+}
+
+static ssize_t queue_opt_io_show(struct request_queue *q, char *page)
+{
+	return queue_var_show(q->opt_io, page);
+}
+
 static ssize_t
 queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
 {
@@ -256,6 +266,16 @@ static struct queue_sysfs_entry queue_hw
 	.show = queue_hw_sector_size_show,
 };
 
+static struct queue_sysfs_entry queue_min_io_entry = {
+	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
+	.show = queue_min_io_show,
+};
+
+static struct queue_sysfs_entry queue_opt_io_entry = {
+	.attr = {.name = "optimal_io_size", .mode = S_IRUGO },
+	.show = queue_opt_io_show,
+};
+
 static struct queue_sysfs_entry queue_nonrot_entry = {
 	.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
 	.show = queue_nonrot_show,
@@ -287,6 +307,8 @@ static struct attribute *default_attrs[]
 	&queue_max_sectors_entry.attr,
 	&queue_iosched_entry.attr,
 	&queue_hw_sector_size_entry.attr,
+	&queue_min_io_entry.attr,
+	&queue_opt_io_entry.attr,
 	&queue_nonrot_entry.attr,
 	&queue_nomerges_entry.attr,
 	&queue_rq_affinity_entry.attr,
diff --git a/block/genhd.c b/block/genhd.c
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -848,11 +848,20 @@ static ssize_t disk_capability_show(stru
 	return sprintf(buf, "%x\n", disk->flags);
 }
 
+static ssize_t disk_alignment_show(struct device *dev,
+				   struct device_attribute *attr, char *buf)
+{
+	struct gendisk *disk = dev_to_disk(dev);
+
+	return sprintf(buf, "%d\n", queue_alignment(disk->queue));
+}
+
 static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
 static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
 static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
 static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
+static DEVICE_ATTR(alignment, S_IRUGO, disk_alignment_show, NULL);
 static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
@@ -871,6 +880,7 @@ static struct attribute *disk_attrs[] = 
 	&dev_attr_removable.attr,
 	&dev_attr_ro.attr,
 	&dev_attr_size.attr,
+	&dev_attr_alignment.attr,
 	&dev_attr_capability.attr,
 	&dev_attr_stat.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
diff --git a/fs/partitions/check.c b/fs/partitions/check.c
--- a/fs/partitions/check.c
+++ b/fs/partitions/check.c
@@ -219,6 +219,13 @@ ssize_t part_size_show(struct device *de
 	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
 }
 
+ssize_t part_alignment_show(struct device *dev,
+			    struct device_attribute *attr, char *buf)
+{
+	struct hd_struct *p = dev_to_part(dev);
+	return sprintf(buf, "%llu\n", (unsigned long long)p->alignment);
+}
+
 ssize_t part_stat_show(struct device *dev,
 		       struct device_attribute *attr, char *buf)
 {
@@ -272,6 +279,7 @@ ssize_t part_fail_store(struct device *d
 static DEVICE_ATTR(partition, S_IRUGO, part_partition_show, NULL);
 static DEVICE_ATTR(start, S_IRUGO, part_start_show, NULL);
 static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
+static DEVICE_ATTR(alignment, S_IRUGO, part_alignment_show, NULL);
 static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 static struct device_attribute dev_attr_fail =
@@ -282,6 +290,7 @@ static struct attribute *part_attrs[] = 
 	&dev_attr_partition.attr,
 	&dev_attr_start.attr,
 	&dev_attr_size.attr,
+	&dev_attr_alignment.attr,
 	&dev_attr_stat.attr,
 #ifdef CONFIG_FAIL_MAKE_REQUEST
 	&dev_attr_fail.attr,
@@ -383,6 +392,7 @@ struct hd_struct *add_partition(struct g
 	pdev = part_to_dev(p);
 
 	p->start_sect = start;
+	p->alignment = queue_sector_alignment(disk->queue, start);
 	p->nr_sects = len;
 	p->partno = partno;
 	p->policy = get_disk_ro(disk);
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -402,6 +402,10 @@ struct request_queue
 	unsigned short		max_hw_segments;
 	unsigned short		hardsect_size;
 	unsigned int		max_segment_size;
+	unsigned int		alignment;
+	unsigned int		granularity;
+	unsigned int		min_io;
+	unsigned int		opt_io;
 
 	unsigned long		seg_boundary_mask;
 	void			*dma_drain_buffer;
@@ -461,6 +465,7 @@ struct request_queue
 #define QUEUE_FLAG_NONROT      14	/* non-rotational device (SSD) */
 #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
 #define QUEUE_FLAG_IO_STAT     15	/* do IO stats */
+#define QUEUE_FLAG_MISALIGNED  16	/* bdev not aligned to disk */
 
 #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
 				 (1 << QUEUE_FLAG_CLUSTER) |		\
@@ -877,7 +882,15 @@ extern void blk_queue_max_phys_segments(
 extern void blk_queue_max_hw_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
 extern void blk_queue_hardsect_size(struct request_queue *, unsigned short);
+extern void blk_queue_granularity(struct request_queue *, unsigned short);
+extern void blk_queue_alignment(struct request_queue *q,
+				unsigned int alignment);
+extern void blk_queue_min_io(struct request_queue *q, unsigned int min_io);
+extern void blk_queue_opt_io(struct request_queue *q, unsigned int opt_io);
 extern void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b);
+extern void blk_queue_stack_topology(struct request_queue *t,
+				     struct block_device *bdev,
+				     sector_t offset);
 extern void blk_queue_dma_pad(struct request_queue *, unsigned int);
 extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
 extern int blk_queue_dma_drain(struct request_queue *q,
@@ -978,6 +991,23 @@ static inline int bdev_hardsect_size(str
 	return queue_hardsect_size(bdev_get_queue(bdev));
 }
 
+static inline int queue_alignment(struct request_queue *q)
+{
+	if (q && test_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags))
+		return -1;
+
+	if (q && q->alignment)
+		return q->alignment;
+
+	return 0;
+}
+
+static inline int queue_sector_alignment(struct request_queue *q,
+					 sector_t sector)
+{
+	return ((sector << 9) - q->alignment) & (q->min_io - 1);
+}
+
 static inline int queue_dma_alignment(struct request_queue *q)
 {
 	return q ? q->dma_alignment : 511;
diff --git a/include/linux/genhd.h b/include/linux/genhd.h
--- a/include/linux/genhd.h
+++ b/include/linux/genhd.h
@@ -90,6 +90,7 @@ struct disk_stats {
 struct hd_struct {
 	sector_t start_sect;
 	sector_t nr_sects;
+	sector_t alignment;
 	struct device __dev;
 	struct kobject *holder_dir;
 	int policy, partno;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 3 of 8] MD: Use new topology calls to indicate alignment and I/O sizes
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (2 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 2 of 8] block: Export I/O topology for block devices and partitions Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 4 of 8] sd: Physical block size and alignment support Martin K. Petersen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Switch MD over to the new blk_queue_stack_topology() function which
checks for aligment and adjusts preferred I/O sizes when stacking.

Also warn if an MD device contains misaligned components.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
6 files changed, 73 insertions(+), 15 deletions(-)
drivers/md/linear.c    |   10 ++++++++--
drivers/md/multipath.c |    5 ++---
drivers/md/raid0.c     |   14 ++++++++++++--
drivers/md/raid1.c     |   19 +++++++++++++++----
drivers/md/raid10.c    |   23 +++++++++++++++++++----
drivers/md/raid5.c     |   17 +++++++++++++++++



diff --git a/drivers/md/linear.c b/drivers/md/linear.c
--- a/drivers/md/linear.c
+++ b/drivers/md/linear.c
@@ -139,8 +139,8 @@ static linear_conf_t *linear_conf(mddev_
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -154,6 +154,12 @@ static linear_conf_t *linear_conf(mddev_
 
 		cnt++;
 	}
+
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	if (cnt != raid_disks) {
 		printk("linear: not enough drives present. Aborting!\n");
 		goto out;
diff --git a/drivers/md/multipath.c b/drivers/md/multipath.c
--- a/drivers/md/multipath.c
+++ b/drivers/md/multipath.c
@@ -294,7 +294,7 @@ static int multipath_add_disk(mddev_t *m
 	for (path = first; path <= last; path++)
 		if ((p=conf->multipaths+path)->rdev == NULL) {
 			q = rdev->bdev->bd_disk->queue;
-			blk_queue_stack_limits(mddev->queue, q);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev, 0);
 
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
@@ -461,8 +461,7 @@ static int multipath_run (mddev_t *mddev
 		disk = conf->multipaths + disk_idx;
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev, 0);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, not that we ever expect a device with
 		 * a merge_bvec_fn to be involved in multipath */
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -120,6 +120,10 @@ static int create_strip_zones (mddev_t *
 	zone = &conf->strip_zone[0];
 	cnt = 0;
 	smallest = NULL;
+
+	blk_queue_min_io(mddev->queue, mddev->chunk_size);
+	blk_queue_opt_io(mddev->queue, mddev->chunk_size * mddev->raid_disks);
+
 	zone->dev = conf->devlist;
 	list_for_each_entry(rdev1, &mddev->disks, same_set) {
 		int j = rdev1->raid_disk;
@@ -136,8 +140,8 @@ static int create_strip_zones (mddev_t *
 		}
 		zone->dev[j] = rdev1;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev1->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev1->bdev,
+					 rdev1->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -151,6 +155,12 @@ static int create_strip_zones (mddev_t *
 			smallest = rdev1;
 		cnt++;
 	}
+
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	if (cnt != mddev->raid_disks) {
 		printk(KERN_ERR "raid0: too few disks (%d of %d) - "
 			"aborting!\n", cnt, mddev->raid_disks);
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1123,8 +1123,8 @@ static int raid1_add_disk(mddev_t *mddev
 	for (mirror = first; mirror <= last; mirror++)
 		if ( !(p=conf->mirrors+mirror)->rdev) {
 
-			blk_queue_stack_limits(mddev->queue,
-					       rdev->bdev->bd_disk->queue);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev,
+						 rdev->data_offset);
 			/* as we don't honour merge_bvec_fn, we must never risk
 			 * violating it, so limit ->max_sector to one PAGE, as
 			 * a one page request is never in violation.
@@ -1145,6 +1145,11 @@ static int raid1_add_disk(mddev_t *mddev
 			break;
 		}
 
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	print_conf(conf);
 	return err;
 }
@@ -1989,8 +1994,8 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2001,6 +2006,12 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
+
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	conf->raid_disks = mddev->raid_disks;
 	conf->mddev = mddev;
 	INIT_LIST_HEAD(&conf->retry_list);
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1151,8 +1151,8 @@ static int raid10_add_disk(mddev_t *mdde
 	for ( ; mirror <= last ; mirror++)
 		if ( !(p=conf->mirrors+mirror)->rdev) {
 
-			blk_queue_stack_limits(mddev->queue,
-					       rdev->bdev->bd_disk->queue);
+			blk_queue_stack_topology(mddev->queue, rdev->bdev,
+						 rdev->data_offset);
 			/* as we don't honour merge_bvec_fn, we must never risk
 			 * violating it, so limit ->max_sector to one PAGE, as
 			 * a one page request is never in violation.
@@ -1170,6 +1170,11 @@ static int raid10_add_disk(mddev_t *mdde
 			break;
 		}
 
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	print_conf(conf);
 	return err;
 }
@@ -2129,6 +2134,10 @@ static int run(mddev_t *mddev)
 	spin_lock_init(&conf->device_lock);
 	mddev->queue->queue_lock = &conf->device_lock;
 
+	blk_queue_min_io(mddev->queue, mddev->chunk_size);
+	blk_queue_opt_io(mddev->queue,
+			 (mddev->chunk_size * mddev->raid_disks) >> 1);
+
 	list_for_each_entry(rdev, &mddev->disks, same_set) {
 		disk_idx = rdev->raid_disk;
 		if (disk_idx >= mddev->raid_disks
@@ -2138,8 +2147,8 @@ static int run(mddev_t *mddev)
 
 		disk->rdev = rdev;
 
-		blk_queue_stack_limits(mddev->queue,
-				       rdev->bdev->bd_disk->queue);
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
 		/* as we don't honour merge_bvec_fn, we must never risk
 		 * violating it, so limit ->max_sector to one PAGE, as
 		 * a one page request is never in violation.
@@ -2150,6 +2159,12 @@ static int run(mddev_t *mddev)
 
 		disk->head_position = 0;
 	}
+
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
+
 	INIT_LIST_HEAD(&conf->retry_list);
 
 	spin_lock_init(&conf->resync_lock);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4601,6 +4601,23 @@ static int run(mddev_t *mddev)
 	md_set_array_sectors(mddev, raid5_size(mddev, 0, 0));
 
 	blk_queue_merge_bvec(mddev->queue, raid5_mergeable_bvec);
+	blk_queue_min_io(mddev->queue, mddev->chunk_size);
+
+	if (mddev->level == 5)
+		blk_queue_opt_io(mddev->queue,
+				 mddev->chunk_size * (mddev->raid_disks - 1));
+	else
+		blk_queue_opt_io(mddev->queue,
+				 mddev->chunk_size * (mddev->raid_disks - 2));
+
+	list_for_each_entry(rdev, &mddev->disks, same_set)
+		blk_queue_stack_topology(mddev->queue, rdev->bdev,
+					 rdev->data_offset);
+
+	if (queue_alignment(mddev->queue) < 0)
+		printk(KERN_NOTICE
+		       "Warning: %s has one or more misaligned components\n",
+		       mdname(mddev));
 
 	return 0;
 abort:



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (3 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 3 of 8] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23 16:37   ` Konrad Rzeszutek
  2009-04-23  5:29 ` Martin K. Petersen
                   ` (5 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Extract physical block size and lowest aligned LBA from READ
CAPACITY(16) response and adjust queue parameters.

Report physical block size and alignment when applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 22 insertions(+), 2 deletions(-)
drivers/scsi/sd.c |   23 +++++++++++++++++++++--
drivers/scsi/sd.h |    1 +



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1306,6 +1306,7 @@ static int read_capacity_16(struct scsi_
 	int sense_valid = 0;
 	int the_result;
 	int retries = 3;
+	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
 
@@ -1346,6 +1347,16 @@ static int read_capacity_16(struct scsi_
 
 	sector_size =	(buffer[8] << 24) | (buffer[9] << 16) |
 			(buffer[10] << 8) | buffer[11];
+
+	/* Logical blocks per physical block exponent */
+	sdkp->hw_sector_size = (1 << (buffer[13] & 0xf)) * sector_size;
+
+	/* Lowest aligned logical block */
+	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
+	blk_queue_alignment(sdp->request_queue, alignment);
+	if (alignment && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "%u-byte alignment\n", alignment);
+
 	lba =  (((u64)buffer[0] << 56) | ((u64)buffer[1] << 48) |
 		((u64)buffer[2] << 40) | ((u64)buffer[3] << 32) |
 		((u64)buffer[4] << 24) | ((u64)buffer[5] << 16) |
@@ -1402,6 +1413,7 @@ static int read_capacity_10(struct scsi_
 
 	sector_size =	(buffer[4] << 24) | (buffer[5] << 16) |
 			(buffer[6] << 8) | buffer[7];
+	sdkp->hw_sector_size = sector_size;
 	lba =	(buffer[0] << 24) | (buffer[1] << 16) |
 		(buffer[2] << 8) | buffer[3];
 
@@ -1526,11 +1538,17 @@ got_data:
 		string_get_size(sz, STRING_UNITS_10, cap_str_10,
 				sizeof(cap_str_10));
 
-		if (sdkp->first_scan || old_capacity != sdkp->capacity)
+		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
 			sd_printk(KERN_NOTICE, sdkp,
-				  "%llu %d-byte hardware sectors: (%s/%s)\n",
+				  "%llu %d-byte logical blocks: (%s/%s)\n",
 				  (unsigned long long)sdkp->capacity,
 				  sector_size, cap_str_10, cap_str_2);
+
+			if (sdkp->hw_sector_size != sector_size)
+				sd_printk(KERN_NOTICE, sdkp,
+					  "%u-byte physical blocks\n",
+					  sdkp->hw_sector_size);
+		}
 	}
 
 	/* Rescale capacity to 512-byte units */
@@ -1543,6 +1561,7 @@ got_data:
 	else if (sector_size == 256)
 		sdkp->capacity >>= 1;
 
+	blk_queue_granularity(sdp->request_queue, sdkp->hw_sector_size);
 	sdkp->device->sector_size = sector_size;
 }
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -45,6 +45,7 @@ struct scsi_disk {
 	unsigned int	openers;	/* protected by BKL for now, yuck */
 	sector_t	capacity;	/* size in 512-byte sectors */
 	u32		index;
+	unsigned short	hw_sector_size;
 	u8		media_present;
 	u8		write_prot;
 	u8		protection_type;/* Data Integrity Field */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (4 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 4 of 8] sd: Physical block size and alignment support Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 5 of 8] sd: Detect non-rotational devices Martin K. Petersen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Extract physical block size and lowest aligned LBA from READ
CAPACITY(16) response and adjust queue parameters.

Report physical block size and alignment when applicable.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
2 files changed, 22 insertions(+), 2 deletions(-)
drivers/scsi/sd.c |   23 +++++++++++++++++++++--
drivers/scsi/sd.h |    1 +



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1306,6 +1306,7 @@ static int read_capacity_16(struct scsi_
 	int sense_valid = 0;
 	int the_result;
 	int retries = 3;
+	unsigned int alignment;
 	unsigned long long lba;
 	unsigned sector_size;
 
@@ -1346,6 +1347,16 @@ static int read_capacity_16(struct scsi_
 
 	sector_size =	(buffer[8] << 24) | (buffer[9] << 16) |
 			(buffer[10] << 8) | buffer[11];
+
+	/* Logical blocks per physical block exponent */
+	sdkp->hw_sector_size = (1 << (buffer[13] & 0xf)) * sector_size;
+
+	/* Lowest aligned logical block */
+	alignment = ((buffer[14] & 0x3f) << 8 | buffer[15]) * sector_size;
+	blk_queue_alignment(sdp->request_queue, alignment);
+	if (alignment && sdkp->first_scan)
+		sd_printk(KERN_NOTICE, sdkp, "%u-byte alignment\n", alignment);
+
 	lba =  (((u64)buffer[0] << 56) | ((u64)buffer[1] << 48) |
 		((u64)buffer[2] << 40) | ((u64)buffer[3] << 32) |
 		((u64)buffer[4] << 24) | ((u64)buffer[5] << 16) |
@@ -1402,6 +1413,7 @@ static int read_capacity_10(struct scsi_
 
 	sector_size =	(buffer[4] << 24) | (buffer[5] << 16) |
 			(buffer[6] << 8) | buffer[7];
+	sdkp->hw_sector_size = sector_size;
 	lba =	(buffer[0] << 24) | (buffer[1] << 16) |
 		(buffer[2] << 8) | buffer[3];
 
@@ -1526,11 +1538,17 @@ got_data:
 		string_get_size(sz, STRING_UNITS_10, cap_str_10,
 				sizeof(cap_str_10));
 
-		if (sdkp->first_scan || old_capacity != sdkp->capacity)
+		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
 			sd_printk(KERN_NOTICE, sdkp,
-				  "%llu %d-byte hardware sectors: (%s/%s)\n",
+				  "%llu %d-byte logical blocks: (%s/%s)\n",
 				  (unsigned long long)sdkp->capacity,
 				  sector_size, cap_str_10, cap_str_2);
+
+			if (sdkp->hw_sector_size != sector_size)
+				sd_printk(KERN_NOTICE, sdkp,
+					  "%u-byte physical blocks\n",
+					  sdkp->hw_sector_size);
+		}
 	}
 
 	/* Rescale capacity to 512-byte units */
@@ -1543,6 +1561,7 @@ got_data:
 	else if (sector_size == 256)
 		sdkp->capacity >>= 1;
 
+	blk_queue_granularity(sdp->request_queue, sdkp->hw_sector_size);
 	sdkp->device->sector_size = sector_size;
 }
 
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -45,6 +45,7 @@ struct scsi_disk {
 	unsigned int	openers;	/* protected by BKL for now, yuck */
 	sector_t	capacity;	/* size in 512-byte sectors */
 	u32		index;
+	unsigned short	hw_sector_size;
 	u8		media_present;
 	u8		write_prot;
 	u8		protection_type;/* Data Integrity Field */



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (5 preceding siblings ...)
  2009-04-23  5:29 ` Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23 10:52   ` Jens Axboe
  2009-04-23  5:29 ` [PATCH 6 of 8] sd: Block limits VPD support Martin K. Petersen
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Detect non-rotational devices and set the queue flag accordingly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 27 insertions(+)
drivers/scsi/sd.c |   27 +++++++++++++++++++++++++++



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -50,6 +50,7 @@
 #include <linux/string_helpers.h>
 #include <linux/async.h>
 #include <asm/uaccess.h>
+#include <asm/unaligned.h>
 
 #include <scsi/scsi.h>
 #include <scsi/scsi_cmnd.h>
@@ -1800,6 +1801,29 @@ void sd_read_app_tag_own(struct scsi_dis
 }
 
 /**
+ * sd_read_block_characteristics - Query block dev. characteristics
+ * @disk: disk to query
+ */
+static void sd_read_block_characteristics(struct scsi_disk *sdkp)
+{
+	char *buffer;
+	u16 rot;
+
+	/* Block Device Characteristics VPD */
+	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
+
+	if (buffer == NULL)
+		return;
+
+	rot = get_unaligned_be16(&buffer[4]);
+
+	if (rot == 1)
+		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
+
+	kfree(buffer);
+}
+
+/**
  *	sd_revalidate_disk - called the first time a new disk is seen,
  *	performs disk spin up, read_capacity, etc.
  *	@disk: struct gendisk we care about
@@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
 	 */
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
+		sd_read_block_characteristics(sdkp);
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);
 		sd_read_app_tag_own(sdkp, buffer);
@@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
 	add_disk(gd);
 	sd_dif_config_host(sdkp);
 
+	sd_revalidate_disk(gd);
+
 	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
 		  sdp->removable ? "removable " : "");
 



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 6 of 8] sd: Block limits VPD support
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (6 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 5 of 8] sd: Detect non-rotational devices Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 7 of 8] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

Query the block limits VPD page and adjust queue minimum and optimal I/O
sizes.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 24 insertions(+)
drivers/scsi/sd.c |   24 ++++++++++++++++++++++++



diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1801,6 +1801,29 @@ void sd_read_app_tag_own(struct scsi_dis
 }
 
 /**
+ * sd_read_block_limits - Query disk device for preferred I/O sizes.
+ * @disk: disk to query
+ */
+static void sd_read_block_limits(struct scsi_disk *sdkp)
+{
+	unsigned int sector_sz = sdkp->device->sector_size;
+	char *buffer;
+
+	/* Block Limits VPD */
+	buffer = scsi_get_vpd_page(sdkp->device, 0xb0);
+
+	if (buffer == NULL)
+		return;
+
+	blk_queue_min_io(sdkp->disk->queue,
+			 get_unaligned_be16(&buffer[6]) * sector_sz);
+	blk_queue_opt_io(sdkp->disk->queue,
+			 get_unaligned_be32(&buffer[12]) * sector_sz);
+
+	kfree(buffer);
+}
+
+/**
  * sd_read_block_characteristics - Query block dev. characteristics
  * @disk: disk to query
  */
@@ -1860,6 +1883,7 @@ static int sd_revalidate_disk(struct gen
 	 */
 	if (sdkp->media_present) {
 		sd_read_capacity(sdkp, buffer);
+		sd_read_block_limits(sdkp);
 		sd_read_block_characteristics(sdkp);
 		sd_read_write_protect_flag(sdkp, buffer);
 		sd_read_cache_type(sdkp, buffer);



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 7 of 8] scsi_debug: Add support for physical block exponent and alignment
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (7 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 6 of 8] sd: Block limits VPD support Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23  5:29 ` [PATCH 8 of 8] libata: Report disk alignment and physical block size Martin K. Petersen
  2009-04-23  5:29 ` Martin K. Petersen
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

This patch adds support for setting the physical block exponent and
lowest aligned LBA in the READ CAPACITY(16) response.

The B0 VPD page is adjusted accordingly.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 29 insertions(+), 1 deletion(-)
drivers/scsi/scsi_debug.c |   30 +++++++++++++++++++++++++++++-



diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -101,6 +101,8 @@ static const char * scsi_debug_version_d
 #define DEF_DIF 0
 #define DEF_GUARD 0
 #define DEF_ATO 1
+#define DEF_PHYSBLK_EXP 0
+#define DEF_LOWEST_ALIGNED 0
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -156,6 +158,8 @@ static int scsi_debug_dix = DEF_DIX;
 static int scsi_debug_dif = DEF_DIF;
 static int scsi_debug_guard = DEF_GUARD;
 static int scsi_debug_ato = DEF_ATO;
+static int scsi_debug_physblk_exp = DEF_PHYSBLK_EXP;
+static int scsi_debug_lowest_aligned = DEF_LOWEST_ALIGNED;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -657,7 +661,12 @@ static unsigned char vpdb0_data[] = {
 
 static int inquiry_evpd_b0(unsigned char * arr)
 {
+	unsigned int gran;
+
 	memcpy(arr, vpdb0_data, sizeof(vpdb0_data));
+	gran = 1 << scsi_debug_physblk_exp;
+	arr[2] = (gran >> 8) & 0xff;
+	arr[3] = gran & 0xff;
 	if (sdebug_store_sectors > 0x400) {
 		arr[4] = (sdebug_store_sectors >> 24) & 0xff;
 		arr[5] = (sdebug_store_sectors >> 16) & 0xff;
@@ -945,6 +954,9 @@ static int resp_readcap16(struct scsi_cm
 	arr[9] = (scsi_debug_sector_size >> 16) & 0xff;
 	arr[10] = (scsi_debug_sector_size >> 8) & 0xff;
 	arr[11] = scsi_debug_sector_size & 0xff;
+	arr[13] = scsi_debug_physblk_exp & 0xf;
+	arr[14] = (scsi_debug_lowest_aligned >> 8) & 0x3f;
+	arr[15] = scsi_debug_lowest_aligned & 0xff;
 
 	if (scsi_debug_dif) {
 		arr[12] = (scsi_debug_dif - 1) << 1; /* P_TYPE */
@@ -2380,6 +2392,8 @@ module_param_named(dix, scsi_debug_dix, 
 module_param_named(dif, scsi_debug_dif, int, S_IRUGO);
 module_param_named(guard, scsi_debug_guard, int, S_IRUGO);
 module_param_named(ato, scsi_debug_ato, int, S_IRUGO);
+module_param_named(physblk_exp, scsi_debug_physblk_exp, int, S_IRUGO);
+module_param_named(lowest_aligned, scsi_debug_lowest_aligned, int, S_IRUGO);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2401,7 +2415,9 @@ MODULE_PARM_DESC(ptype, "SCSI peripheral
 MODULE_PARM_DESC(scsi_level, "SCSI level to simulate(def=5[SPC-3])");
 MODULE_PARM_DESC(virtual_gb, "virtual gigabyte size (def=0 -> use dev_size_mb)");
 MODULE_PARM_DESC(vpd_use_hostno, "0 -> dev ids ignore hostno (def=1 -> unique dev ids)");
-MODULE_PARM_DESC(sector_size, "hardware sector size in bytes (def=512)");
+MODULE_PARM_DESC(sector_size, "logical block size in bytes (def=512)");
+MODULE_PARM_DESC(physblk_exp, "physical block exponent (def=0)");
+MODULE_PARM_DESC(lowest_aligned, "lowest aligned lba (def=0)");
 MODULE_PARM_DESC(dix, "data integrity extensions mask (def=0)");
 MODULE_PARM_DESC(dif, "data integrity field type: 0-3 (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
@@ -2874,6 +2890,18 @@ static int __init scsi_debug_init(void)
 		return -EINVAL;
 	}
 
+	if (scsi_debug_physblk_exp > 15) {
+		printk(KERN_ERR "scsi_debug_init: invalid physblk_exp %u\n",
+		       scsi_debug_physblk_exp);
+		return -EINVAL;
+	}
+
+	if (scsi_debug_lowest_aligned > 0x3fff) {
+		printk(KERN_ERR "scsi_debug_init: lowest_aligned too big: %u\n",
+		       scsi_debug_lowest_aligned);
+		return -EINVAL;
+	}
+
 	if (scsi_debug_dev_size_mb < 1)
 		scsi_debug_dev_size_mb = 1;  /* force minimum 1 MB ramdisk */
 	sz = (unsigned long)scsi_debug_dev_size_mb * 1048576;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 8 of 8] libata: Report disk alignment and physical block size
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (8 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 7 of 8] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  2009-04-23 13:46   ` Matthew Wilcox
  2009-04-23  5:29 ` Martin K. Petersen
  10 siblings, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

For disks with 4KB sectors, report the correct block size and alignment
when filling out the READ CAPACITY(16) response.

This patch is based upon code from Matthew Wilcox' 4KB ATA tree.  I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 20 insertions(+)
drivers/ata/libata-scsi.c |   20 ++++++++++++++++++++



diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2376,7 +2376,22 @@ saving_not_supp:
  */
 static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 {
+	struct ata_device *dev = args->dev;
 	u64 last_lba = args->dev->n_sectors - 1; /* LBA of the last block */
+	u8 log_per_phys = 0;
+	u16 lowest_aligned_lba = 0;
+	u16 word_106 = dev->id[106];
+	u16 word_209 = dev->id[209];
+
+	if ((word_106 & 0xc000) == 0x4000) {
+		/* Number and offset of logical sectors per physical sector */
+		if (word_106 & (1 << 13))
+			log_per_phys = word_106 & 0xf;
+		if ((word_209 & 0xc000) == 0x4000)
+			lowest_aligned_lba =
+				((1 << log_per_phys) - (word_209 & 0x3fff))
+				% (1 << log_per_phys);
+	}
 
 	VPRINTK("ENTER\n");
 
@@ -2407,6 +2422,11 @@ static unsigned int ata_scsiop_read_cap(
 		/* sector size */
 		rbuf[10] = ATA_SECT_SIZE >> 8;
 		rbuf[11] = ATA_SECT_SIZE & 0xff;
+
+		rbuf[12] = 0;
+		rbuf[13] = log_per_phys;
+		rbuf[14] = (lowest_aligned_lba >> 8) & 0x3f;
+		rbuf[15] = lowest_aligned_lba;
 	}
 
 	return 0;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* [PATCH 8 of 8] libata: Report disk alignment and physical block size
  2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
                   ` (9 preceding siblings ...)
  2009-04-23  5:29 ` [PATCH 8 of 8] libata: Report disk alignment and physical block size Martin K. Petersen
@ 2009-04-23  5:29 ` Martin K. Petersen
  10 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23  5:29 UTC (permalink / raw)
  To: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide

For disks with 4KB sectors, report the correct block size and alignment
when filling out the READ CAPACITY(16) response.

This patch is based upon code from Matthew Wilcox' 4KB ATA tree.  I
fixed the bug I reported a while back caused by ATA and SCSI using
different approaches to describing the alignment.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>

---
1 file changed, 20 insertions(+)
drivers/ata/libata-scsi.c |   20 ++++++++++++++++++++



diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2376,7 +2376,22 @@ saving_not_supp:
  */
 static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 {
+	struct ata_device *dev = args->dev;
 	u64 last_lba = args->dev->n_sectors - 1; /* LBA of the last block */
+	u8 log_per_phys = 0;
+	u16 lowest_aligned_lba = 0;
+	u16 word_106 = dev->id[106];
+	u16 word_209 = dev->id[209];
+
+	if ((word_106 & 0xc000) == 0x4000) {
+		/* Number and offset of logical sectors per physical sector */
+		if (word_106 & (1 << 13))
+			log_per_phys = word_106 & 0xf;
+		if ((word_209 & 0xc000) == 0x4000)
+			lowest_aligned_lba =
+				((1 << log_per_phys) - (word_209 & 0x3fff))
+				% (1 << log_per_phys);
+	}
 
 	VPRINTK("ENTER\n");
 
@@ -2407,6 +2422,11 @@ static unsigned int ata_scsiop_read_cap(
 		/* sector size */
 		rbuf[10] = ATA_SECT_SIZE >> 8;
 		rbuf[11] = ATA_SECT_SIZE & 0xff;
+
+		rbuf[12] = 0;
+		rbuf[13] = log_per_phys;
+		rbuf[14] = (lowest_aligned_lba >> 8) & 0x3f;
+		rbuf[15] = lowest_aligned_lba;
 	}
 
 	return 0;



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23  5:29 ` [PATCH 2 of 8] block: Export I/O topology for block devices and partitions Martin K. Petersen
@ 2009-04-23 10:51   ` Jens Axboe
  2009-04-23 11:49     ` Matthew Wilcox
                       ` (2 more replies)
  0 siblings, 3 replies; 42+ messages in thread
From: Jens Axboe @ 2009-04-23 10:51 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	linux-ide, linux-scsi

On Thu, Apr 23 2009, Martin K. Petersen wrote:
> To support devices with physical block sizes bigger than 512 bytes we
> need to ensure proper alignment.  This patch adds support for exposing
> I/O topology characteristics as devices are stacked.
> 
>   hardsect_size remains unchanged.  It is the smallest atomic unit the
>   device can address (i.e. logical block size).
> 
>   granularity indicates the smallest I/O the device can access without
>   incurring a read-modify-write penalty.  The granularity is set by
>   low-level drivers from then on it is purely internal to the stacking
>   logic.
> 
>   The min_io parameter is the smallest preferred I/O size reported by
>   the device.  In many cases this is the same as granularity.  However,
>   the min_io parameter can be scaled up when stacking (RAID5 chunk
>   size > physical sector size).  min_io is available in sysfs.
> 
>   The opt_io characteristic indicates the preferred I/O size reported by
>   the device.  This is usually the stripe width for arrays.  The value
>   is in sysfs.
> 
>   The alignment parameter indicates the number of bytes the start of the
>   device/partition is offset from the device granularity.  Partition
>   tools and MD/DM tools can use this to align filesystems to the proper
>   boundaries.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 6 files changed, 204 insertions(+), 4 deletions(-)
> block/blk-settings.c   |  135 ++++++++++++++++++++++++++++++++++++++++++++++--
> block/blk-sysfs.c      |   22 +++++++
> block/genhd.c          |   10 +++
> fs/partitions/check.c  |   10 +++
> include/linux/blkdev.h |   30 ++++++++++
> include/linux/genhd.h  |    1 
> 
> 
> 
> diff --git a/block/blk-settings.c b/block/blk-settings.c
> --- a/block/blk-settings.c
> +++ b/block/blk-settings.c
> @@ -292,22 +292,87 @@ EXPORT_SYMBOL(blk_queue_max_segment_size
>   *
>   * Description:
>   *   This should typically be set to the lowest possible sector size
> - *   that the hardware can operate on (possible without reverting to
> - *   even internal read-modify-write operations). Usually the default
> - *   of 512 covers most hardware.
> + *   (logical block size) that the hardware can operate on.  Usually the
> + *   default of 512 covers most hardware.
>   **/
>  void blk_queue_hardsect_size(struct request_queue *q, unsigned short size)
>  {
> -	q->hardsect_size = size;
> +	q->hardsect_size = q->granularity = size;
>  }
>  EXPORT_SYMBOL(blk_queue_hardsect_size);
>  
> +/**
> + * blk_queue_granularity - set I/O granularity for the queue
> + * @q:  the request queue for the device
> + * @size:  the I/O granularity, in bytes
> + *
> + * Description:
> + *   This should typically be set to the lowest possible sector size
> + *   that the hardware can operate on without reverting to
> + *   read-modify-write operations.
> + **/
> +void blk_queue_granularity(struct request_queue *q, unsigned short size)
> +{
> +	q->granularity = size;
> +}
> +EXPORT_SYMBOL(blk_queue_granularity);
> +
> +/**
> + * blk_queue_alignment - set alignment for the queue
> + * @q:  the request queue for the device
> + * @alignment:  alignment offset in bytes
> + *
> + * Description:
> + *   Some devices are naturally misaligned to compensate for things like
> + *   the legacy DOS partition table 63-sector offset.  Low-level drivers
> + *   should call this function for devices whose first sector is not
> + *   naturally aligned.
> + */
> +void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
> +{
> +	q->alignment = alignment & (q->granularity - 1);
> +	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
> +}
> +EXPORT_SYMBOL(blk_queue_alignment);

How would low-level drivers know?

> +
>  /*
>   * Returns the minimum that is _not_ zero, unless both are zero.
>   */
>  #define min_not_zero(l, r) (l == 0) ? r : ((r == 0) ? l : min(l, r))
>  
>  /**
> + * blk_queue_min_io - set minimum request size for the queue
> + * @q:  the request queue for the device
> + * @min_io:  smallest I/O size in bytes
> + *
> + * Description:
> + *   Some devices have an internal block size bigger than the reported
> + *   hardware sector size.  This function can be used to signal the
> + *   smallest I/O the device can perform without incurring a performance
> + *   penalty.
> + */
> +void blk_queue_min_io(struct request_queue *q, unsigned int min_io)
> +{
> +	q->min_io = min_io;
> +}
> +EXPORT_SYMBOL(blk_queue_min_io);
> +
> +/**
> + * blk_queue_opt_io - set optimal request size for the queue
> + * @q:  the request queue for the device
> + * @opt_io:  optimal request size in bytes
> + *
> + * Description:
> + *   Drivers can call this function to set the preferred I/O request
> + *   size for devices that report such a value.
> + */
> +void blk_queue_opt_io(struct request_queue *q, unsigned int opt_io)
> +{
> +	q->opt_io = opt_io;
> +}
> +EXPORT_SYMBOL(blk_queue_opt_io);
> +
> +/**
>   * blk_queue_stack_limits - inherit underlying queue limits for stacked drivers
>   * @t:	the stacking driver (top)
>   * @b:  the underlying device (bottom)
> @@ -335,6 +400,68 @@ void blk_queue_stack_limits(struct reque
>  EXPORT_SYMBOL(blk_queue_stack_limits);
>  
>  /**
> + * blk_queue_stack_topology - adjust queue limits for stacked drivers
> + * @t:	the stacking driver (top)
> + * @bdev:  the underlying block device (bottom)
> + * @offset:  offset to beginning of data within component device
> + **/
> +void blk_queue_stack_topology(struct request_queue *t, struct block_device *bdev,
> +			      sector_t offset)
> +{
> +	struct request_queue *b = bdev_get_queue(bdev);
> +	int misaligned;
> +
> +	/* zero is "infinity" */
> +	t->max_sectors = min_not_zero(t->max_sectors, b->max_sectors);
> +	t->max_hw_sectors = min_not_zero(t->max_hw_sectors, b->max_hw_sectors);
> +	t->seg_boundary_mask = min_not_zero(t->seg_boundary_mask, b->seg_boundary_mask);
> +
> +	t->max_phys_segments = min_not_zero(t->max_phys_segments, b->max_phys_segments);
> +	t->max_hw_segments = min_not_zero(t->max_hw_segments, b->max_hw_segments);
> +	t->max_segment_size = min_not_zero(t->max_segment_size, b->max_segment_size);
> +	t->hardsect_size = max(t->hardsect_size, b->hardsect_size);
> +	t->min_io = max(t->min_io, b->min_io);
> +	t->granularity = max(t->granularity, b->granularity);
> +
> +	misaligned = 0;
> +	offset += get_start_sect(bdev) << 9;
> +
> +	/* Bottom device offset aligned? */
> +	if (offset && (offset & (b->granularity - 1)) != b->alignment) {
> +		misaligned = 1;
> +		goto out;
> +	}
> +
> +	/* If top has no alignment, inherit from bottom */
> +	if (!t->alignment)
> +		t->alignment = b->alignment & (b->granularity - 1);
> +
> +	/* Top alignment on logical block boundary? */
> +	if (t->alignment & (t->hardsect_size - 1)) {
> +		misaligned = 1;
> +		goto out;
> +	}
> +
> +out:
> +	if (!t->queue_lock)
> +		WARN_ON_ONCE(1);
> +	else if (misaligned || !test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags)) {
> +		unsigned long flags;
> +
> +		spin_lock_irqsave(t->queue_lock, flags);
> +
> +		if (!test_bit(QUEUE_FLAG_CLUSTER, &b->queue_flags))
> +			queue_flag_clear(QUEUE_FLAG_CLUSTER, t);
> +
> +		if (misaligned)
> +			queue_flag_set(QUEUE_FLAG_MISALIGNED, t);
> +
> +		spin_unlock_irqrestore(t->queue_lock, flags);
> +	}
> +}
> +EXPORT_SYMBOL(blk_queue_stack_topology);
> +
> +/**
>   * blk_queue_dma_pad - set pad mask
>   * @q:     the request queue for the device
>   * @mask:  pad mask
> diff --git a/block/blk-sysfs.c b/block/blk-sysfs.c
> --- a/block/blk-sysfs.c
> +++ b/block/blk-sysfs.c
> @@ -105,6 +105,16 @@ static ssize_t queue_hw_sector_size_show
>  	return queue_var_show(q->hardsect_size, page);
>  }
>  
> +static ssize_t queue_min_io_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->min_io, page);
> +}
> +
> +static ssize_t queue_opt_io_show(struct request_queue *q, char *page)
> +{
> +	return queue_var_show(q->opt_io, page);
> +}
> +
>  static ssize_t
>  queue_max_sectors_store(struct request_queue *q, const char *page, size_t count)
>  {
> @@ -256,6 +266,16 @@ static struct queue_sysfs_entry queue_hw
>  	.show = queue_hw_sector_size_show,
>  };
>  
> +static struct queue_sysfs_entry queue_min_io_entry = {
> +	.attr = {.name = "minimum_io_size", .mode = S_IRUGO },
> +	.show = queue_min_io_show,
> +};
> +
> +static struct queue_sysfs_entry queue_opt_io_entry = {
> +	.attr = {.name = "optimal_io_size", .mode = S_IRUGO },
> +	.show = queue_opt_io_show,
> +};
> +
>  static struct queue_sysfs_entry queue_nonrot_entry = {
>  	.attr = {.name = "rotational", .mode = S_IRUGO | S_IWUSR },
>  	.show = queue_nonrot_show,
> @@ -287,6 +307,8 @@ static struct attribute *default_attrs[]
>  	&queue_max_sectors_entry.attr,
>  	&queue_iosched_entry.attr,
>  	&queue_hw_sector_size_entry.attr,
> +	&queue_min_io_entry.attr,
> +	&queue_opt_io_entry.attr,
>  	&queue_nonrot_entry.attr,
>  	&queue_nomerges_entry.attr,
>  	&queue_rq_affinity_entry.attr,
> diff --git a/block/genhd.c b/block/genhd.c
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -848,11 +848,20 @@ static ssize_t disk_capability_show(stru
>  	return sprintf(buf, "%x\n", disk->flags);
>  }
>  
> +static ssize_t disk_alignment_show(struct device *dev,
> +				   struct device_attribute *attr, char *buf)
> +{
> +	struct gendisk *disk = dev_to_disk(dev);
> +
> +	return sprintf(buf, "%d\n", queue_alignment(disk->queue));
> +}
> +
>  static DEVICE_ATTR(range, S_IRUGO, disk_range_show, NULL);
>  static DEVICE_ATTR(ext_range, S_IRUGO, disk_ext_range_show, NULL);
>  static DEVICE_ATTR(removable, S_IRUGO, disk_removable_show, NULL);
>  static DEVICE_ATTR(ro, S_IRUGO, disk_ro_show, NULL);
>  static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
> +static DEVICE_ATTR(alignment, S_IRUGO, disk_alignment_show, NULL);
>  static DEVICE_ATTR(capability, S_IRUGO, disk_capability_show, NULL);
>  static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
> @@ -871,6 +880,7 @@ static struct attribute *disk_attrs[] = 
>  	&dev_attr_removable.attr,
>  	&dev_attr_ro.attr,
>  	&dev_attr_size.attr,
> +	&dev_attr_alignment.attr,
>  	&dev_attr_capability.attr,
>  	&dev_attr_stat.attr,
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
> diff --git a/fs/partitions/check.c b/fs/partitions/check.c
> --- a/fs/partitions/check.c
> +++ b/fs/partitions/check.c
> @@ -219,6 +219,13 @@ ssize_t part_size_show(struct device *de
>  	return sprintf(buf, "%llu\n",(unsigned long long)p->nr_sects);
>  }
>  
> +ssize_t part_alignment_show(struct device *dev,
> +			    struct device_attribute *attr, char *buf)
> +{
> +	struct hd_struct *p = dev_to_part(dev);
> +	return sprintf(buf, "%llu\n", (unsigned long long)p->alignment);
> +}
> +
>  ssize_t part_stat_show(struct device *dev,
>  		       struct device_attribute *attr, char *buf)
>  {
> @@ -272,6 +279,7 @@ ssize_t part_fail_store(struct device *d
>  static DEVICE_ATTR(partition, S_IRUGO, part_partition_show, NULL);
>  static DEVICE_ATTR(start, S_IRUGO, part_start_show, NULL);
>  static DEVICE_ATTR(size, S_IRUGO, part_size_show, NULL);
> +static DEVICE_ATTR(alignment, S_IRUGO, part_alignment_show, NULL);
>  static DEVICE_ATTR(stat, S_IRUGO, part_stat_show, NULL);
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
>  static struct device_attribute dev_attr_fail =
> @@ -282,6 +290,7 @@ static struct attribute *part_attrs[] = 
>  	&dev_attr_partition.attr,
>  	&dev_attr_start.attr,
>  	&dev_attr_size.attr,
> +	&dev_attr_alignment.attr,
>  	&dev_attr_stat.attr,
>  #ifdef CONFIG_FAIL_MAKE_REQUEST
>  	&dev_attr_fail.attr,
> @@ -383,6 +392,7 @@ struct hd_struct *add_partition(struct g
>  	pdev = part_to_dev(p);
>  
>  	p->start_sect = start;
> +	p->alignment = queue_sector_alignment(disk->queue, start);
>  	p->nr_sects = len;
>  	p->partno = partno;
>  	p->policy = get_disk_ro(disk);
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -402,6 +402,10 @@ struct request_queue
>  	unsigned short		max_hw_segments;
>  	unsigned short		hardsect_size;
>  	unsigned int		max_segment_size;
> +	unsigned int		alignment;
> +	unsigned int		granularity;
> +	unsigned int		min_io;
> +	unsigned int		opt_io;
>  
>  	unsigned long		seg_boundary_mask;
>  	void			*dma_drain_buffer;

Patch looks fine, but can we group these by name please? alignment could
be anything.

+	unsigned int		io_alignment;
+	unsigned int		io_granularity;
+	unsigned int		io_min;
+	unsigned int		io_opt;
>  

> @@ -461,6 +465,7 @@ struct request_queue
>  #define QUEUE_FLAG_NONROT      14	/* non-rotational device (SSD) */
>  #define QUEUE_FLAG_VIRT        QUEUE_FLAG_NONROT /* paravirt device */
>  #define QUEUE_FLAG_IO_STAT     15	/* do IO stats */
> +#define QUEUE_FLAG_MISALIGNED  16	/* bdev not aligned to disk */
>  
>  #define QUEUE_FLAG_DEFAULT	((1 << QUEUE_FLAG_IO_STAT) |		\
>  				 (1 << QUEUE_FLAG_CLUSTER) |		\
> @@ -877,7 +882,15 @@ extern void blk_queue_max_phys_segments(
>  extern void blk_queue_max_hw_segments(struct request_queue *, unsigned short);
>  extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
>  extern void blk_queue_hardsect_size(struct request_queue *, unsigned short);
> +extern void blk_queue_granularity(struct request_queue *, unsigned short);
> +extern void blk_queue_alignment(struct request_queue *q,
> +				unsigned int alignment);
> +extern void blk_queue_min_io(struct request_queue *q, unsigned int min_io);
> +extern void blk_queue_opt_io(struct request_queue *q, unsigned int opt_io);
>  extern void blk_queue_stack_limits(struct request_queue *t, struct request_queue *b);
> +extern void blk_queue_stack_topology(struct request_queue *t,
> +				     struct block_device *bdev,
> +				     sector_t offset);
>  extern void blk_queue_dma_pad(struct request_queue *, unsigned int);
>  extern void blk_queue_update_dma_pad(struct request_queue *, unsigned int);
>  extern int blk_queue_dma_drain(struct request_queue *q,
> @@ -978,6 +991,23 @@ static inline int bdev_hardsect_size(str
>  	return queue_hardsect_size(bdev_get_queue(bdev));
>  }
>  
> +static inline int queue_alignment(struct request_queue *q)
> +{
> +	if (q && test_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags))
> +		return -1;
> +
> +	if (q && q->alignment)
> +		return q->alignment;
> +
> +	return 0;
> +}
> +
> +static inline int queue_sector_alignment(struct request_queue *q,
> +					 sector_t sector)
> +{
> +	return ((sector << 9) - q->alignment) & (q->min_io - 1);
> +}
> +
>  static inline int queue_dma_alignment(struct request_queue *q)
>  {
>  	return q ? q->dma_alignment : 511;
> diff --git a/include/linux/genhd.h b/include/linux/genhd.h
> --- a/include/linux/genhd.h
> +++ b/include/linux/genhd.h
> @@ -90,6 +90,7 @@ struct disk_stats {
>  struct hd_struct {
>  	sector_t start_sect;
>  	sector_t nr_sects;
> +	sector_t alignment;
>  	struct device __dev;
>  	struct kobject *holder_dir;
>  	int policy, partno;
> 
> 

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23  5:29 ` [PATCH 5 of 8] sd: Detect non-rotational devices Martin K. Petersen
@ 2009-04-23 10:52   ` Jens Axboe
  2009-04-23 11:09     ` Jeff Garzik
  0 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2009-04-23 10:52 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	linux-ide, linux-scsi

On Thu, Apr 23 2009, Martin K. Petersen wrote:
> Detect non-rotational devices and set the queue flag accordingly.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
> ---
> 1 file changed, 27 insertions(+)
> drivers/scsi/sd.c |   27 +++++++++++++++++++++++++++
> 
> 
> 
> diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
> --- a/drivers/scsi/sd.c
> +++ b/drivers/scsi/sd.c
> @@ -50,6 +50,7 @@
>  #include <linux/string_helpers.h>
>  #include <linux/async.h>
>  #include <asm/uaccess.h>
> +#include <asm/unaligned.h>
>  
>  #include <scsi/scsi.h>
>  #include <scsi/scsi_cmnd.h>
> @@ -1800,6 +1801,29 @@ void sd_read_app_tag_own(struct scsi_dis
>  }
>  
>  /**
> + * sd_read_block_characteristics - Query block dev. characteristics
> + * @disk: disk to query
> + */
> +static void sd_read_block_characteristics(struct scsi_disk *sdkp)
> +{
> +	char *buffer;
> +	u16 rot;
> +
> +	/* Block Device Characteristics VPD */
> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
> +
> +	if (buffer == NULL)
> +		return;
> +
> +	rot = get_unaligned_be16(&buffer[4]);
> +
> +	if (rot == 1)
> +		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
> +
> +	kfree(buffer);
> +}
> +
> +/**
>   *	sd_revalidate_disk - called the first time a new disk is seen,
>   *	performs disk spin up, read_capacity, etc.
>   *	@disk: struct gendisk we care about
> @@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
>  	 */
>  	if (sdkp->media_present) {
>  		sd_read_capacity(sdkp, buffer);
> +		sd_read_block_characteristics(sdkp);
>  		sd_read_write_protect_flag(sdkp, buffer);
>  		sd_read_cache_type(sdkp, buffer);
>  		sd_read_app_tag_own(sdkp, buffer);
> @@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
>  	add_disk(gd);
>  	sd_dif_config_host(sdkp);
>  
> +	sd_revalidate_disk(gd);
> +
>  	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
>  		  sdp->removable ? "removable " : "");

Make sure this works for libata as well, and then kill the rotational
check in there instead.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 10:52   ` Jens Axboe
@ 2009-04-23 11:09     ` Jeff Garzik
  2009-04-23 11:13       ` Jens Axboe
  2009-04-23 11:38       ` Matthew Wilcox
  0 siblings, 2 replies; 42+ messages in thread
From: Jeff Garzik @ 2009-04-23 11:09 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, neilb, James.Bottomley,
	dgilbert, linux-ide, linux-scsi

Jens Axboe wrote:
>> +static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>> +{
>> +	char *buffer;
>> +	u16 rot;
>> +
>> +	/* Block Device Characteristics VPD */
>> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
>> +
>> +	if (buffer == NULL)
>> +		return;
>> +
>> +	rot = get_unaligned_be16(&buffer[4]);
>> +
>> +	if (rot == 1)
>> +		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
>> +
>> +	kfree(buffer);
>> +}
>> +
>> +/**
>>   *	sd_revalidate_disk - called the first time a new disk is seen,
>>   *	performs disk spin up, read_capacity, etc.
>>   *	@disk: struct gendisk we care about
>> @@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
>>  	 */
>>  	if (sdkp->media_present) {
>>  		sd_read_capacity(sdkp, buffer);
>> +		sd_read_block_characteristics(sdkp);
>>  		sd_read_write_protect_flag(sdkp, buffer);
>>  		sd_read_cache_type(sdkp, buffer);
>>  		sd_read_app_tag_own(sdkp, buffer);
>> @@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
>>  	add_disk(gd);
>>  	sd_dif_config_host(sdkp);
>>  
>> +	sd_revalidate_disk(gd);
>> +
>>  	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
>>  		  sdp->removable ? "removable " : "");
> 
> Make sure this works for libata as well, and then kill the rotational
> check in there instead.

Yep.  libata-scsi.c would need to simulate that VPD page.

Also (to mkp or whoever does the work) -- note Linus's comment, and my 
provisional patch[1], about libata potentially wanting to detect NONROT 
by looking for "*SSD" from IDENTIFY DEVICE'S model string.

	Jeff


[1] partial subject line, from the big "Ext3 latency fixes" thread):
"libata: add SSD detection hueristic; move SSD setup to ata_dev_configure"



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:09     ` Jeff Garzik
@ 2009-04-23 11:13       ` Jens Axboe
  2009-04-23 11:22         ` Jeff Garzik
  2009-04-23 11:38       ` Matthew Wilcox
  1 sibling, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2009-04-23 11:13 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Martin K. Petersen, rwheeler, snitzer, neilb, James.Bottomley,
	dgilbert, linux-ide, linux-scsi

On Thu, Apr 23 2009, Jeff Garzik wrote:
> Jens Axboe wrote:
>>> +static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>>> +{
>>> +	char *buffer;
>>> +	u16 rot;
>>> +
>>> +	/* Block Device Characteristics VPD */
>>> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
>>> +
>>> +	if (buffer == NULL)
>>> +		return;
>>> +
>>> +	rot = get_unaligned_be16(&buffer[4]);
>>> +
>>> +	if (rot == 1)
>>> +		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
>>> +
>>> +	kfree(buffer);
>>> +}
>>> +
>>> +/**
>>>   *	sd_revalidate_disk - called the first time a new disk is seen,
>>>   *	performs disk spin up, read_capacity, etc.
>>>   *	@disk: struct gendisk we care about
>>> @@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
>>>  	 */
>>>  	if (sdkp->media_present) {
>>>  		sd_read_capacity(sdkp, buffer);
>>> +		sd_read_block_characteristics(sdkp);
>>>  		sd_read_write_protect_flag(sdkp, buffer);
>>>  		sd_read_cache_type(sdkp, buffer);
>>>  		sd_read_app_tag_own(sdkp, buffer);
>>> @@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
>>>  	add_disk(gd);
>>>  	sd_dif_config_host(sdkp);
>>>  +	sd_revalidate_disk(gd);
>>> +
>>>  	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
>>>  		  sdp->removable ? "removable " : "");
>>
>> Make sure this works for libata as well, and then kill the rotational
>> check in there instead.
>
> Yep.  libata-scsi.c would need to simulate that VPD page.

Exactly

> Also (to mkp or whoever does the work) -- note Linus's comment, and my  
> provisional patch[1], about libata potentially wanting to detect NONROT  
> by looking for "*SSD" from IDENTIFY DEVICE'S model string.

I think that's an entirely orthogonal issue.

>
> 	Jeff
>
>
> [1] partial subject line, from the big "Ext3 latency fixes" thread):
> "libata: add SSD detection hueristic; move SSD setup to ata_dev_configure"
>
>

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:13       ` Jens Axboe
@ 2009-04-23 11:22         ` Jeff Garzik
  0 siblings, 0 replies; 42+ messages in thread
From: Jeff Garzik @ 2009-04-23 11:22 UTC (permalink / raw)
  Cc: Martin K. Petersen, rwheeler, snitzer, neilb, James.Bottomley,
	dgilbert, linux-ide, linux-scsi

Jens Axboe wrote:
> On Thu, Apr 23 2009, Jeff Garzik wrote:
>> Jens Axboe wrote:
>>>> +static void sd_read_block_characteristics(struct scsi_disk *sdkp)
>>>> +{
>>>> +	char *buffer;
>>>> +	u16 rot;
>>>> +
>>>> +	/* Block Device Characteristics VPD */
>>>> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
>>>> +
>>>> +	if (buffer == NULL)
>>>> +		return;
>>>> +
>>>> +	rot = get_unaligned_be16(&buffer[4]);
>>>> +
>>>> +	if (rot == 1)
>>>> +		queue_flag_set_unlocked(QUEUE_FLAG_NONROT, sdkp->disk->queue);
>>>> +
>>>> +	kfree(buffer);
>>>> +}
>>>> +
>>>> +/**
>>>>   *	sd_revalidate_disk - called the first time a new disk is seen,
>>>>   *	performs disk spin up, read_capacity, etc.
>>>>   *	@disk: struct gendisk we care about
>>>> @@ -1836,6 +1860,7 @@ static int sd_revalidate_disk(struct gen
>>>>  	 */
>>>>  	if (sdkp->media_present) {
>>>>  		sd_read_capacity(sdkp, buffer);
>>>> +		sd_read_block_characteristics(sdkp);
>>>>  		sd_read_write_protect_flag(sdkp, buffer);
>>>>  		sd_read_cache_type(sdkp, buffer);
>>>>  		sd_read_app_tag_own(sdkp, buffer);
>>>> @@ -1976,6 +2001,8 @@ static void sd_probe_async(void *data, a
>>>>  	add_disk(gd);
>>>>  	sd_dif_config_host(sdkp);
>>>>  +	sd_revalidate_disk(gd);
>>>> +
>>>>  	sd_printk(KERN_NOTICE, sdkp, "Attached SCSI %sdisk\n",
>>>>  		  sdp->removable ? "removable " : "");
>>> Make sure this works for libata as well, and then kill the rotational
>>> check in there instead.
>> Yep.  libata-scsi.c would need to simulate that VPD page.
> 
> Exactly
> 
>> Also (to mkp or whoever does the work) -- note Linus's comment, and my  
>> provisional patch[1], about libata potentially wanting to detect NONROT  
>> by looking for "*SSD" from IDENTIFY DEVICE'S model string.
> 
> I think that's an entirely orthogonal issue.

Yes.

If somebody is going to be essentially rewriting the libata SSD 
detection logic, that area will get stirred _anyway_.

So I hope it is not out of bounds to point out additional areas where a 
developer could serve a legitimate need of Linux users, particularly 
when doing so is easy and a provisional patch already exists.

	Jeff




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:09     ` Jeff Garzik
  2009-04-23 11:13       ` Jens Axboe
@ 2009-04-23 11:38       ` Matthew Wilcox
  2009-04-23 11:58         ` Jeff Garzik
  2009-04-23 13:16         ` Martin K. Petersen
  1 sibling, 2 replies; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 11:38 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Jens Axboe, Martin K. Petersen, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 07:09:37AM -0400, Jeff Garzik wrote:
> Jens Axboe wrote:
> >>+	/* Block Device Characteristics VPD */
> >>+	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
> >>+
> >>+	if (buffer == NULL)
> >>+		return;
> >>+
> >>+	rot = get_unaligned_be16(&buffer[4]);
>
> >Make sure this works for libata as well, and then kill the rotational
> >check in there instead.
> 
> Yep.  libata-scsi.c would need to simulate that VPD page.

I already did that.  The only problem is that you made me include the stupid:

        if (ata_id_major_version(args->id) > 7) {

so of course it doesn't work on any existing hardware.  How about
applying this patch:

----

libata: fill in b1 page for all drives, not just ATA-8

Some of the drives on the market fill in the rotational speed and form
factor correctly, even though they claim support for an earlier version
of ATA.  The current ata_id_is_ssd() code doesn't check the version
number and doesn't appear to have caused any trouble.  Besides, SCSI devices
are also capable of returning garbage in these fields.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2733b0c..59358ca 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2144,11 +2144,9 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
 {
 	rbuf[1] = 0xb1;
 	rbuf[3] = 0x3c;
-	if (ata_id_major_version(args->id) > 7) {
-		rbuf[4] = args->id[217] >> 8;
-		rbuf[5] = args->id[217];
-		rbuf[7] = args->id[168] & 0xf;
-	}
+	rbuf[4] = args->id[217] >> 8;
+	rbuf[5] = args->id[217];
+	rbuf[7] = args->id[168] & 0xf;
 
 	return 0;
 }

> Also (to mkp or whoever does the work) -- note Linus's comment, and my 
> provisional patch[1], about libata potentially wanting to detect NONROT 
> by looking for "*SSD" from IDENTIFY DEVICE'S model string.

Found it ... and Jens' suggestion that this be done in userspace instead.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 10:51   ` Jens Axboe
@ 2009-04-23 11:49     ` Matthew Wilcox
  2009-04-23 11:55       ` Jens Axboe
  2009-04-23 13:17     ` Martin K. Petersen
  2009-04-23 18:13     ` Konrad Rzeszutek
  2 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 11:49 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 12:51:45PM +0200, Jens Axboe wrote:
> > +/**
> > + * blk_queue_alignment - set alignment for the queue
> > + * @q:  the request queue for the device
> > + * @alignment:  alignment offset in bytes
> > + *
> > + * Description:
> > + *   Some devices are naturally misaligned to compensate for things like
> > + *   the legacy DOS partition table 63-sector offset.  Low-level drivers
> > + *   should call this function for devices whose first sector is not
> > + *   naturally aligned.
> > + */
> > +void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
> > +{
> > +	q->alignment = alignment & (q->granularity - 1);
> > +	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
> > +}
> > +EXPORT_SYMBOL(blk_queue_alignment);
> 
> How would low-level drivers know?

ATA reports this in the INQUIRY data.  SCSI reports it in READ CAPACITY 16.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 11:49     ` Matthew Wilcox
@ 2009-04-23 11:55       ` Jens Axboe
  2009-04-23 13:22         ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: Jens Axboe @ 2009-04-23 11:55 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23 2009, Matthew Wilcox wrote:
> On Thu, Apr 23, 2009 at 12:51:45PM +0200, Jens Axboe wrote:
> > > +/**
> > > + * blk_queue_alignment - set alignment for the queue
> > > + * @q:  the request queue for the device
> > > + * @alignment:  alignment offset in bytes
> > > + *
> > > + * Description:
> > > + *   Some devices are naturally misaligned to compensate for things like
> > > + *   the legacy DOS partition table 63-sector offset.  Low-level drivers
> > > + *   should call this function for devices whose first sector is not
> > > + *   naturally aligned.
> > > + */
> > > +void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
> > > +{
> > > +	q->alignment = alignment & (q->granularity - 1);
> > > +	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
> > > +}
> > > +EXPORT_SYMBOL(blk_queue_alignment);
> > 
> > How would low-level drivers know?
> 
> ATA reports this in the INQUIRY data.  SCSI reports it in READ CAPACITY 16.

OK, I wasn't aware of that. That's a relief!

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:38       ` Matthew Wilcox
@ 2009-04-23 11:58         ` Jeff Garzik
  2009-04-23 12:03           ` Jens Axboe
  2009-04-23 13:16         ` Martin K. Petersen
  1 sibling, 1 reply; 42+ messages in thread
From: Jeff Garzik @ 2009-04-23 11:58 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jens Axboe, Martin K. Petersen, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

Matthew Wilcox wrote:
> On Thu, Apr 23, 2009 at 07:09:37AM -0400, Jeff Garzik wrote:
>> Jens Axboe wrote:
>>>> +	/* Block Device Characteristics VPD */
>>>> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
>>>> +
>>>> +	if (buffer == NULL)
>>>> +		return;
>>>> +
>>>> +	rot = get_unaligned_be16(&buffer[4]);
>>> Make sure this works for libata as well, and then kill the rotational
>>> check in there instead.
>> Yep.  libata-scsi.c would need to simulate that VPD page.
> 
> I already did that.  The only problem is that you made me include the stupid:
> 
>         if (ata_id_major_version(args->id) > 7) {
> 
> so of course it doesn't work on any existing hardware.  How about
> applying this patch:
> 
> ----
> 
> libata: fill in b1 page for all drives, not just ATA-8
> 
> Some of the drives on the market fill in the rotational speed and form
> factor correctly, even though they claim support for an earlier version
> of ATA.  The current ata_id_is_ssd() code doesn't check the version
> number and doesn't appear to have caused any trouble.  Besides, SCSI devices
> are also capable of returning garbage in these fields.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 2733b0c..59358ca 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2144,11 +2144,9 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
>  {
>  	rbuf[1] = 0xb1;
>  	rbuf[3] = 0x3c;
> -	if (ata_id_major_version(args->id) > 7) {
> -		rbuf[4] = args->id[217] >> 8;
> -		rbuf[5] = args->id[217];
> -		rbuf[7] = args->id[168] & 0xf;
> -	}
> +	rbuf[4] = args->id[217] >> 8;
> +	rbuf[5] = args->id[217];
> +	rbuf[7] = args->id[168] & 0xf;

Thus returning undefined garbage for the vast majority of ATA devices? 
Might as well admit that a call to get_random_bytes() is a valid 
implementation, at that point.

Linux users deserve more than that :)

If you want to find a better test than "version > 7", that is fine.

Surely a few minutes of thinking and a few minutes of research will 
yield a reasonable hueristic, that gives a reasonable estimation of 
when/if these fields are valid?

linux/ata.h is filled with examples of proper range checking -- ensuring 
that a range of IDENTIFY DEVICE words are valid.  There are also typical 
tests such as assuming values other than 0x0000 and 0xffff are valid.


>> Also (to mkp or whoever does the work) -- note Linus's comment, and my 
>> provisional patch[1], about libata potentially wanting to detect NONROT 
>> by looking for "*SSD" from IDENTIFY DEVICE'S model string.
> 
> Found it ... and Jens' suggestion that this be done in userspace instead.

It is trivial to do in the kernel, where we already match against model 
info for a long list of quirks.

Therefore, I think the Just Works(tm) value to SSD owners is higher. 
That way old userlands work with SSDs too.

	Jeff




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:58         ` Jeff Garzik
@ 2009-04-23 12:03           ` Jens Axboe
  0 siblings, 0 replies; 42+ messages in thread
From: Jens Axboe @ 2009-04-23 12:03 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Matthew Wilcox, Martin K. Petersen, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23 2009, Jeff Garzik wrote:
> Matthew Wilcox wrote:
>> On Thu, Apr 23, 2009 at 07:09:37AM -0400, Jeff Garzik wrote:
>>> Jens Axboe wrote:
>>>>> +	/* Block Device Characteristics VPD */
>>>>> +	buffer = scsi_get_vpd_page(sdkp->device, 0xb1);
>>>>> +
>>>>> +	if (buffer == NULL)
>>>>> +		return;
>>>>> +
>>>>> +	rot = get_unaligned_be16(&buffer[4]);
>>>> Make sure this works for libata as well, and then kill the rotational
>>>> check in there instead.
>>> Yep.  libata-scsi.c would need to simulate that VPD page.
>>
>> I already did that.  The only problem is that you made me include the stupid:
>>
>>         if (ata_id_major_version(args->id) > 7) {
>>
>> so of course it doesn't work on any existing hardware.  How about
>> applying this patch:
>>
>> ----
>>
>> libata: fill in b1 page for all drives, not just ATA-8
>>
>> Some of the drives on the market fill in the rotational speed and form
>> factor correctly, even though they claim support for an earlier version
>> of ATA.  The current ata_id_is_ssd() code doesn't check the version
>> number and doesn't appear to have caused any trouble.  Besides, SCSI devices
>> are also capable of returning garbage in these fields.
>>
>> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 2733b0c..59358ca 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -2144,11 +2144,9 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
>>  {
>>  	rbuf[1] = 0xb1;
>>  	rbuf[3] = 0x3c;
>> -	if (ata_id_major_version(args->id) > 7) {
>> -		rbuf[4] = args->id[217] >> 8;
>> -		rbuf[5] = args->id[217];
>> -		rbuf[7] = args->id[168] & 0xf;
>> -	}
>> +	rbuf[4] = args->id[217] >> 8;
>> +	rbuf[5] = args->id[217];
>> +	rbuf[7] = args->id[168] & 0xf;
>
> Thus returning undefined garbage for the vast majority of ATA devices?  
> Might as well admit that a call to get_random_bytes() is a valid  
> implementation, at that point.
>
> Linux users deserve more than that :)
>
> If you want to find a better test than "version > 7", that is fine.
>
> Surely a few minutes of thinking and a few minutes of research will  
> yield a reasonable hueristic, that gives a reasonable estimation of  
> when/if these fields are valid?
>
> linux/ata.h is filled with examples of proper range checking -- ensuring  
> that a range of IDENTIFY DEVICE words are valid.  There are also typical  
> tests such as assuming values other than 0x0000 and 0xffff are valid.
>
>
>>> Also (to mkp or whoever does the work) -- note Linus's comment, and 
>>> my provisional patch[1], about libata potentially wanting to detect 
>>> NONROT by looking for "*SSD" from IDENTIFY DEVICE'S model string.
>>
>> Found it ... and Jens' suggestion that this be done in userspace instead.
>
> It is trivial to do in the kernel, where we already match against model  
> info for a long list of quirks.
>
> Therefore, I think the Just Works(tm) value to SSD owners is higher.  
> That way old userlands work with SSDs too.

But it's just as easy to do in udev, it's just a one-line udev rule.
Don't care much either way, though.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 11:38       ` Matthew Wilcox
  2009-04-23 11:58         ` Jeff Garzik
@ 2009-04-23 13:16         ` Martin K. Petersen
  2009-04-23 13:33           ` Jeff Garzik
  1 sibling, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 13:16 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jeff Garzik, Jens Axboe, Martin K. Petersen, rwheeler, snitzer,
	neilb, James.Bottomley, dgilbert, linux-ide, linux-scsi

>>>>> "Matthew" == Matthew Wilcox <matthew@wil.cx> writes:

Matthew> I already did that.  The only problem is that you made me
Matthew> include the stupid:

Matthew>         if (ata_id_major_version(args->id) > 7) {

Matthew> so of course it doesn't work on any existing hardware.  How
Matthew> about applying this patch:

Maybe we could incubate your patch in the next tree for a bit and see
what breaks without the version check?

We could even be somewhat conservative like we were with RC16 in SCSI.
The SATA devices I have here with valid rotational flags all report
version 7.  I wonder if > 6 do the trick?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 10:51   ` Jens Axboe
  2009-04-23 11:49     ` Matthew Wilcox
@ 2009-04-23 13:17     ` Martin K. Petersen
  2009-04-23 18:13     ` Konrad Rzeszutek
  2 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 13:17 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

>>>>> "Jens" == Jens Axboe <jens.axboe@oracle.com> writes:

>> + unsigned int alignment;
>> + unsigned int granularity;
>> + unsigned int min_io;
>> + unsigned int opt_io;
>> 
>> unsigned long seg_boundary_mask; void *dma_drain_buffer;

Jens> Patch looks fine, but can we group these by name please? alignment
Jens> could be anything.

Jens> + unsigned int io_alignment;
Jens> + unsigned int io_granularity;
Jens> + unsigned int io_min;
Jens> + unsigned int io_opt;

Sure, no problem.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 11:55       ` Jens Axboe
@ 2009-04-23 13:22         ` Matthew Wilcox
  2009-04-23 13:30           ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 13:22 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 01:55:09PM +0200, Jens Axboe wrote:
> On Thu, Apr 23 2009, Matthew Wilcox wrote:
> > ATA reports this in the INQUIRY data.  SCSI reports it in READ CAPACITY 16.
> 
> OK, I wasn't aware of that. That's a relief!

Here's a patch to translate ATA to SCSI (extracted from the 4k work I
did last year).  Jeff, could you apply this?

--- 

libata: Report number of logical sectors per physical sector

Translate the information from IDENTIFY DEVICE into SCSI's READ CAPACITY 16.
Note that ATA reports the offset of the first sector from the alignment
whereas SCSI reports the first aligned sector.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 2733b0c..e8a83d9 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2376,10 +2374,25 @@ saving_not_supp:
  */
 static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 {
-	u64 last_lba = args->dev->n_sectors - 1; /* LBA of the last block */
+	struct ata_device *dev = args->dev;
+	u64 last_lba = dev->n_sectors - 1; /* LBA of the last block */
+	u8 log_per_phys = 0;
+	u16 lowest_aligned = 0;
+	u16 word_106 = dev->id[106];
 
 	VPRINTK("ENTER\n");
 
+	if ((word_106 & 0xc000) == 0x4000) {
+		/* Number and offset of logical sectors per physical sector */
+		if (word_106 & (1 << 13))
+			log_per_phys = word_106 & 0xf;
+		if ((dev->id[209] & 0xc000) == 0x4000) {
+			u16 first = dev->id[209] & 0x3fff;
+			if (first > 0)
+				lowest_aligned = (1 << log_per_phys) - first;
+		}
+	}
+
 	if (args->cmd->cmnd[0] == READ_CAPACITY) {
 		if (last_lba >= 0xffffffffULL)
 			last_lba = 0xffffffff;
@@ -2407,6 +2420,11 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 		/* sector size */
 		rbuf[10] = ATA_SECT_SIZE >> 8;
 		rbuf[11] = ATA_SECT_SIZE & 0xff;
+
+		rbuf[12] = 0;
+		rbuf[13] = log_per_phys;
+		rbuf[14] = lowest_aligned >> 8;
+		rbuf[15] = lowest_aligned;
 	}
 
 	return 0;

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply related	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 13:22         ` Matthew Wilcox
@ 2009-04-23 13:30           ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 13:30 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 07:22:37AM -0600, Matthew Wilcox wrote:
> On Thu, Apr 23, 2009 at 01:55:09PM +0200, Jens Axboe wrote:
> > On Thu, Apr 23 2009, Matthew Wilcox wrote:
> > > ATA reports this in the INQUIRY data.  SCSI reports it in READ CAPACITY 16.
> > 
> > OK, I wasn't aware of that. That's a relief!
> 
> Here's a patch to translate ATA to SCSI (extracted from the 4k work I
> did last year).  Jeff, could you apply this?

hahaha.  i should have read down as far as 8/8 ;-)

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 13:16         ` Martin K. Petersen
@ 2009-04-23 13:33           ` Jeff Garzik
  2009-04-23 14:10             ` James Bottomley
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff Garzik @ 2009-04-23 13:33 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Matthew Wilcox, Jens Axboe, rwheeler, snitzer, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

Martin K. Petersen wrote:
>>>>>> "Matthew" == Matthew Wilcox <matthew@wil.cx> writes:
> 
> Matthew> I already did that.  The only problem is that you made me
> Matthew> include the stupid:
> 
> Matthew>         if (ata_id_major_version(args->id) > 7) {
> 
> Matthew> so of course it doesn't work on any existing hardware.  How
> Matthew> about applying this patch:
> 
> Maybe we could incubate your patch in the next tree for a bit and see
> what breaks without the version check?
> 
> We could even be somewhat conservative like we were with RC16 in SCSI.
> The SATA devices I have here with valid rotational flags all report
> version 7.  I wonder if > 6 do the trick?

linux/ata.h illustrates the standard ATA rules for validating bits of 
IDENTIFY DEVICE.

Just checking the version was always just a simplistic hack...  we are 
talking specifically about trusting values listed as undefined in the 
relevant specs.  That requires more, not less, gymnastics :)

	Jeff





^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8 of 8] libata: Report disk alignment and physical block size
  2009-04-23  5:29 ` [PATCH 8 of 8] libata: Report disk alignment and physical block size Martin K. Petersen
@ 2009-04-23 13:46   ` Matthew Wilcox
  2009-04-23 14:05     ` Martin K. Petersen
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 13:46 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide, linux-scsi


OK, comparing the two ...

On Thu, Apr 23, 2009 at 01:29:35AM -0400, Martin K. Petersen wrote:
>  static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
>  {
> +	struct ata_device *dev = args->dev;
>  	u64 last_lba = args->dev->n_sectors - 1; /* LBA of the last block */

You can drop the 'args->' here and save a pointer dereference.

> +	u8 log_per_phys = 0;
> +	u16 lowest_aligned_lba = 0;
> +	u16 word_106 = dev->id[106];
> +	u16 word_209 = dev->id[209];
> +
> +	if ((word_106 & 0xc000) == 0x4000) {
> +		/* Number and offset of logical sectors per physical sector */
> +		if (word_106 & (1 << 13))
> +			log_per_phys = word_106 & 0xf;
> +		if ((word_209 & 0xc000) == 0x4000)
> +			lowest_aligned_lba =
> +				((1 << log_per_phys) - (word_209 & 0x3fff))
> +				% (1 << log_per_phys);

I found it clearer to do:
			u16 first = word_209 & 0x3fff;
			if (first)
				lowest_aligned = (1 << log_per_phys) - first;

but you may disagree.

> +		rbuf[14] = (lowest_aligned_lba >> 8) & 0x3f;

Hm, yes, it could overflow ... 1 << 0xf -1 is 32767 (0x7fff) which would
be larger than could fit in SCSI's RC16 and ends up inadvertently setting
Thin Provisioning Read Zeroes, which we really don't want.  I suppose
reporting 0x3fff is better than reporting anything else in this field.
All highly theoretical, since you'd have to have 16MB physical sectors
with 512 byte logical sectors to get to this situation.

So I have no objections to this one going in, and please add:

Reviewed-by: Matthew Wilcox <willy@linux.intel.com>

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 8 of 8] libata: Report disk alignment and physical block size
  2009-04-23 13:46   ` Matthew Wilcox
@ 2009-04-23 14:05     ` Martin K. Petersen
  0 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 14:05 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, jens.axboe, linux-ide, linux-scsi

>>>>> "Matthew" == Matthew Wilcox <matthew@wil.cx> writes:

Matthew> I found it clearer to do:
Matthew> 			u16 first = word_209 & 0x3fff; if
Matthew> 			(first)
Matthew> 				lowest_aligned = (1 <<
Matthew> 				log_per_phys) - first;

Matthew> but you may disagree.

I don't care much either way.


>> + rbuf[14] = (lowest_aligned_lba >> 8) & 0x3f;

Matthew> Hm, yes, it could overflow ... 1 << 0xf -1 is 32767 (0x7fff)
Matthew> which would be larger than could fit in SCSI's RC16 and ends up
Matthew> inadvertently setting Thin Provisioning Read Zeroes, which we
Matthew> really don't want.  I suppose reporting 0x3fff is better than
Matthew> reporting anything else in this field.  All highly theoretical,
Matthew> since you'd have to have 16MB physical sectors with 512 byte
Matthew> logical sectors to get to this situation.

Yeah, I agree it's mostly in the academic interest bucket.  Just trying
to be defensive about what we return, that's all.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 13:33           ` Jeff Garzik
@ 2009-04-23 14:10             ` James Bottomley
  2009-04-23 14:16               ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: James Bottomley @ 2009-04-23 14:10 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Martin K. Petersen, Matthew Wilcox, Jens Axboe, rwheeler,
	snitzer, neilb, dgilbert, linux-ide, linux-scsi

On Thu, 2009-04-23 at 09:33 -0400, Jeff Garzik wrote:
> Martin K. Petersen wrote:
> >>>>>> "Matthew" == Matthew Wilcox <matthew@wil.cx> writes:
> > 
> > Matthew> I already did that.  The only problem is that you made me
> > Matthew> include the stupid:
> > 
> > Matthew>         if (ata_id_major_version(args->id) > 7) {
> > 
> > Matthew> so of course it doesn't work on any existing hardware.  How
> > Matthew> about applying this patch:
> > 
> > Maybe we could incubate your patch in the next tree for a bit and see
> > what breaks without the version check?
> > 
> > We could even be somewhat conservative like we were with RC16 in SCSI.
> > The SATA devices I have here with valid rotational flags all report
> > version 7.  I wonder if > 6 do the trick?
> 
> linux/ata.h illustrates the standard ATA rules for validating bits of 
> IDENTIFY DEVICE.
> 
> Just checking the version was always just a simplistic hack...  we are 
> talking specifically about trusting values listed as undefined in the 
> relevant specs.  That requires more, not less, gymnastics :)

I'm with Jeff on this one.  We had an identically similar problem with
REPORT LUNS, which, today is the basis of SCSI discovery.

The default rule is that if you're less than SCSI-3 (where REPORT LUNS
was introduced) we *don't* send you REPORT LUNS and if you're 3 or above
we always discover via REPORT LUNS.  However, for the bunch of early
adopters we have the BLIST_REPORTLUN2 flag to say do it anyway
regardless of the SCSI level.  (And for the screw ups we have
BLIST_NOREPORTLUN for the ones who couldn't implement a standard to save
their corporate life).

James



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 14:10             ` James Bottomley
@ 2009-04-23 14:16               ` Matthew Wilcox
  2009-04-23 14:39                 ` Jeff Garzik
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 14:16 UTC (permalink / raw)
  To: James Bottomley
  Cc: Jeff Garzik, Martin K. Petersen, Jens Axboe, rwheeler, snitzer,
	neilb, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 02:10:13PM +0000, James Bottomley wrote:
> I'm with Jeff on this one.  We had an identically similar problem with
> REPORT LUNS, which, today is the basis of SCSI discovery.

Yes, REPORT LUNS is quite essential.

It's not exactly in the same category as reporting device form factor
and rotational speed.  If REPORT LUNS is wrong, we're in really deep
trouble.  If device form factr is wrong ... umm ... nothing much happens.
If rotational speed is wrong, we might have a suboptimal IO pattern.
It's also fixable by udev.

The current check for is_ssd (that doesn't check the version) was put
in for 2.6.27.  It doesn't appear to have caused any trouble.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 14:16               ` Matthew Wilcox
@ 2009-04-23 14:39                 ` Jeff Garzik
  2009-04-23 17:25                   ` Matthew Wilcox
  0 siblings, 1 reply; 42+ messages in thread
From: Jeff Garzik @ 2009-04-23 14:39 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: James Bottomley, Martin K. Petersen, Jens Axboe, rwheeler,
	snitzer, neilb, dgilbert, linux-ide, linux-scsi

Matthew Wilcox wrote:
> On Thu, Apr 23, 2009 at 02:10:13PM +0000, James Bottomley wrote:
>> I'm with Jeff on this one.  We had an identically similar problem with
>> REPORT LUNS, which, today is the basis of SCSI discovery.
> 
> Yes, REPORT LUNS is quite essential.
> 
> It's not exactly in the same category as reporting device form factor
> and rotational speed.  If REPORT LUNS is wrong, we're in really deep
> trouble.  If device form factr is wrong ... umm ... nothing much happens.
> If rotational speed is wrong, we might have a suboptimal IO pattern.

If the general attitude is "oh, that info might be wrong", why will app 
developers bother at all?

It is better to return zeroes than have a decent probability of 
returning garbage.  I value predictability much more than adopting a 
generalized rule to handle a few special case early adopters.


> It's also fixable by udev.

By that logic we should leave it to udev to handle the ata_version <= 7 
stuff, since that is the early adopter special case.

	Jeff




^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23  5:29 ` [PATCH 4 of 8] sd: Physical block size and alignment support Martin K. Petersen
@ 2009-04-23 16:37   ` Konrad Rzeszutek
  2009-04-23 18:25     ` Martin K. Petersen
  0 siblings, 1 reply; 42+ messages in thread
From: Konrad Rzeszutek @ 2009-04-23 16:37 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 01:29:31AM -0400, Martin K. Petersen wrote:
> Extract physical block size and lowest aligned LBA from READ
> CAPACITY(16) response and adjust queue parameters.
> 
> Report physical block size and alignment when applicable.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> 
.. snip ..
> -		if (sdkp->first_scan || old_capacity != sdkp->capacity)
> +		if (sdkp->first_scan || old_capacity != sdkp->capacity) {
>  			sd_printk(KERN_NOTICE, sdkp,
> -				  "%llu %d-byte hardware sectors: (%s/%s)\n",
> +				  "%llu %d-byte logical blocks: (%s/%s)\n",
>  				  (unsigned long long)sdkp->capacity,
>  				  sector_size, cap_str_10, cap_str_2);
> +
> +			if (sdkp->hw_sector_size != sector_size)
> +				sd_printk(KERN_NOTICE, sdkp,
> +					  "%u-byte physical blocks\n",
> +					  sdkp->hw_sector_size);

With the changes in the read_capacity_16 (where you set the
hw_sector_size) this won't be printed (at least on first probe).
Should this be printed irregardless of that?

Another question  - should the Documentation/*.txt have an entry about what a
'logical block', and 'physical block' is vs. 'hardware sector' ?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 14:39                 ` Jeff Garzik
@ 2009-04-23 17:25                   ` Matthew Wilcox
  2009-04-23 17:37                     ` James Bottomley
  0 siblings, 1 reply; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 17:25 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: James Bottomley, Martin K. Petersen, Jens Axboe, rwheeler,
	snitzer, neilb, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 10:39:41AM -0400, Jeff Garzik wrote:
> Matthew Wilcox wrote:
> >On Thu, Apr 23, 2009 at 02:10:13PM +0000, James Bottomley wrote:
> >>I'm with Jeff on this one.  We had an identically similar problem with
> >>REPORT LUNS, which, today is the basis of SCSI discovery.
> >
> >Yes, REPORT LUNS is quite essential.
> >
> >It's not exactly in the same category as reporting device form factor
> >and rotational speed.  If REPORT LUNS is wrong, we're in really deep
> >trouble.  If device form factr is wrong ... umm ... nothing much happens.
> >If rotational speed is wrong, we might have a suboptimal IO pattern.
> 
> If the general attitude is "oh, that info might be wrong", why will app 
> developers bother at all?

Any info might be wrong.  SCSI drives might return the wrong thing in
EVPD b1 too.  We don't check that in the kernel, we just return it to
user space.

> >It's also fixable by udev.
> 
> By that logic we should leave it to udev to handle the ata_version <= 7 
> stuff, since that is the early adopter special case.

I'm perfectly fine with never setting this bit in the kernel and having
udev handle all cases, including checking for garbage.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 5 of 8] sd: Detect non-rotational devices
  2009-04-23 17:25                   ` Matthew Wilcox
@ 2009-04-23 17:37                     ` James Bottomley
  0 siblings, 0 replies; 42+ messages in thread
From: James Bottomley @ 2009-04-23 17:37 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Jeff Garzik, Martin K. Petersen, Jens Axboe, rwheeler, snitzer,
	neilb, dgilbert, linux-ide, linux-scsi

On Thu, 2009-04-23 at 11:25 -0600, Matthew Wilcox wrote:
> On Thu, Apr 23, 2009 at 10:39:41AM -0400, Jeff Garzik wrote:
> > Matthew Wilcox wrote:
> > >On Thu, Apr 23, 2009 at 02:10:13PM +0000, James Bottomley wrote:
> > >>I'm with Jeff on this one.  We had an identically similar problem with
> > >>REPORT LUNS, which, today is the basis of SCSI discovery.
> > >
> > >Yes, REPORT LUNS is quite essential.
> > >
> > >It's not exactly in the same category as reporting device form factor
> > >and rotational speed.  If REPORT LUNS is wrong, we're in really deep
> > >trouble.  If device form factr is wrong ... umm ... nothing much happens.
> > >If rotational speed is wrong, we might have a suboptimal IO pattern.
> > 
> > If the general attitude is "oh, that info might be wrong", why will app 
> > developers bother at all?
> 
> Any info might be wrong.  SCSI drives might return the wrong thing in
> EVPD b1 too.  We don't check that in the kernel, we just return it to
> user space.

Right, that's the REPORT LUNs case:  If you report compliance to a
standard that doesn't have it, we don't use it *unless* you have an
inquiry data whitelist to say we should.

If you report compliance to a standard that does, then we lead with
REPORT LUNs *unless* you're in a blacklist of people who don't implement
correctly.

The point is that we use the gross standard compliance data to flip how
we treat the device.

So we're chasing the case where what we're doing is most likely to work
and then we fix up the corner problems in each of the cases.

James



^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 10:51   ` Jens Axboe
  2009-04-23 11:49     ` Matthew Wilcox
  2009-04-23 13:17     ` Martin K. Petersen
@ 2009-04-23 18:13     ` Konrad Rzeszutek
  2009-04-23 18:26       ` Ric Wheeler
  2009-04-23 18:34       ` Martin K. Petersen
  2 siblings, 2 replies; 42+ messages in thread
From: Konrad Rzeszutek @ 2009-04-23 18:13 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

> > + * blk_queue_alignment - set alignment for the queue
> > + * @q:  the request queue for the device
> > + * @alignment:  alignment offset in bytes
> > + *
> > + * Description:
> > + *   Some devices are naturally misaligned to compensate for things like
> > + *   the legacy DOS partition table 63-sector offset.  Low-level drivers
> > + *   should call this function for devices whose first sector is not
> > + *   naturally aligned.
> > + */
> > +void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
> > +{
> > +	q->alignment = alignment & (q->granularity - 1);
> > +	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
> > +}
> > +EXPORT_SYMBOL(blk_queue_alignment);
> 
> How would low-level drivers know?

Should there be a comment/Documentation blurb on how to remediate a
misaligment? Such as how to modify the partition tables and what to look
for in the BIOS (if anything) to take advantage of 4K sectors?

Is there a corresponding patch for fdisk that understands 4K sectors? Or
is that already covered by it doing ioctl calls on the block device?

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23 16:37   ` Konrad Rzeszutek
@ 2009-04-23 18:25     ` Martin K. Petersen
  2009-04-23 18:44       ` Konrad Rzeszutek
  0 siblings, 1 reply; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 18:25 UTC (permalink / raw)
  To: Konrad Rzeszutek
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, jens.axboe, linux-ide, linux-scsi

>>>>> "Konrad" == Konrad Rzeszutek <konrad@virtualiron.com> writes:

[Printing physical block size]

Konrad> With the changes in the read_capacity_16 (where you set the
Konrad> hw_sector_size) this won't be printed (at least on first probe).

Why not?

[root@10 ~]# modprobe scsi_debug dev_size_mb=1024 num_parts=4 physblk_exp=3
scsi 6:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
sd 6:0:0:0: Attached scsi generic sg3 type 0
sd 6:0:0:0: [sdc] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
sd 6:0:0:0: [sdc] 4096-byte physical blocks


Konrad> Another question - should the Documentation/*.txt have an entry
Konrad> about what a 'logical block', and 'physical block' is
Konrad> vs. 'hardware sector' ?

Well, another item on my todo list is to kill the notion of hardware
sector completely.  The protocols have been referring to logical blocks
for ages.

It hasn't been a big problem until now because logical block size has
been equal to the hardware sector size.  That's no longer a valid
assumption.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and partitions
  2009-04-23 18:13     ` Konrad Rzeszutek
@ 2009-04-23 18:26       ` Ric Wheeler
  2009-04-23 18:44         ` Matthew Wilcox
  2009-04-23 18:34       ` Martin K. Petersen
  1 sibling, 1 reply; 42+ messages in thread
From: Ric Wheeler @ 2009-04-23 18:26 UTC (permalink / raw)
  To: Konrad Rzeszutek
  Cc: Jens Axboe, Martin K. Petersen, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

Konrad Rzeszutek wrote:
>>> + * blk_queue_alignment - set alignment for the queue
>>> + * @q:  the request queue for the device
>>> + * @alignment:  alignment offset in bytes
>>> + *
>>> + * Description:
>>> + *   Some devices are naturally misaligned to compensate for things like
>>> + *   the legacy DOS partition table 63-sector offset.  Low-level drivers
>>> + *   should call this function for devices whose first sector is not
>>> + *   naturally aligned.
>>> + */
>>> +void blk_queue_alignment(struct request_queue *q, unsigned int alignment)
>>> +{
>>> +	q->alignment = alignment & (q->granularity - 1);
>>> +	clear_bit(QUEUE_FLAG_MISALIGNED, &q->queue_flags);
>>> +}
>>> +EXPORT_SYMBOL(blk_queue_alignment);
>> How would low-level drivers know?
> 
> Should there be a comment/Documentation blurb on how to remediate a
> misaligment? Such as how to modify the partition tables and what to look
> for in the BIOS (if anything) to take advantage of 4K sectors?
> 
> Is there a corresponding patch for fdisk that understands 4K sectors? Or
> is that already covered by it doing ioctl calls on the block device?

We need to update the various tools to deal with this data and prevent 
misalignment, some of which is going on already.

I am afraid that if you are misaligned, the only remedy would be to tar up your 
data, recreate an aligned device and then restore :-)

ric


^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and  partitions
  2009-04-23 18:13     ` Konrad Rzeszutek
  2009-04-23 18:26       ` Ric Wheeler
@ 2009-04-23 18:34       ` Martin K. Petersen
  1 sibling, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 18:34 UTC (permalink / raw)
  To: Konrad Rzeszutek
  Cc: Jens Axboe, Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, linux-ide, linux-scsi

>>>>> "Konrad" == Konrad Rzeszutek <konrad@virtualiron.com> writes:

Konrad> Should there be a comment/Documentation blurb on how to
Konrad> remediate a misaligment? Such as how to modify the partition
Konrad> tables and what to look for in the BIOS (if anything) to take
Konrad> advantage of 4K sectors?

My patches provide the infrastructure for partitioning tools etc. to
start doing the right thing.  The parted/util-linux people know that the
support is coming.

There are lots of considerations in this department due to compatibility
with Windows, BIOS int13 calls and EFI.  I'll leave it up to the tools
people to define how to deal with alignment and padding in each case.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 2 of 8] block: Export I/O topology for block devices and partitions
  2009-04-23 18:26       ` Ric Wheeler
@ 2009-04-23 18:44         ` Matthew Wilcox
  0 siblings, 0 replies; 42+ messages in thread
From: Matthew Wilcox @ 2009-04-23 18:44 UTC (permalink / raw)
  To: Ric Wheeler
  Cc: Konrad Rzeszutek, Jens Axboe, Martin K. Petersen, snitzer, jeff,
	neilb, James.Bottomley, dgilbert, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 02:26:21PM -0400, Ric Wheeler wrote:
> >Should there be a comment/Documentation blurb on how to remediate a
> >misaligment? Such as how to modify the partition tables and what to look
> >for in the BIOS (if anything) to take advantage of 4K sectors?
> >
> >Is there a corresponding patch for fdisk that understands 4K sectors? Or
> >is that already covered by it doing ioctl calls on the block device?
> 
> We need to update the various tools to deal with this data and prevent 
> misalignment, some of which is going on already.
> 
> I am afraid that if you are misaligned, the only remedy would be to tar up 
> your data, recreate an aligned device and then restore :-)

We could write a dd_move command (analagous to memmove) that reads a chunk
of your block device into memory, then writes it back shifted slightly.

Of course, if you lose power midway through, you're completely hosed.

A better solution might be for the filesystem to perform subsequent
writes to the device that are aligned.  I'm thinking about the case
where you've been using a filesystem on one device, you dd it to another
device (accidentally misaligning it) and then start modifying it on the
new device.  If all writes on the new drive go to an aligned address,
the performance implications shouldn't be too bad.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23 18:25     ` Martin K. Petersen
@ 2009-04-23 18:44       ` Konrad Rzeszutek
  2009-04-23 19:02         ` Martin K. Petersen
  0 siblings, 1 reply; 42+ messages in thread
From: Konrad Rzeszutek @ 2009-04-23 18:44 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: rwheeler, snitzer, jeff, neilb, James.Bottomley, dgilbert,
	jens.axboe, linux-ide, linux-scsi

On Thu, Apr 23, 2009 at 02:25:54PM -0400, Martin K. Petersen wrote:
> >>>>> "Konrad" == Konrad Rzeszutek <konrad@virtualiron.com> writes:
> 
> [Printing physical block size]
> 
> Konrad> With the changes in the read_capacity_16 (where you set the
> Konrad> hw_sector_size) this won't be printed (at least on first probe).
> 
> Why not?

The read_capacity_16 sets the sdkp->hw_sector_size to sector_size:
@@ -1402,6 +1413,7 @@ static int read_capacity_10(struct scsi_

        sector_size =   (buffer[4] << 24) | (buffer[5] << 16) |
                        (buffer[6] << 8) | buffer[7];
+       sdkp->hw_sector_size = sector_size;
        lba =   (buffer[0] << 24) | (buffer[1] << 16) |
                (buffer[2] << 8) | buffer[3];


That function also returns that value, which sd_read_capacity assigns to the
sector_size variable. So when the check for sdkp->hw_sector_size
inequality is reached, it is skiped as sdkp->hw_sector_size ==
sector_size. No?

> 
> [root@10 ~]# modprobe scsi_debug dev_size_mb=1024 num_parts=4 physblk_exp=3
> scsi 6:0:0:0: Direct-Access     Linux    scsi_debug       0004 PQ: 0 ANSI: 5
> sd 6:0:0:0: Attached scsi generic sg3 type 0
> sd 6:0:0:0: [sdc] 2097152 512-byte logical blocks: (1.07 GB/1.00 GiB)
> sd 6:0:0:0: [sdc] 4096-byte physical blocks

Is that doing a read_capacity_16 or _10?
> 
> 
> Konrad> Another question - should the Documentation/*.txt have an entry
> Konrad> about what a 'logical block', and 'physical block' is
> Konrad> vs. 'hardware sector' ?
> 
> Well, another item on my todo list is to kill the notion of hardware
> sector completely.  The protocols have been referring to logical blocks
> for ages.

Do they refer to physical blocks as well?
> 
> It hasn't been a big problem until now because logical block size has
> been equal to the hardware sector size.  That's no longer a valid
> assumption.

Right.
> 
> -- 
> Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

* Re: [PATCH 4 of 8] sd: Physical block size and alignment support
  2009-04-23 18:44       ` Konrad Rzeszutek
@ 2009-04-23 19:02         ` Martin K. Petersen
  0 siblings, 0 replies; 42+ messages in thread
From: Martin K. Petersen @ 2009-04-23 19:02 UTC (permalink / raw)
  To: Konrad Rzeszutek
  Cc: Martin K. Petersen, rwheeler, snitzer, jeff, neilb,
	James.Bottomley, dgilbert, jens.axboe, linux-ide, linux-scsi

>>>>> "Konrad" == Konrad Rzeszutek <konrad@virtualiron.com> writes:

Konrad> The read_capacity_16 sets the sdkp->hw_sector_size to
Konrad> sector_size:

> @@ -1402,6 +1413,7 @@ static int read_capacity_10(struct scsi_
                                   ^^^^^^^^^^^^^^^^
That's the fallback setting for devices that only respond to READ
CAPACITY(10).


Konrad> That function also returns that value, which sd_read_capacity
Konrad> assigns to the sector_size variable. So when the check for
Konrad> sdkp->hw_sector_size inequality is reached, it is skiped as
Konrad> sdkp->hw_sector_size == sector_size. No?

For devices that support READ CAPACITY(16) (and are thus capable of
reporting physical block size) the two values may differ and the message
is printed.


Konrad> Is that doing a read_capacity_16 or _10?

16.


>> The protocols have been referring to logical blocks for ages.

Konrad> Do they refer to physical blocks as well?

Yep.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 42+ messages in thread

end of thread, other threads:[~2009-04-23 19:02 UTC | newest]

Thread overview: 42+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-04-23  5:29 [PATCH 0 of 8] I/O topology patch kit Martin K. Petersen
2009-04-23  5:29 ` [PATCH 1 of 8] block: Expose stacked device queues in sysfs Martin K. Petersen
2009-04-23  5:29 ` Martin K. Petersen
2009-04-23  5:29 ` [PATCH 2 of 8] block: Export I/O topology for block devices and partitions Martin K. Petersen
2009-04-23 10:51   ` Jens Axboe
2009-04-23 11:49     ` Matthew Wilcox
2009-04-23 11:55       ` Jens Axboe
2009-04-23 13:22         ` Matthew Wilcox
2009-04-23 13:30           ` Matthew Wilcox
2009-04-23 13:17     ` Martin K. Petersen
2009-04-23 18:13     ` Konrad Rzeszutek
2009-04-23 18:26       ` Ric Wheeler
2009-04-23 18:44         ` Matthew Wilcox
2009-04-23 18:34       ` Martin K. Petersen
2009-04-23  5:29 ` [PATCH 3 of 8] MD: Use new topology calls to indicate alignment and I/O sizes Martin K. Petersen
2009-04-23  5:29 ` [PATCH 4 of 8] sd: Physical block size and alignment support Martin K. Petersen
2009-04-23 16:37   ` Konrad Rzeszutek
2009-04-23 18:25     ` Martin K. Petersen
2009-04-23 18:44       ` Konrad Rzeszutek
2009-04-23 19:02         ` Martin K. Petersen
2009-04-23  5:29 ` Martin K. Petersen
2009-04-23  5:29 ` [PATCH 5 of 8] sd: Detect non-rotational devices Martin K. Petersen
2009-04-23 10:52   ` Jens Axboe
2009-04-23 11:09     ` Jeff Garzik
2009-04-23 11:13       ` Jens Axboe
2009-04-23 11:22         ` Jeff Garzik
2009-04-23 11:38       ` Matthew Wilcox
2009-04-23 11:58         ` Jeff Garzik
2009-04-23 12:03           ` Jens Axboe
2009-04-23 13:16         ` Martin K. Petersen
2009-04-23 13:33           ` Jeff Garzik
2009-04-23 14:10             ` James Bottomley
2009-04-23 14:16               ` Matthew Wilcox
2009-04-23 14:39                 ` Jeff Garzik
2009-04-23 17:25                   ` Matthew Wilcox
2009-04-23 17:37                     ` James Bottomley
2009-04-23  5:29 ` [PATCH 6 of 8] sd: Block limits VPD support Martin K. Petersen
2009-04-23  5:29 ` [PATCH 7 of 8] scsi_debug: Add support for physical block exponent and alignment Martin K. Petersen
2009-04-23  5:29 ` [PATCH 8 of 8] libata: Report disk alignment and physical block size Martin K. Petersen
2009-04-23 13:46   ` Matthew Wilcox
2009-04-23 14:05     ` Martin K. Petersen
2009-04-23  5:29 ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.