All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
@ 2010-08-15  0:00 Grant Grundler
  2010-08-15  4:37   ` Wilcox, Matthew R
  0 siblings, 1 reply; 7+ messages in thread
From: Grant Grundler @ 2010-08-15  0:00 UTC (permalink / raw)
  To: Jeff Garzik, Tejun Heo; +Cc: Linux IDE, Matthew Wilcox, Gwendal Grignou, LKML

[-- Attachment #1: Type: text/plain, Size: 3391 bytes --]

Attached patch enables my x86 machine to recognize and talk to a
"Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had
posted a similar patch 2 years ago:
   http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of
this patch include Matthew's work. That's why Mathew is first on the
"Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering
Sample' drive provided by Hitachi GST.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Grant Grundler <grundler@google.com>
Reviewed-by: Gwendal Grignou <gwendal@google.com>

-----

Gwendal's review of this patch had him concerned about access to
"dev->sdev->sector_size" may have risks (e.g. sdev goes away before
dev does).  I didn't like adding additional fields to ata_dev that
duplicate what scsi is doing and this seems to be working so far. But
since I don't know how dev->sdev is protected/syncronized (despite how
libata and scsi are joined at the hip), I hope someone else can
comment on if my usage of dev->sdev->sector_size is a problem and why.

I believe use of ata_id_logical_per_physical_sectors() was wrong. I've
replaced it with ata_id_log2_per_physical_sector() and cleaned up how
block size word and alignment are referenced. It's alot easier to read
where those functions are used now.


Here is the kernel output from hotplugging the SATA drive on my
machine (using Marvell 7042 controller):

[ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000
action 0xe frozen
[ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect
[ 2468.301240] ata4: SError: { PHYRdyChg DevExch }
[ 2468.301257] ata4: hard resetting link
[ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300)
[ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360
[ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100
[ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32)
[ 2471.764465] ata4.00: configured for UDMA/100
[ 2471.764501] ata4: EH complete
[ 2471.764988] scsi 4:0:0:0: Direct-Access     ATA      Hitachi
HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
[ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off
[ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00
[ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0
[ 2471.799370]  sdd: unknown partition table
[ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk

I was able to dd to/from the device and pull SMART data (not shown and
probably not able to share.)

If someone has "Advanced Format Developers Kit" SATA drives from
idema.org, I'd appreciate any test reports.

thanks
grant

[ patch NOT inlined because gmail will word-wrap and make a mess of
it. Please use attached filed.]

[-- Attachment #2: 2.6.35-libata_4k_block_support-01 --]
[-- Type: application/octet-stream, Size: 8580 bytes --]

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a54273d..16d2bbf 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -52,7 +52,6 @@
 
 #include "libata.h"
 
-#define SECTOR_SIZE		512
 #define ATA_SCSI_RBUF_SIZE	4096
 
 static DEFINE_SPINLOCK(ata_scsi_rbuf_lock);
@@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
 	memset(scsi_cmd, 0, sizeof(scsi_cmd));
 
 	if (args[3]) {
-		argsize = SECTOR_SIZE * args[3];
+		argsize = ATA_SECT_SIZE * args[3];
 		argbuf = kmalloc(argsize, GFP_KERNEL);
 		if (argbuf == NULL) {
 			rc = -ENOMEM;
@@ -1150,8 +1149,9 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
 		blk_queue_dma_drain(q, atapi_drain_needed, buf, ATAPI_MAX_DRAIN);
 	} else {
 		/* ATA devices must be sector aligned */
+		sdev->sector_size = ata_id_logical_sector_size(dev->id);
 		blk_queue_update_dma_alignment(sdev->request_queue,
-					       ATA_SECT_SIZE - 1);
+					       sdev->sector_size - 1);
 		sdev->manage_start_stop = 1;
 	}
 
@@ -1166,6 +1166,7 @@ static int ata_scsi_dev_config(struct scsi_device *sdev,
 		scsi_adjust_queue_depth(sdev, MSG_SIMPLE_TAG, depth);
 	}
 
+	dev->sdev = sdev;
 	return 0;
 }
 
@@ -1696,7 +1697,7 @@ static unsigned int ata_scsi_rw_xlat(struct ata_queued_cmd *qc)
 		goto nothing_to_do;
 
 	qc->flags |= ATA_QCFLAG_IO;
-	qc->nbytes = n_block * ATA_SECT_SIZE;
+	qc->nbytes = n_block * qc->dev->sdev->sector_size;
 
 	rc = ata_build_rw_tf(&qc->tf, qc->dev, block, n_block, tf_flags,
 			     qc->tag);
@@ -2123,7 +2124,7 @@ static unsigned int ata_scsiop_inq_89(struct ata_scsi_args *args, u8 *rbuf)
 
 static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 {
-	u32 min_io_sectors;
+	u16 min_io_sectors;
 
 	rbuf[1] = 0xb0;
 	rbuf[3] = 0x3c;		/* required VPD size with unmap support */
@@ -2135,10 +2136,7 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	 * logical than physical sector size we need to figure out what the
 	 * latter is.
 	 */
-	if (ata_id_has_large_logical_sectors(args->id))
-		min_io_sectors = ata_id_logical_per_physical_sectors(args->id);
-	else
-		min_io_sectors = 1;
+	min_io_sectors = 1 << ata_id_log2_per_physical_sector(args->id);
 	put_unaligned_be16(min_io_sectors, &rbuf[6]);
 
 	/*
@@ -2397,21 +2395,13 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 {
 	struct ata_device *dev = args->dev;
 	u64 last_lba = dev->n_sectors - 1; /* LBA of the last block */
-	u8 log_per_phys = 0;
-	u16 lowest_aligned = 0;
-	u16 word_106 = dev->id[106];
-	u16 word_209 = dev->id[209];
-
-	if ((word_106 & 0xc000) == 0x4000) {
-		/* Number and offset of logical sectors per physical sector */
-		if (word_106 & (1 << 13))
-			log_per_phys = word_106 & 0xf;
-		if ((word_209 & 0xc000) == 0x4000) {
-			u16 first = dev->id[209] & 0x3fff;
-			if (first > 0)
-				lowest_aligned = (1 << log_per_phys) - first;
-		}
-	}
+	u32 sector_size; /* physical sector size in bytes */
+	u8 log2_per_phys;
+	u16 lowest_aligned;
+
+	sector_size = ata_id_logical_sector_size(dev->id);
+	log2_per_phys = ata_id_log2_per_physical_sector(dev->id);
+	lowest_aligned = ata_id_logical_sector_offset(dev->id, log2_per_phys);
 
 	VPRINTK("ENTER\n");
 
@@ -2426,8 +2416,10 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 		rbuf[3] = last_lba;
 
 		/* sector size */
-		rbuf[6] = ATA_SECT_SIZE >> 8;
-		rbuf[7] = ATA_SECT_SIZE & 0xff;
+		rbuf[4] = sector_size >> (8 * 3);
+		rbuf[5] = sector_size >> (8 * 2);
+		rbuf[6] = sector_size >> (8 * 1);
+		rbuf[7] = sector_size;
 	} else {
 		/* sector count, 64-bit */
 		rbuf[0] = last_lba >> (8 * 7);
@@ -2440,11 +2432,13 @@ static unsigned int ata_scsiop_read_cap(struct ata_scsi_args *args, u8 *rbuf)
 		rbuf[7] = last_lba;
 
 		/* sector size */
-		rbuf[10] = ATA_SECT_SIZE >> 8;
-		rbuf[11] = ATA_SECT_SIZE & 0xff;
+		rbuf[ 8] = sector_size >> (8 * 3);
+		rbuf[ 9] = sector_size >> (8 * 2);
+		rbuf[10] = sector_size >> (8 * 1);
+		rbuf[11] = sector_size;
 
 		rbuf[12] = 0;
-		rbuf[13] = log_per_phys;
+		rbuf[13] = log2_per_phys;
 		rbuf[14] = (lowest_aligned >> 8) & 0x3f;
 		rbuf[15] = lowest_aligned;
 
@@ -2888,9 +2882,8 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 	tf->device = dev->devno ?
 		tf->device | ATA_DEV1 : tf->device & ~ATA_DEV1;
 
-	/* READ/WRITE LONG use a non-standard sect_size */
-	qc->sect_size = ATA_SECT_SIZE;
 	switch (tf->command) {
+	/* READ/WRITE LONG use a non-standard sect_size */
 	case ATA_CMD_READ_LONG:
 	case ATA_CMD_READ_LONG_ONCE:
 	case ATA_CMD_WRITE_LONG:
@@ -2898,6 +2891,45 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 		if (tf->protocol != ATA_PROT_PIO || tf->nsect != 1)
 			goto invalid_fld;
 		qc->sect_size = scsi_bufflen(scmd);
+		break;
+
+	/* commands using reported Logical Block size (e.g. 512 or 4K) */
+	case ATA_CMD_CFA_WRITE_NE:
+	case ATA_CMD_CFA_TRANS_SECT:
+	case ATA_CMD_CFA_WRITE_MULT_NE:
+	/* XXX: case ATA_CMD_CFA_WRITE_SECTORS_WITHOUT_ERASE: */
+	case ATA_CMD_READ:
+	case ATA_CMD_READ_EXT:
+	case ATA_CMD_READ_QUEUED:
+	/* XXX: case ATA_CMD_READ_QUEUED_EXT: */
+	case ATA_CMD_FPDMA_READ:
+	case ATA_CMD_READ_MULTI:
+	case ATA_CMD_READ_MULTI_EXT:
+	case ATA_CMD_PIO_READ:
+	case ATA_CMD_PIO_READ_EXT:
+	case ATA_CMD_READ_STREAM_DMA_EXT:
+	case ATA_CMD_READ_STREAM_EXT:
+	case ATA_CMD_VERIFY:
+	case ATA_CMD_VERIFY_EXT:
+	case ATA_CMD_WRITE:
+	case ATA_CMD_WRITE_EXT:
+	case ATA_CMD_WRITE_FUA_EXT:
+	case ATA_CMD_WRITE_QUEUED:
+	case ATA_CMD_WRITE_QUEUED_FUA_EXT:
+	case ATA_CMD_FPDMA_WRITE:
+	case ATA_CMD_WRITE_MULTI:
+	case ATA_CMD_WRITE_MULTI_EXT:
+	case ATA_CMD_WRITE_MULTI_FUA_EXT:
+	case ATA_CMD_PIO_WRITE:
+	case ATA_CMD_PIO_WRITE_EXT:
+	case ATA_CMD_WRITE_STREAM_DMA_EXT:
+	case ATA_CMD_WRITE_STREAM_EXT:
+		qc->sect_size = dev->sdev->sector_size;
+		break;
+
+	/* Everything else uses 512 byte "sectors" */
+	default:
+		qc->sect_size = ATA_SECT_SIZE;
 	}
 
 	/*
@@ -3393,6 +3425,8 @@ void ata_scsi_scan_host(struct ata_port *ap, int sync)
 			if (!IS_ERR(sdev)) {
 				dev->sdev = sdev;
 				scsi_device_put(sdev);
+			} else {
+				dev->sdev = NULL;
 			}
 		}
 	}
diff --git a/include/linux/ata.h b/include/linux/ata.h
index fe6e681..24a7727 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -89,6 +89,7 @@ enum {
 	ATA_ID_SPG		= 98,
 	ATA_ID_LBA_CAPACITY_2	= 100,
 	ATA_ID_SECTOR_SIZE	= 106,
+	ATA_ID_LOGICAL_SECTOR_SIZE	= 117,	/* and 118 */
 	ATA_ID_LAST_LUN		= 126,
 	ATA_ID_DLF		= 128,
 	ATA_ID_CSFO		= 129,
@@ -640,16 +641,51 @@ static inline int ata_id_flush_ext_enabled(const u16 *id)
 	return (id[ATA_ID_CFS_ENABLE_2] & 0x2400) == 0x2400;
 }
 
-static inline int ata_id_has_large_logical_sectors(const u16 *id)
+/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
+ */
+static inline u32 ata_id_logical_sector_size(const u16 *id)
 {
-	if ((id[ATA_ID_SECTOR_SIZE] & 0xc000) != 0x4000)
-		return 0;
-	return id[ATA_ID_SECTOR_SIZE] & (1 << 13);
+	/* T13/1699-D Revision 6a, Sep 6, 2008. Page 128.
+	 * IDENTIFY DEVICE data, word 117-118.
+	 * 0xd000 ignores bit 13 (logical:physical > 1)
+	 */
+	if ((id[ATA_ID_SECTOR_SIZE] & 0xd000) == 0x5000)
+		return (((id[ATA_ID_LOGICAL_SECTOR_SIZE+1] << 16)
+			 + id[ATA_ID_LOGICAL_SECTOR_SIZE]) * sizeof(u16)) ;
+	return ATA_SECT_SIZE;
 }
 
-static inline u16 ata_id_logical_per_physical_sectors(const u16 *id)
+static inline u8 ata_id_log2_per_physical_sector(const u16 *id)
 {
-	return 1 << (id[ATA_ID_SECTOR_SIZE] & 0xf);
+	/* T13/1699-D Revision 6a, Sep 6, 2008. Page 128.
+	 * IDENTIFY DEVICE data, word 106.
+	 * 0xe000 ignores bit 12 (logical sector > 512 bytes)
+	 */
+	if ((id[ATA_ID_SECTOR_SIZE] & 0xe000) == 0x6000)
+		return (id[ATA_ID_SECTOR_SIZE] & 0xf);
+	return 0;
+}
+
+/* Offset of logical sectors relative to physical sectors.
+ *
+ * If device has more than one logical sector per physical sector
+ * (aka 512 byte emulation), vendors might offset the "sector 0" address
+ * so sector 63 is "naturally aligned" - e.g. FAT partition table.
+ * This avoids Read/Mod/Write penalties when using FAT partition table
+ * and updating "well aligned" (FS perspective) physical sectors on every
+ * transaction.
+ */
+static inline u16 ata_id_logical_sector_offset(const u16 *id,
+	 u8 log2_per_phys)
+{
+	u16 word_209 = id[209];
+
+	if ((log2_per_phys > 1) && (word_209 & 0xc000) == 0x4000) {
+		u16 first = word_209 & 0x3fff;
+		if (first > 0)
+			return (1 << log2_per_phys) - first;
+	}
+	return 0;
 }
 
 static inline int ata_id_has_lba48(const u16 *id)

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* RE: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
  2010-08-15  0:00 [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native) Grant Grundler
@ 2010-08-15  4:37   ` Wilcox, Matthew R
  0 siblings, 0 replies; 7+ messages in thread
From: Wilcox, Matthew R @ 2010-08-15  4:37 UTC (permalink / raw)
  To: Grant Grundler, Jeff Garzik, Tejun Heo; +Cc: Linux IDE, Gwendal Grignou, LKML

If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format.  You've been warned.

---

+/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
+ */

No, it's already been converted.  See ata_dev_read_id().

---

I'm not sure about your use of a switch to set the sector size.  Have you checked the code that GCC generates for this?

---

All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.

---

@@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
 	memset(scsi_cmd, 0, sizeof(scsi_cmd));
 
 	if (args[3]) {
-		argsize = SECTOR_SIZE * args[3];
+		argsize = ATA_SECT_SIZE * args[3];
 		argbuf = kmalloc(argsize, GFP_KERNEL);
 		if (argbuf == NULL) {
 			rc = -ENOMEM;

I think this is wrong.  The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
This one's tricky and needs serious thought.  I might error it if sector_size isn't 512 bytes :-)
It's a legacy ioctl anyway, right?

---

I also remember Alan pointing out that some ATA controllers don't support non-512-byte commands, and we should have a 'safe for large sectors' flag set by the controller.

-----Original Message-----
From: Grant Grundler [mailto:grundler@google.com] 
Sent: Saturday, August 14, 2010 5:01 PM
To: Jeff Garzik; Tejun Heo
Cc: Linux IDE; Wilcox, Matthew R; Gwendal Grignou; LKML
Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

Attached patch enables my x86 machine to recognize and talk to a "Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had posted a similar patch 2 years ago:
   http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of this patch include Matthew's work. That's why Mathew is first on the "Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering Sample' drive provided by Hitachi GST.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Grant Grundler <grundler@google.com>
Reviewed-by: Gwendal Grignou <gwendal@google.com>

-----

Gwendal's review of this patch had him concerned about access to "dev->sdev->sector_size" may have risks (e.g. sdev goes away before dev does).  I didn't like adding additional fields to ata_dev that duplicate what scsi is doing and this seems to be working so far. But since I don't know how dev->sdev is protected/syncronized (despite how libata and scsi are joined at the hip), I hope someone else can comment on if my usage of dev->sdev->sector_size is a problem and why.

I believe use of ata_id_logical_per_physical_sectors() was wrong. I've replaced it with ata_id_log2_per_physical_sector() and cleaned up how block size word and alignment are referenced. It's alot easier to read where those functions are used now.


Here is the kernel output from hotplugging the SATA drive on my machine (using Marvell 7042 controller):

[ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect [ 2468.301240] ata4: SError: { PHYRdyChg DevExch } [ 2468.301257] ata4: hard resetting link [ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360 [ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100 [ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2471.764465] ata4.00: configured for UDMA/100 [ 2471.764501] ata4: EH complete
[ 2471.764988] scsi 4:0:0:0: Direct-Access     ATA      Hitachi
HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
[ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off [ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2471.799370]  sdd: unknown partition table [ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk

I was able to dd to/from the device and pull SMART data (not shown and probably not able to share.)

If someone has "Advanced Format Developers Kit" SATA drives from idema.org, I'd appreciate any test reports.

thanks
grant

[ patch NOT inlined because gmail will word-wrap and make a mess of it. Please use attached filed.]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* RE: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
@ 2010-08-15  4:37   ` Wilcox, Matthew R
  0 siblings, 0 replies; 7+ messages in thread
From: Wilcox, Matthew R @ 2010-08-15  4:37 UTC (permalink / raw)
  To: Grant Grundler, Jeff Garzik, Tejun Heo; +Cc: Linux IDE, Gwendal Grignou, LKML

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5336 bytes --]

If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format.  You've been warned.

---

+/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
+ */

No, it's already been converted.  See ata_dev_read_id().

---

I'm not sure about your use of a switch to set the sector size.  Have you checked the code that GCC generates for this?

---

All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.

---

@@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
 	memset(scsi_cmd, 0, sizeof(scsi_cmd));
 
 	if (args[3]) {
-		argsize = SECTOR_SIZE * args[3];
+		argsize = ATA_SECT_SIZE * args[3];
 		argbuf = kmalloc(argsize, GFP_KERNEL);
 		if (argbuf == NULL) {
 			rc = -ENOMEM;

I think this is wrong.  The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
This one's tricky and needs serious thought.  I might error it if sector_size isn't 512 bytes :-)
It's a legacy ioctl anyway, right?

---

I also remember Alan pointing out that some ATA controllers don't support non-512-byte commands, and we should have a 'safe for large sectors' flag set by the controller.

-----Original Message-----
From: Grant Grundler [mailto:grundler@google.com] 
Sent: Saturday, August 14, 2010 5:01 PM
To: Jeff Garzik; Tejun Heo
Cc: Linux IDE; Wilcox, Matthew R; Gwendal Grignou; LKML
Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)

Attached patch enables my x86 machine to recognize and talk to a "Native 4K" SATA device.

When I started working on this, I didn't know Matthew Wilcox had posted a similar patch 2 years ago:
   http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors

Gwendal Grignou pointed me at the the above code and small portions of this patch include Matthew's work. That's why Mathew is first on the "Signed-off-by:". I've NOT included his use of a bitmap to determine
512 vs Native for ATA command block size - just used a simple table.
And bugs are almost certainly mine.

Lastly, the patch has been tested with a native 4K 'Engineering Sample' drive provided by Hitachi GST.

Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
Signed-off-by: Grant Grundler <grundler@google.com>
Reviewed-by: Gwendal Grignou <gwendal@google.com>

-----

Gwendal's review of this patch had him concerned about access to "dev->sdev->sector_size" may have risks (e.g. sdev goes away before dev does).  I didn't like adding additional fields to ata_dev that duplicate what scsi is doing and this seems to be working so far. But since I don't know how dev->sdev is protected/syncronized (despite how libata and scsi are joined at the hip), I hope someone else can comment on if my usage of dev->sdev->sector_size is a problem and why.

I believe use of ata_id_logical_per_physical_sectors() was wrong. I've replaced it with ata_id_log2_per_physical_sector() and cleaned up how block size word and alignment are referenced. It's alot easier to read where those functions are used now.


Here is the kernel output from hotplugging the SATA drive on my machine (using Marvell 7042 controller):

[ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect [ 2468.301240] ata4: SError: { PHYRdyChg DevExch } [ 2468.301257] ata4: hard resetting link [ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360 [ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100 [ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2471.764465] ata4.00: configured for UDMA/100 [ 2471.764501] ata4: EH complete
[ 2471.764988] scsi 4:0:0:0: Direct-Access     ATA      Hitachi
HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
[ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off [ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
enabled, doesn't support DPO or FUA
[ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2471.799370]  sdd: unknown partition table [ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
(40.0 GB/37.2 GiB)
[ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk

I was able to dd to/from the device and pull SMART data (not shown and probably not able to share.)

If someone has "Advanced Format Developers Kit" SATA drives from idema.org, I'd appreciate any test reports.

thanks
grant

[ patch NOT inlined because gmail will word-wrap and make a mess of it. Please use attached filed.]
ÿôèº{.nÇ+‰·Ÿ®‰­†+%ŠËÿ±éݶ\x17¥Šwÿº{.nÇ+‰·¥Š{±þG«éÿŠ{ayº\x1dʇڙë,j\a­¢f£¢·hšïêÿ‘êçz_è®\x03(­éšŽŠÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?™¨è­Ú&£ø§~á¶iO•æ¬z·švØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?–I¥

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
  2010-08-15  4:37   ` Wilcox, Matthew R
  (?)
@ 2010-08-16 19:17   ` Grant Grundler
  2010-08-16 19:41     ` Jeff Garzik
  -1 siblings, 1 reply; 7+ messages in thread
From: Grant Grundler @ 2010-08-16 19:17 UTC (permalink / raw)
  To: Wilcox, Matthew R
  Cc: Jeff Garzik, Tejun Heo, Linux IDE, Gwendal Grignou, LKML

On Sat, Aug 14, 2010 at 9:37 PM, Wilcox, Matthew R
<matthew.r.wilcox@intel.com> wrote:
> If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format.  You've been warned.

You trumped my Gmail warning. I fold. :)

>
> ---
>
> +/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
> + */
>
> No, it's already been converted.  See ata_dev_read_id().

Ah - good. I'll remove the comment.

>
> I'm not sure about your use of a switch to set the sector size.  Have you checked the code that GCC generates for this?

The switch probably sucks unless we could weight the order of the
tests. E.g. common cases first. But it's just an implementation detail
that is relatively easy to replace with the bitmap you had implemented
before.

>
> All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
> It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.

Thank you - that looks much better to me too.

> --
>
> @@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
>        memset(scsi_cmd, 0, sizeof(scsi_cmd));
>
>        if (args[3]) {
> -               argsize = SECTOR_SIZE * args[3];
> +               argsize = ATA_SECT_SIZE * args[3];
>                argbuf = kmalloc(argsize, GFP_KERNEL);
>                if (argbuf == NULL) {
>                        rc = -ENOMEM;
>
> I think this is wrong.  The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
> That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
> This one's tricky and needs serious thought.  I might error it if sector_size isn't 512 bytes :-)
> It's a legacy ioctl anyway, right?

I have no idea. If it's tricky, I probably have it wrong.
Anyone else have guidance here?


> ---
>
> I also remember Alan pointing out that some ATA controllers don't support non-512-byte commands, and we should have a 'safe for large sectors' flag set by the controller.

This could also be added as a separate patch. I didn't run into this
problem so I can't say which controllers would have problems.

thanks for the helpful feedback!
I'll post a V2 later today.

thanks,
grant

>
> -----Original Message-----
> From: Grant Grundler [mailto:grundler@google.com]
> Sent: Saturday, August 14, 2010 5:01 PM
> To: Jeff Garzik; Tejun Heo
> Cc: Linux IDE; Wilcox, Matthew R; Gwendal Grignou; LKML
> Subject: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
>
> Attached patch enables my x86 machine to recognize and talk to a "Native 4K" SATA device.
>
> When I started working on this, I didn't know Matthew Wilcox had posted a similar patch 2 years ago:
>   http://git.kernel.org/?p=linux/kernel/git/willy/ata.git;a=shortlog;h=refs/heads/ata-large-sectors
>
> Gwendal Grignou pointed me at the the above code and small portions of this patch include Matthew's work. That's why Mathew is first on the "Signed-off-by:". I've NOT included his use of a bitmap to determine
> 512 vs Native for ATA command block size - just used a simple table.
> And bugs are almost certainly mine.
>
> Lastly, the patch has been tested with a native 4K 'Engineering Sample' drive provided by Hitachi GST.
>
> Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
> Signed-off-by: Grant Grundler <grundler@google.com>
> Reviewed-by: Gwendal Grignou <gwendal@google.com>
>
> -----
>
> Gwendal's review of this patch had him concerned about access to "dev->sdev->sector_size" may have risks (e.g. sdev goes away before dev does).  I didn't like adding additional fields to ata_dev that duplicate what scsi is doing and this seems to be working so far. But since I don't know how dev->sdev is protected/syncronized (despite how libata and scsi are joined at the hip), I hope someone else can comment on if my usage of dev->sdev->sector_size is a problem and why.
>
> I believe use of ata_id_logical_per_physical_sectors() was wrong. I've replaced it with ata_id_log2_per_physical_sector() and cleaned up how block size word and alignment are referenced. It's alot easier to read where those functions are used now.
>
>
> Here is the kernel output from hotplugging the SATA drive on my machine (using Marvell 7042 controller):
>
> [ 2468.301218] ata4: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen [ 2468.301225] ata4: edma_err_cause=00000010 pp_flags=00000000, dev connect [ 2468.301240] ata4: SError: { PHYRdyChg DevExch } [ 2468.301257] ata4: hard resetting link [ 2471.701117] ata4: SATA link up 1.5 Gbps (SStatus 113 SControl 300) [ 2471.740390] ata4.00: HPA detected: current 9770670, native 78165360 [ 2471.740399] ata4.00: ATA-7: Hitachi HTSxxxxxxxxxxxx, yyyyyyyy, max UDMA/100 [ 2471.740402] ata4.00: 9770670 sectors, multi 0: LBA48 NCQ (depth 31/32) [ 2471.764465] ata4.00: configured for UDMA/100 [ 2471.764501] ata4: EH complete
> [ 2471.764988] scsi 4:0:0:0: Direct-Access     ATA      Hitachi
> HTSxxxxxxxxxxxx PQ: 0 ANSI: 5
> [ 2471.782914] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.787093] sd 4:0:0:0: [sdd] Write Protect is off [ 2471.787120] sd 4:0:0:0: [sdd] Mode Sense: 00 3a 00 00 [ 2471.788768] sd 4:0:0:0: [sdd] Write cache: enabled, read cache:
> enabled, doesn't support DPO or FUA
> [ 2471.792293] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.792834] sd 4:0:0:0: Attached scsi generic sg3 type 0 [ 2471.799370]  sdd: unknown partition table [ 2471.845582] sd 4:0:0:0: [sdd] 9770670 4096-byte logical blocks:
> (40.0 GB/37.2 GiB)
> [ 2471.853763] sd 4:0:0:0: [sdd] Attached SCSI disk
>
> I was able to dd to/from the device and pull SMART data (not shown and probably not able to share.)
>
> If someone has "Advanced Format Developers Kit" SATA drives from idema.org, I'd appreciate any test reports.
>
> thanks
> grant
>
> [ patch NOT inlined because gmail will word-wrap and make a mess of it. Please use attached filed.]
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
  2010-08-16 19:17   ` Grant Grundler
@ 2010-08-16 19:41     ` Jeff Garzik
  2010-08-16 20:24       ` Grant Grundler
  0 siblings, 1 reply; 7+ messages in thread
From: Jeff Garzik @ 2010-08-16 19:41 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Wilcox, Matthew R, Tejun Heo, Linux IDE, Gwendal Grignou, LKML

On 08/16/2010 03:17 PM, Grant Grundler wrote:
> On Sat, Aug 14, 2010 at 9:37 PM, Wilcox, Matthew R
> <matthew.r.wilcox@intel.com>  wrote:
>> If you will insist on sending to my Outlook address, you'll get replies in standard broken Outlook format.  You've been warned.
>
> You trumped my Gmail warning. I fold. :)
>
>>
>> ---
>>
>> +/* BUG: Big endian systems need accesses to "id" wrapped with le16_to_cpu().
>> + */
>>
>> No, it's already been converted.  See ata_dev_read_id().
>
> Ah - good. I'll remove the comment.
>
>>
>> I'm not sure about your use of a switch to set the sector size.  Have you checked the code that GCC generates for this?
>
> The switch probably sucks unless we could weight the order of the
> tests. E.g. common cases first. But it's just an implementation detail
> that is relatively easy to replace with the bitmap you had implemented
> before.
>
>>
>> All the places you dereference dev->sdev are within a callchain from ->queuecommand; sdev can't possibly go away.
>> It'd be clearer that this is the case if you used scmd->device->sector_size instead of dev->sdev->sector_size.
>
> Thank you - that looks much better to me too.
>
>> --
>>
>> @@ -516,7 +515,7 @@ int ata_cmd_ioctl(struct scsi_device *scsidev, void __user *arg)
>>         memset(scsi_cmd, 0, sizeof(scsi_cmd));
>>
>>         if (args[3]) {
>> -               argsize = SECTOR_SIZE * args[3];
>> +               argsize = ATA_SECT_SIZE * args[3];
>>                 argbuf = kmalloc(argsize, GFP_KERNEL);
>>                 if (argbuf == NULL) {
>>                         rc = -ENOMEM;
>>
>> I think this is wrong.  The ioctl does PIO Data-in; as such, it should use the native sector size, not 512.
>> That said, there's a possibility of data corruption for programs which use this ioctl, assuming a 512-byte sector size when it's natively 4k.
>> This one's tricky and needs serious thought.  I might error it if sector_size isn't 512 bytes :-)
>> It's a legacy ioctl anyway, right?
>
> I have no idea. If it's tricky, I probably have it wrong.
> Anyone else have guidance here?

The main question is whether the size of a DRQ block changes, when LBA 
logical size changes?  I need to review the ATA8 specs in this area, but 
I would think some interfaces that return 512-byte pages for things like 
SMART info would be unchanged.  How do the drives behave for 
PIO-Data-{In,Out} commands that are not reading/writing user data, but 
rather drive metadata?

	Jeff

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
  2010-08-16 19:41     ` Jeff Garzik
@ 2010-08-16 20:24       ` Grant Grundler
  2010-08-17 15:22         ` Tejun Heo
  0 siblings, 1 reply; 7+ messages in thread
From: Grant Grundler @ 2010-08-16 20:24 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Wilcox, Matthew R, Tejun Heo, Linux IDE, Gwendal Grignou, LKML

On Mon, Aug 16, 2010 at 12:41 PM, Jeff Garzik <jgarzik@pobox.com> wrote:
...
> The main question is whether the size of a DRQ block changes, when LBA
> logical size changes?

Don't we already know that depends on the ATA command?

> I need to review the ATA8 specs in this area, but I
> would think some interfaces that return 512-byte pages for things like SMART
> info would be unchanged.  How do the drives behave for PIO-Data-{In,Out}
> commands that are not reading/writing user data, but rather drive metadata?

SMART Info/Logs are meta data and as you noted unchanged.

Two other meta-data PIO commands I can think of:
o  FW update: I'm not going to risk bricking this drive to find that out.
o Vendor specific selftest/diagnostic commands (I don't know if any
exist - just a guess)

If someone from Hitachi cares to respond publicly regarding the
behavior of this "Engineering Sample", that would be appreciated and
helpful.  But feedback from any other vendor would be welcome/useful
also.

I suspect legacy tools might be needed to update firmware and those
might continue to function if DRQ block size is still 512 byte.  I'd
like to leave this hardcoded to ATA_SECT_SIZE until someone can
demonstrate a need to change it. Ok?

thanks,
grant

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native)
  2010-08-16 20:24       ` Grant Grundler
@ 2010-08-17 15:22         ` Tejun Heo
  0 siblings, 0 replies; 7+ messages in thread
From: Tejun Heo @ 2010-08-17 15:22 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Jeff Garzik, Wilcox, Matthew R, Linux IDE, Gwendal Grignou, LKML

Hello,

On 08/16/2010 10:24 PM, Grant Grundler wrote:
> I suspect legacy tools might be needed to update firmware and those
> might continue to function if DRQ block size is still 512 byte.  I'd
> like to leave this hardcoded to ATA_SECT_SIZE until someone can
> demonstrate a need to change it. Ok?

I agree it'll be best to leave it 512bytes.  We'll just cause more
confusion by overriding the sector size there.  Another possible
approach would be simply failing the ioctl if logical sector size
isn't 512 byte.  The ioctl should probably be deprecated one day
anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2010-08-17 15:22 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-15  0:00 [PATCH] 2.6.35 libata support for > 512 byte sectors (e.g. 4K Native) Grant Grundler
2010-08-15  4:37 ` Wilcox, Matthew R
2010-08-15  4:37   ` Wilcox, Matthew R
2010-08-16 19:17   ` Grant Grundler
2010-08-16 19:41     ` Jeff Garzik
2010-08-16 20:24       ` Grant Grundler
2010-08-17 15:22         ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.