All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/4] SCT Write Same
@ 2016-08-22  4:23 Shaun Tancheff
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
                   ` (5 more replies)
  0 siblings, 6 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:23 UTC (permalink / raw)
  To: linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

At some point the method of issuing Write Same for ATA drives changed.
Currently write same is commonly available via SCT so expose the SCT
capabilities and use SCT Write Same when it is available.

This is useful for zoned based media that prefers to support discard
with lbprz set, aka discard zeroes data by mapping discard operations to
reset write pointer operations. Conventional zones that do not support
reset write pointer can still honor the discard zeroes data by issuing
a write same over the zone.

It may also be nice to know if various controllers that currently
disable WRITE SAME will work with the SCT Write Same code path:
  aacraid, arcmsr, megaraid, 3w-9xxx, 3w-sas, 3w-xxxx, gdth, hpsa, ips,
  megaraid, pmcraid, storvsc_drv

This patch against v4.8-rc2 is also at
    https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v9

    git@github.com:stancheff/linux.git v4.8-rc2+biof.v9

v6:
 - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
 - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
 - Added support for a sector_size descriptor other than 512 bytes.
v5:
 - Addressed review comments
 - Report support for ZBC only for zoned devices.
 - kmap page during rewrite
 - Fix unmap set to require trim or error, if not unmap then sct write
   same or error.
v4:
 - Added partial MAINTENANCE_IN opcode simulation
 - Dropped all changes in drivers/scsi/*
 - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
v3:
 - Demux UNMAP/TRIM from WRITE SAME
v2:
 - Remove fugly ata hacking from sd.c

Shaun Tancheff (4):
  libata: Safely overwrite attached page in WRITE SAME xlat
  Add support for SCT Write Same
  SCT Write Same / DSM Trim
  SCT Write Same handle ATA_DFLAG_PIO

 drivers/ata/libata-scsi.c | 280 +++++++++++++++++++++++++++++++++++++++++-----
 include/linux/ata.h       |  69 +++++++-----
 2 files changed, 292 insertions(+), 57 deletions(-)

-- 
2.9.3


^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
@ 2016-08-22  4:23 ` Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
                     ` (2 more replies)
  2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:23 UTC (permalink / raw)
  To: linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

Safely overwriting the attached page to ATA format from the SCSI formatted
variant.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v6:
 - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
 - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
v5:
 - Added prep patch to work with non-page aligned scatterlist pages
   and use kmap_atomic() to lock page during modification.

 drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
 include/linux/ata.h       | 26 ----------------------
 2 files changed, 51 insertions(+), 31 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e207b33..7990cb2 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 	return 1;
 }
 
+/**
+ * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
+ * @cmd: SCSI command being translated
+ * @num: Maximum number of entries (nominally 64).
+ * @sector: Starting sector
+ * @count: Total Range of request
+ *
+ * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
+ * descriptor.
+ *
+ * Upto 64 entries of the format:
+ *   63:48 Range Length
+ *   47:0  LBA
+ *
+ *  Range Length of 0 is ignored.
+ *  LBA's should be sorted order and not overlap.
+ *
+ * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
+ */
+static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
+					      u64 sector, u32 count)
+{
+	__le64 *buffer;
+	u32 i = 0, used_bytes;
+	unsigned long flags;
+
+	BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
+
+	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
+	buffer = ((void *)ata_scsi_rbuf);
+	while (i < num) {
+		u64 entry = sector |
+			((u64)(count > 0xffff ? 0xffff : count) << 48);
+		buffer[i++] = __cpu_to_le64(entry);
+		if (count <= 0xffff)
+			break;
+		count -= 0xffff;
+		sector += 0xffff;
+	}
+
+	used_bytes = ALIGN(i * 8, 512);
+	memset(buffer + i, 0, used_bytes - i * 8);
+	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
+	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
+
+	return used_bytes;
+}
+
 static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 {
 	struct ata_taskfile *tf = &qc->tf;
@@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	const u8 *cdb = scmd->cmnd;
 	u64 block;
 	u32 n_block;
+	const u32 trmax = ATA_MAX_TRIM_RNUM;
 	u32 size;
-	void *buf;
 	u16 fp;
 	u8 bp = 0xff;
 
@@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	if (!scsi_sg_count(scmd))
 		goto invalid_param_len;
 
-	buf = page_address(sg_page(scsi_sglist(scmd)));
-
-	if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
-		size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
+	if (n_block <= 0xffff * trmax) {
+		size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
 	} else {
 		fp = 2;
 		goto invalid_fld;
diff --git a/include/linux/ata.h b/include/linux/ata.h
index adbc812..45a1d71 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
 #endif
 }
 
-/*
- * Write LBA Range Entries to the buffer that will cover the extent from
- * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
- * TO NV CACHE PINNED SET.
- */
-static inline unsigned ata_set_lba_range_entries(void *_buffer,
-		unsigned num, u64 sector, unsigned long count)
-{
-	__le64 *buffer = _buffer;
-	unsigned i = 0, used_bytes;
-
-	while (i < num) {
-		u64 entry = sector |
-			((u64)(count > 0xffff ? 0xffff : count) << 48);
-		buffer[i++] = __cpu_to_le64(entry);
-		if (count <= 0xffff)
-			break;
-		count -= 0xffff;
-		sector += 0xffff;
-	}
-
-	used_bytes = ALIGN(i * 8, 512);
-	memset(buffer + i, 0, used_bytes - i * 8);
-	return used_bytes;
-}
-
 static inline bool ata_ok(u8 status)
 {
 	return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
@ 2016-08-22  4:23 ` Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
                     ` (2 more replies)
  2016-08-22  4:23 ` [PATCH v6 3/4] SCT Write Same / DSM Trim Shaun Tancheff
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:23 UTC (permalink / raw)
  To: linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

SATA drives may support write same via SCT. This is useful
for setting the drive contents to a specific pattern (0's).

Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
command or an SCT Write Same command.

Based on the UNMAP flag:
  - When set translate to DSM TRIM
  - When not set translate to SCT Write Same

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v6:
 - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
v5:
 - Addressed review comments
 - Report support for ZBC only for zoned devices.
 - kmap page during rewrite
 - Fix unmap set to require trim or error, if not unmap then sct write
   same or error.
v4:
 - Added partial MAINTENANCE_IN opcode simulation
 - Dropped all changes in drivers/scsi/*
 - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
v3:
 - Demux UNMAP/TRIM from WRITE SAME
v2:
 - Remove fugly ata hacking from sd.c

 drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
 include/linux/ata.h       |  43 ++++++++++
 2 files changed, 213 insertions(+), 29 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 7990cb2..ebf1a04 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
 {
 	sdev->use_10_for_rw = 1;
 	sdev->use_10_for_ms = 1;
-	sdev->no_report_opcodes = 1;
-	sdev->no_write_same = 1;
 
 	/* Schedule policy is determined by ->qc_defer() callback and
 	 * it needs to see every deferred qc.  Set dev_blocked to 1 to
@@ -3287,7 +3285,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
  * @cmd: SCSI command being translated
  * @num: Maximum number of entries (nominally 64).
  * @sector: Starting sector
- * @count: Total Range of request
+ * @count: Total Range of request in logical sectors
  *
  * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
  * descriptor.
@@ -3330,6 +3328,45 @@ static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
 	return used_bytes;
 }
 
+/**
+ * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
+ * @cmd: SCSI command being translated
+ * @lba: Starting sector
+ * @num: Number of logical sectors to be zero'd.
+ *
+ * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
+ * descriptor.
+ * NOTE: Writes a pattern (0's) in the foreground.
+ *       Large write-same requents can timeout.
+ */
+static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
+{
+	u16 *sctpg;
+	unsigned long flags;
+
+	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
+	sctpg = ((void *)ata_scsi_rbuf);
+
+	put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
+	put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
+	put_unaligned_le64(lba,     &sctpg[2]);
+	put_unaligned_le64(num,     &sctpg[6]);
+	put_unaligned_le32(0u,      &sctpg[10]);
+
+	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
+	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
+}
+
+/**
+ * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
+ * @qc: Command to be translated
+ *
+ * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
+ * an SCT Write Same command.
+ * Based on WRITE SAME has the UNMAP flag
+ *   When set translate to DSM TRIM
+ *   When clear translate to SCT Write Same
+ */
 static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 {
 	struct ata_taskfile *tf = &qc->tf;
@@ -3342,6 +3379,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	u32 size;
 	u16 fp;
 	u8 bp = 0xff;
+	u8 unmap = cdb[1] & 0x8;
 
 	/* we may not issue DMA commands if no DMA mode is set */
 	if (unlikely(!dev->dma_mode))
@@ -3353,11 +3391,26 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	}
 	scsi_16_lba_len(cdb, &block, &n_block);
 
-	/* for now we only support WRITE SAME with the unmap bit set */
-	if (unlikely(!(cdb[1] & 0x8))) {
-		fp = 1;
-		bp = 3;
-		goto invalid_fld;
+	if (unmap) {
+		/* If trim is not enabled the cmd is invalid. */
+		if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
+		    !ata_id_has_trim(dev->id)) {
+			fp = 1;
+			bp = 3;
+			goto invalid_fld;
+		}
+		/* If the request is too large the cmd is invalid */
+		if (n_block > 0xffff * trmax) {
+			fp = 2;
+			goto invalid_fld;
+		}
+	} else {
+		/* If write same is not available the cmd is invalid */
+		if (!ata_id_sct_write_same(dev->id)) {
+			fp = 1;
+			bp = 3;
+			goto invalid_fld;
+		}
 	}
 
 	/*
@@ -3367,30 +3420,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	if (!scsi_sg_count(scmd))
 		goto invalid_param_len;
 
-	if (n_block <= 0xffff * trmax) {
+	if (unmap) {
 		size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
+		if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
+			/* Newer devices support queued TRIM commands */
+			tf->protocol = ATA_PROT_NCQ;
+			tf->command = ATA_CMD_FPDMA_SEND;
+			tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
+			tf->nsect = qc->tag << 3;
+			tf->hob_feature = (size / 512) >> 8;
+			tf->feature = size / 512;
+
+			tf->auxiliary = 1;
+		} else {
+			tf->protocol = ATA_PROT_DMA;
+			tf->hob_feature = 0;
+			tf->feature = ATA_DSM_TRIM;
+			tf->hob_nsect = (size / 512) >> 8;
+			tf->nsect = size / 512;
+			tf->command = ATA_CMD_DSM;
+		}
 	} else {
-		fp = 2;
-		goto invalid_fld;
-	}
-
-	if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
-		/* Newer devices support queued TRIM commands */
-		tf->protocol = ATA_PROT_NCQ;
-		tf->command = ATA_CMD_FPDMA_SEND;
-		tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
-		tf->nsect = qc->tag << 3;
-		tf->hob_feature = (size / 512) >> 8;
-		tf->feature = size / 512;
+		ata_format_sct_write_same(scmd, block, n_block);
 
-		tf->auxiliary = 1;
-	} else {
-		tf->protocol = ATA_PROT_DMA;
 		tf->hob_feature = 0;
-		tf->feature = ATA_DSM_TRIM;
-		tf->hob_nsect = (size / 512) >> 8;
-		tf->nsect = size / 512;
-		tf->command = ATA_CMD_DSM;
+		tf->feature = 0;
+		tf->hob_nsect = 0;
+		tf->nsect = 1;
+		tf->lbah = 0;
+		tf->lbam = 0;
+		tf->lbal = ATA_CMD_STANDBYNOW1;
+		tf->hob_lbah = 0;
+		tf->hob_lbam = 0;
+		tf->hob_lbal = 0;
+		tf->device = ATA_CMD_STANDBYNOW1;
+		tf->protocol = ATA_PROT_DMA;
+		tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
 	}
 
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
@@ -3414,6 +3479,76 @@ invalid_opcode:
 }
 
 /**
+ *	ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
+ *	@args: device MAINTENANCE_IN data / SCSI command of interest.
+ *	@rbuf: Response buffer, to which simulated SCSI cmd output is sent.
+ *
+ *	Yields a subset to satisfy scsi_report_opcode()
+ *
+ *	LOCKING:
+ *	spin_lock_irqsave(host lock)
+ */
+static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
+{
+	struct ata_device *dev = args->dev;
+	u8 *cdb = args->cmd->cmnd;
+	u8 supported = 0;
+	unsigned int err = 0;
+
+	if (cdb[2] != 1) {
+		ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
+		err = 2;
+		goto out;
+	}
+	switch (cdb[3]) {
+	case INQUIRY:
+	case MODE_SENSE:
+	case MODE_SENSE_10:
+	case READ_CAPACITY:
+	case SERVICE_ACTION_IN_16:
+	case REPORT_LUNS:
+	case REQUEST_SENSE:
+	case SYNCHRONIZE_CACHE:
+	case REZERO_UNIT:
+	case SEEK_6:
+	case SEEK_10:
+	case TEST_UNIT_READY:
+	case SEND_DIAGNOSTIC:
+	case MAINTENANCE_IN:
+	case READ_6:
+	case READ_10:
+	case READ_16:
+	case WRITE_6:
+	case WRITE_10:
+	case WRITE_16:
+	case ATA_12:
+	case ATA_16:
+	case VERIFY:
+	case VERIFY_16:
+	case MODE_SELECT:
+	case MODE_SELECT_10:
+	case START_STOP:
+		supported = 3;
+		break;
+	case WRITE_SAME_16:
+		if (ata_id_sct_write_same(dev->id))
+			supported = 3;
+		break;
+	case ZBC_IN:
+	case ZBC_OUT:
+		if (ata_id_zoned_cap(dev->id) ||
+		    dev->class == ATA_DEV_ZAC)
+			supported = 3;
+		break;
+	default:
+		break;
+	}
+out:
+	rbuf[1] = supported; /* supported */
+	return err;
+}
+
+/**
  *	ata_scsi_report_zones_complete - convert ATA output
  *	@qc: command structure returning the data
  *
@@ -4193,6 +4328,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
 			ata_scsi_invalid_field(dev, cmd, 1);
 		break;
 
+	case MAINTENANCE_IN:
+		if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
+			ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
+		else
+			ata_scsi_invalid_field(dev, cmd, 1);
+		break;
+
 	/* all other commands */
 	default:
 		ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
@@ -4225,7 +4367,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
 		shost->max_lun = 1;
 		shost->max_channel = 1;
 		shost->max_cmd_len = 16;
-		shost->no_write_same = 1;
 
 		/* Schedule policy is determined by ->qc_defer()
 		 * callback and it needs to see every deferred qc.
diff --git a/include/linux/ata.h b/include/linux/ata.h
index 45a1d71..fdb1803 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -105,6 +105,7 @@ enum {
 	ATA_ID_CFA_KEY_MGMT	= 162,
 	ATA_ID_CFA_MODES	= 163,
 	ATA_ID_DATA_SET_MGMT	= 169,
+	ATA_ID_SCT_CMD_XPORT	= 206,
 	ATA_ID_ROT_SPEED	= 217,
 	ATA_ID_PIO4		= (1 << 1),
 
@@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
 }
 
 /**
+ *
+ * Word: 206 - SCT Command Transport
+ *    15:12 - Vendor Specific
+ *     11:6 - Reserved
+ *        5 - SCT Command Transport Data Tables supported
+ *        4 - SCT Command Transport Features Control supported
+ *        3 - SCT Command Transport Error Recovery Control supported
+ *        2 - SCT Command Transport Write Same supported
+ *        1 - SCT Command Transport Long Sector Access supported
+ *        0 - SCT Command Transport supported
+ */
+static inline bool ata_id_sct_data_tables(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
+}
+
+static inline bool ata_id_sct_features_ctrl(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
+}
+
+static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
+}
+
+static inline bool ata_id_sct_write_same(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
+}
+
+static inline bool ata_id_sct_long_sector_access(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
+}
+
+static inline bool ata_id_sct_supported(const u16 *id)
+{
+	return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
+}
+
+/**
  *	ata_id_major_version	-	get ATA level of drive
  *	@id: Identify data
  *
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
  2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
@ 2016-08-22  4:23 ` Shaun Tancheff
  2016-08-22  6:30   ` Hannes Reinecke
  2016-08-22  8:31   ` [PATCH v6 3/4] " Tom Yan
  2016-08-22  4:23 ` [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO Shaun Tancheff
                   ` (2 subsequent siblings)
  5 siblings, 2 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:23 UTC (permalink / raw)
  To: linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

Correct handling of devices with sector_size other that 512 bytes.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
In the case of a 4Kn device sector_size it is possible to describe a much
larger DSM Trim than the current fixed default of 512 bytes.

This patch assumes the minimum descriptor is sector_size and fills out
the descriptor accordingly.

The ACS-2 specification is quite clear that the DSM command payload is
sized as number of 512 byte transfers so a 4Kn device will operate
correctly without this patch.

v5:
 - Added support for a sector_size descriptor other than 512 bytes.

 drivers/ata/libata-scsi.c | 85 +++++++++++++++++++++++++++++++----------------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index ebf1a04..37f456e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3283,7 +3283,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 /**
  * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
  * @cmd: SCSI command being translated
- * @num: Maximum number of entries (nominally 64).
+ * @trmax: Maximum number of entries that will fit in sector_size bytes.
  * @sector: Starting sector
  * @count: Total Range of request in logical sectors
  *
@@ -3298,63 +3298,80 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
  *  LBA's should be sorted order and not overlap.
  *
  * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
+ *
+ * Return: Number of bytes copied into sglist.
  */
-static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
-					      u64 sector, u32 count)
+static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
+					u64 sector, u32 count)
 {
-	__le64 *buffer;
-	u32 i = 0, used_bytes;
+	struct scsi_device *sdp = cmd->device;
+	size_t len = sdp->sector_size;
+	size_t r;
+	__le64 *buf;
+	u32 i = 0;
 	unsigned long flags;
 
-	BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
+	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
+
+	if (len > ATA_SCSI_RBUF_SIZE)
+		len = ATA_SCSI_RBUF_SIZE;
 
 	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-	buffer = ((void *)ata_scsi_rbuf);
-	while (i < num) {
+	buf = ((void *)ata_scsi_rbuf);
+	memset(buf, 0, len);
+	while (i < trmax) {
 		u64 entry = sector |
 			((u64)(count > 0xffff ? 0xffff : count) << 48);
-		buffer[i++] = __cpu_to_le64(entry);
+		buf[i++] = __cpu_to_le64(entry);
 		if (count <= 0xffff)
 			break;
 		count -= 0xffff;
 		sector += 0xffff;
 	}
-
-	used_bytes = ALIGN(i * 8, 512);
-	memset(buffer + i, 0, used_bytes - i * 8);
-	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
+	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
 	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
 
-	return used_bytes;
+	return r;
 }
 
 /**
  * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
  * @cmd: SCSI command being translated
  * @lba: Starting sector
- * @num: Number of logical sectors to be zero'd.
+ * @num: Number of sectors to be zero'd.
  *
- * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
+ * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
  * descriptor.
  * NOTE: Writes a pattern (0's) in the foreground.
- *       Large write-same requents can timeout.
+ *
+ * Return: Number of bytes copied into sglist.
  */
-static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
+static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
 {
-	u16 *sctpg;
+	struct scsi_device *sdp = cmd->device;
+	size_t len = sdp->sector_size;
+	size_t r;
+	u16 *buf;
 	unsigned long flags;
 
 	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-	sctpg = ((void *)ata_scsi_rbuf);
+	buf = ((void *)ata_scsi_rbuf);
+
+	put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
+	put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
+	put_unaligned_le64(lba,     &buf[2]);
+	put_unaligned_le64(num,     &buf[6]);
+	put_unaligned_le32(0u,      &buf[10]); /* pattern */
+
+	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
 
-	put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
-	put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
-	put_unaligned_le64(lba,     &sctpg[2]);
-	put_unaligned_le64(num,     &sctpg[6]);
-	put_unaligned_le32(0u,      &sctpg[10]);
+	if (len > ATA_SCSI_RBUF_SIZE)
+		len = ATA_SCSI_RBUF_SIZE;
 
-	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
+	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
 	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
+
+	return r;
 }
 
 /**
@@ -3371,11 +3388,13 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 {
 	struct ata_taskfile *tf = &qc->tf;
 	struct scsi_cmnd *scmd = qc->scsicmd;
+	struct scsi_device *sdp = scmd->device;
+	size_t len = sdp->sector_size;
 	struct ata_device *dev = qc->dev;
 	const u8 *cdb = scmd->cmnd;
 	u64 block;
 	u32 n_block;
-	const u32 trmax = ATA_MAX_TRIM_RNUM;
+	const u32 trmax = len >> 3;
 	u32 size;
 	u16 fp;
 	u8 bp = 0xff;
@@ -3420,8 +3439,16 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	if (!scsi_sg_count(scmd))
 		goto invalid_param_len;
 
+	/*
+	 * size must match sector size in bytes
+	 * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
+	 * is defined as number of 512 byte blocks to be transferred.
+	 */
 	if (unmap) {
 		size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
+		if (size != len)
+			goto invalid_param_len;
+
 		if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
 			/* Newer devices support queued TRIM commands */
 			tf->protocol = ATA_PROT_NCQ;
@@ -3441,7 +3468,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 			tf->command = ATA_CMD_DSM;
 		}
 	} else {
-		ata_format_sct_write_same(scmd, block, n_block);
+		size = ata_format_sct_write_same(scmd, block, n_block);
+		if (size != len)
+			goto invalid_param_len;
 
 		tf->hob_feature = 0;
 		tf->feature = 0;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 55+ messages in thread

* [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
                   ` (2 preceding siblings ...)
  2016-08-22  4:23 ` [PATCH v6 3/4] SCT Write Same / DSM Trim Shaun Tancheff
@ 2016-08-22  4:23 ` Shaun Tancheff
  2016-08-22  6:31   ` Hannes Reinecke
  2016-08-22  6:32 ` [PATCH v6 0/4] SCT Write Same Hannes Reinecke
  2016-08-25 15:28 ` Tejun Heo
  5 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:23 UTC (permalink / raw)
  To: linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

Use non DMA write log when ATA_DFLAG_PIO is set.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v6: Added check for ATA_DFLAG_PIO and fallback to non DMA write log for
    SCT Write Same

 drivers/ata/libata-scsi.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 37f456e..e50b7a7 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3485,6 +3485,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 		tf->device = ATA_CMD_STANDBYNOW1;
 		tf->protocol = ATA_PROT_DMA;
 		tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
+		if (unlikely(dev->flags & ATA_DFLAG_PIO))
+			tf->command = ATA_CMD_WRITE_LOG_EXT;
 	}
 
 	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
@ 2016-08-22  6:27   ` Hannes Reinecke
  2016-08-22 19:20   ` Tom Yan
  2016-08-23 10:37   ` Tom Yan
  2 siblings, 0 replies; 55+ messages in thread
From: Hannes Reinecke @ 2016-08-22  6:27 UTC (permalink / raw)
  To: Shaun Tancheff, linux-ide, linux-kernel
  Cc: Tejun Heo, Christoph Hellwig, Tom Yan, Martin K . Petersen,
	Damien Le Moal, Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 08/22/2016 06:23 AM, Shaun Tancheff wrote:
> SATA drives may support write same via SCT. This is useful
> for setting the drive contents to a specific pattern (0's).
> 
> Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
> command or an SCT Write Same command.
> 
> Based on the UNMAP flag:
>   - When set translate to DSM TRIM
>   - When not set translate to SCT Write Same
> 
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Addressed review comments
>  - Report support for ZBC only for zoned devices.
>  - kmap page during rewrite
>  - Fix unmap set to require trim or error, if not unmap then sct write
>    same or error.
> v4:
>  - Added partial MAINTENANCE_IN opcode simulation
>  - Dropped all changes in drivers/scsi/*
>  - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
>  - Demux UNMAP/TRIM from WRITE SAME
> v2:
>  - Remove fugly ata hacking from sd.c
> 
>  drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
>  include/linux/ata.h       |  43 ++++++++++
>  2 files changed, 213 insertions(+), 29 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
@ 2016-08-22  6:27   ` Hannes Reinecke
  2016-08-22 19:27   ` Tom Yan
  2016-08-22 21:20   ` Tom Yan
  2 siblings, 0 replies; 55+ messages in thread
From: Hannes Reinecke @ 2016-08-22  6:27 UTC (permalink / raw)
  To: Shaun Tancheff, linux-ide, linux-kernel
  Cc: Tejun Heo, Christoph Hellwig, Tom Yan, Martin K . Petersen,
	Damien Le Moal, Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 08/22/2016 06:23 AM, Shaun Tancheff wrote:
> Safely overwriting the attached page to ATA format from the SCSI formatted
> variant.
> 
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Added prep patch to work with non-page aligned scatterlist pages
>    and use kmap_atomic() to lock page during modification.
> 
>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/ata.h       | 26 ----------------------
>  2 files changed, 51 insertions(+), 31 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22  4:23 ` [PATCH v6 3/4] SCT Write Same / DSM Trim Shaun Tancheff
@ 2016-08-22  6:30   ` Hannes Reinecke
  2016-08-24 18:08     ` [PATCH v6 3/4 RESEND] " Shaun Tancheff
  2016-08-22  8:31   ` [PATCH v6 3/4] " Tom Yan
  1 sibling, 1 reply; 55+ messages in thread
From: Hannes Reinecke @ 2016-08-22  6:30 UTC (permalink / raw)
  To: Shaun Tancheff, linux-ide, linux-kernel
  Cc: Tejun Heo, Christoph Hellwig, Tom Yan, Martin K . Petersen,
	Damien Le Moal, Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 08/22/2016 06:23 AM, Shaun Tancheff wrote:
> Correct handling of devices with sector_size other that 512 bytes.
> 
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> In the case of a 4Kn device sector_size it is possible to describe a much
> larger DSM Trim than the current fixed default of 512 bytes.
> 
> This patch assumes the minimum descriptor is sector_size and fills out
> the descriptor accordingly.
> 
> The ACS-2 specification is quite clear that the DSM command payload is
> sized as number of 512 byte transfers so a 4Kn device will operate
> correctly without this patch.
> 
Can you please reshuffle the description to have the 'Signed-off-by'
line following the entire description?
This makes things easier to read and avoid the last part of the
description being killed by overzealous patch programs.
THX.

> v5:
>  - Added support for a sector_size descriptor other than 512 bytes.
> 
>  drivers/ata/libata-scsi.c | 85 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 57 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index ebf1a04..37f456e 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3283,7 +3283,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>   * @cmd: SCSI command being translated
> - * @num: Maximum number of entries (nominally 64).
> + * @trmax: Maximum number of entries that will fit in sector_size bytes.
>   * @sector: Starting sector
>   * @count: Total Range of request in logical sectors
>   *
> @@ -3298,63 +3298,80 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>   *  LBA's should be sorted order and not overlap.
>   *
>   * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> -					      u64 sector, u32 count)
> +static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
> +					u64 sector, u32 count)
>  {
> -	__le64 *buffer;
> -	u32 i = 0, used_bytes;
> +	struct scsi_device *sdp = cmd->device;
> +	size_t len = sdp->sector_size;
> +	size_t r;
> +	__le64 *buf;
> +	u32 i = 0;
>  	unsigned long flags;
>  
> -	BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
> +	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
> +
> +	if (len > ATA_SCSI_RBUF_SIZE)
> +		len = ATA_SCSI_RBUF_SIZE;
>  
>  	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -	buffer = ((void *)ata_scsi_rbuf);
> -	while (i < num) {
> +	buf = ((void *)ata_scsi_rbuf);
> +	memset(buf, 0, len);
> +	while (i < trmax) {
>  		u64 entry = sector |
>  			((u64)(count > 0xffff ? 0xffff : count) << 48);
> -		buffer[i++] = __cpu_to_le64(entry);
> +		buf[i++] = __cpu_to_le64(entry);
>  		if (count <= 0xffff)
>  			break;
>  		count -= 0xffff;
>  		sector += 0xffff;
>  	}
> -
> -	used_bytes = ALIGN(i * 8, 512);
> -	memset(buffer + i, 0, used_bytes - i * 8);
> -	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
> +	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>  	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>  
> -	return used_bytes;
> +	return r;
>  }
>  
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>   * @cmd: SCSI command being translated
>   * @lba: Starting sector
> - * @num: Number of logical sectors to be zero'd.
> + * @num: Number of sectors to be zero'd.
>   *
> - * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
>   * descriptor.
>   * NOTE: Writes a pattern (0's) in the foreground.
> - *       Large write-same requents can timeout.
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>  {
> -	u16 *sctpg;
> +	struct scsi_device *sdp = cmd->device;
> +	size_t len = sdp->sector_size;
> +	size_t r;
> +	u16 *buf;
>  	unsigned long flags;
>  
>  	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -	sctpg = ((void *)ata_scsi_rbuf);
> +	buf = ((void *)ata_scsi_rbuf);
> +
> +	put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
> +	put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
> +	put_unaligned_le64(lba,     &buf[2]);
> +	put_unaligned_le64(num,     &buf[6]);
> +	put_unaligned_le32(0u,      &buf[10]); /* pattern */
> +
> +	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
>  
> -	put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> -	put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> -	put_unaligned_le64(lba,     &sctpg[2]);
> -	put_unaligned_le64(num,     &sctpg[6]);
> -	put_unaligned_le32(0u,      &sctpg[10]);
> +	if (len > ATA_SCSI_RBUF_SIZE)
> +		len = ATA_SCSI_RBUF_SIZE;
>  
> -	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
> +	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>  	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +
> +	return r;
>  }
>  
>  /**
> @@ -3371,11 +3388,13 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>  	struct ata_taskfile *tf = &qc->tf;
>  	struct scsi_cmnd *scmd = qc->scsicmd;
> +	struct scsi_device *sdp = scmd->device;
> +	size_t len = sdp->sector_size;
>  	struct ata_device *dev = qc->dev;
>  	const u8 *cdb = scmd->cmnd;
>  	u64 block;
>  	u32 n_block;
> -	const u32 trmax = ATA_MAX_TRIM_RNUM;
> +	const u32 trmax = len >> 3;
>  	u32 size;
>  	u16 fp;
>  	u8 bp = 0xff;
> @@ -3420,8 +3439,16 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  	if (!scsi_sg_count(scmd))
>  		goto invalid_param_len;
>  
> +	/*
> +	 * size must match sector size in bytes
> +	 * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
> +	 * is defined as number of 512 byte blocks to be transferred.
> +	 */
>  	if (unmap) {
>  		size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> +		if (size != len)
> +			goto invalid_param_len;
> +
>  		if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
>  			/* Newer devices support queued TRIM commands */
>  			tf->protocol = ATA_PROT_NCQ;
> @@ -3441,7 +3468,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  			tf->command = ATA_CMD_DSM;
>  		}
>  	} else {
> -		ata_format_sct_write_same(scmd, block, n_block);
> +		size = ata_format_sct_write_same(scmd, block, n_block);
> +		if (size != len)
> +			goto invalid_param_len;
>  
>  		tf->hob_feature = 0;
>  		tf->feature = 0;
> 
Why is this not folded into the previous patches?
This mostly patches code which you've already modified and/or initiated,
right?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO
  2016-08-22  4:23 ` [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO Shaun Tancheff
@ 2016-08-22  6:31   ` Hannes Reinecke
  0 siblings, 0 replies; 55+ messages in thread
From: Hannes Reinecke @ 2016-08-22  6:31 UTC (permalink / raw)
  To: Shaun Tancheff, linux-ide, linux-kernel
  Cc: Tejun Heo, Christoph Hellwig, Tom Yan, Martin K . Petersen,
	Damien Le Moal, Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 08/22/2016 06:23 AM, Shaun Tancheff wrote:
> Use non DMA write log when ATA_DFLAG_PIO is set.
> 
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6: Added check for ATA_DFLAG_PIO and fallback to non DMA write log for
>     SCT Write Same
> 
>  drivers/ata/libata-scsi.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 37f456e..e50b7a7 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3485,6 +3485,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  		tf->device = ATA_CMD_STANDBYNOW1;
>  		tf->protocol = ATA_PROT_DMA;
>  		tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
> +		if (unlikely(dev->flags & ATA_DFLAG_PIO))
> +			tf->command = ATA_CMD_WRITE_LOG_EXT;
>  	}
>  
>  	tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> 
Reviewed-by: Hannes Reinecke <hare@suse.com>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 0/4] SCT Write Same
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
                   ` (3 preceding siblings ...)
  2016-08-22  4:23 ` [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO Shaun Tancheff
@ 2016-08-22  6:32 ` Hannes Reinecke
  2016-08-25 15:28 ` Tejun Heo
  5 siblings, 0 replies; 55+ messages in thread
From: Hannes Reinecke @ 2016-08-22  6:32 UTC (permalink / raw)
  To: Shaun Tancheff, linux-ide, linux-kernel
  Cc: Tejun Heo, Christoph Hellwig, Tom Yan, Martin K . Petersen,
	Damien Le Moal, Josh Bingaman, Hannes Reinecke

On 08/22/2016 06:23 AM, Shaun Tancheff wrote:
> At some point the method of issuing Write Same for ATA drives changed.
> Currently write same is commonly available via SCT so expose the SCT
> capabilities and use SCT Write Same when it is available.
> 
> This is useful for zoned based media that prefers to support discard
> with lbprz set, aka discard zeroes data by mapping discard operations to
> reset write pointer operations. Conventional zones that do not support
> reset write pointer can still honor the discard zeroes data by issuing
> a write same over the zone.
> 
> It may also be nice to know if various controllers that currently
> disable WRITE SAME will work with the SCT Write Same code path:
>   aacraid, arcmsr, megaraid, 3w-9xxx, 3w-sas, 3w-xxxx, gdth, hpsa, ips,
>   megaraid, pmcraid, storvsc_drv
> 
> This patch against v4.8-rc2 is also at
>     https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v9
> 
>     git@github.com:stancheff/linux.git v4.8-rc2+biof.v9
> 
> v6:
>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>  - Added support for a sector_size descriptor other than 512 bytes.
> v5:
>  - Addressed review comments
>  - Report support for ZBC only for zoned devices.
>  - kmap page during rewrite
>  - Fix unmap set to require trim or error, if not unmap then sct write
>    same or error.
> v4:
>  - Added partial MAINTENANCE_IN opcode simulation
>  - Dropped all changes in drivers/scsi/*
>  - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
>  - Demux UNMAP/TRIM from WRITE SAME
> v2:
>  - Remove fugly ata hacking from sd.c
> 
> Shaun Tancheff (4):
>   libata: Safely overwrite attached page in WRITE SAME xlat
>   Add support for SCT Write Same
>   SCT Write Same / DSM Trim
>   SCT Write Same handle ATA_DFLAG_PIO
> 
>  drivers/ata/libata-scsi.c | 280 +++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/ata.h       |  69 +++++++-----
>  2 files changed, 292 insertions(+), 57 deletions(-)
> 
Thanks for doing this.
It has been on my To-Do list for a long time, and it's good to see the
UNMAP/TRIM SATL cleaned up finally.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22  4:23 ` [PATCH v6 3/4] SCT Write Same / DSM Trim Shaun Tancheff
  2016-08-22  6:30   ` Hannes Reinecke
@ 2016-08-22  8:31   ` Tom Yan
  2016-08-22  8:33     ` Tom Yan
  1 sibling, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22  8:31 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

As mentioned before, as of the latest draft of ACS-4, nothing about a
larger payload size is mentioned. Conservatively speaking, it sort of
means that we are allowing four 512-byte block payload on 4Kn device
regardless of the reported limit in the IDENTIFY DEVICE data. I am
really not sure if it's a good thing to do. Doesn't seem necessary
anyway, especially when our block layer does not support such a large
bio size (well, yet), so each request will end up using a payload of
two 512-byte blocks at max anyway.

Also, it's IMHO better to do it in a seperate patch (series) after the
SCT Write Same support has entered libata's repo too, because this has
nothing to with it but TRIM translation. In case the future ACS
standards has clearer/better instruction on this, it will be easier
for us to revert/follow up too.

And you'll need to fix the Block Limits VPD simulation
(ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
Same Length dynamically as per the logical sector size, otherwise your
effort will be completely in vain, even if our block layer is
overhauled in the future.

Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
all, the reported Maximum Write Same Length will be:

On device with TRIM support:
- 4194240 LOGICAL sector per request split / command
-- ~=2G on non-4Kn drives
-- ~=16G on non-4Kn drives

On device without TRIM support:
- 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
-- ~= 32M on non-4Kn drives
-- ~=256M on non-4Kn drives

Even if we ignore the upper limit(s) of the block layer, do we want
such inconsistencies?

On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> Correct handling of devices with sector_size other that 512 bytes.
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> In the case of a 4Kn device sector_size it is possible to describe a much
> larger DSM Trim than the current fixed default of 512 bytes.
>
> This patch assumes the minimum descriptor is sector_size and fills out
> the descriptor accordingly.
>
> The ACS-2 specification is quite clear that the DSM command payload is
> sized as number of 512 byte transfers so a 4Kn device will operate
> correctly without this patch.
>
> v5:
>  - Added support for a sector_size descriptor other than 512 bytes.
>
>  drivers/ata/libata-scsi.c | 85 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 57 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index ebf1a04..37f456e 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3283,7 +3283,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>   * @cmd: SCSI command being translated
> - * @num: Maximum number of entries (nominally 64).
> + * @trmax: Maximum number of entries that will fit in sector_size bytes.
>   * @sector: Starting sector
>   * @count: Total Range of request in logical sectors
>   *
> @@ -3298,63 +3298,80 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>   *  LBA's should be sorted order and not overlap.
>   *
>   * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> -                                             u64 sector, u32 count)
> +static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
> +                                       u64 sector, u32 count)
>  {
> -       __le64 *buffer;
> -       u32 i = 0, used_bytes;
> +       struct scsi_device *sdp = cmd->device;
> +       size_t len = sdp->sector_size;
> +       size_t r;
> +       __le64 *buf;
> +       u32 i = 0;
>         unsigned long flags;
>
> -       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
> +       WARN_ON(len > ATA_SCSI_RBUF_SIZE);
> +
> +       if (len > ATA_SCSI_RBUF_SIZE)
> +               len = ATA_SCSI_RBUF_SIZE;
>
>         spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -       buffer = ((void *)ata_scsi_rbuf);
> -       while (i < num) {
> +       buf = ((void *)ata_scsi_rbuf);
> +       memset(buf, 0, len);
> +       while (i < trmax) {
>                 u64 entry = sector |
>                         ((u64)(count > 0xffff ? 0xffff : count) << 48);
> -               buffer[i++] = __cpu_to_le64(entry);
> +               buf[i++] = __cpu_to_le64(entry);
>                 if (count <= 0xffff)
>                         break;
>                 count -= 0xffff;
>                 sector += 0xffff;
>         }
> -
> -       used_bytes = ALIGN(i * 8, 512);
> -       memset(buffer + i, 0, used_bytes - i * 8);
> -       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
> +       r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>         spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>
> -       return used_bytes;
> +       return r;
>  }
>
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>   * @cmd: SCSI command being translated
>   * @lba: Starting sector
> - * @num: Number of logical sectors to be zero'd.
> + * @num: Number of sectors to be zero'd.
>   *
> - * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
>   * descriptor.
>   * NOTE: Writes a pattern (0's) in the foreground.
> - *       Large write-same requents can timeout.
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>  {
> -       u16 *sctpg;
> +       struct scsi_device *sdp = cmd->device;
> +       size_t len = sdp->sector_size;
> +       size_t r;
> +       u16 *buf;
>         unsigned long flags;
>
>         spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -       sctpg = ((void *)ata_scsi_rbuf);
> +       buf = ((void *)ata_scsi_rbuf);
> +
> +       put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
> +       put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
> +       put_unaligned_le64(lba,     &buf[2]);
> +       put_unaligned_le64(num,     &buf[6]);
> +       put_unaligned_le32(0u,      &buf[10]); /* pattern */
> +
> +       WARN_ON(len > ATA_SCSI_RBUF_SIZE);
>
> -       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> -       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> -       put_unaligned_le64(lba,     &sctpg[2]);
> -       put_unaligned_le64(num,     &sctpg[6]);
> -       put_unaligned_le32(0u,      &sctpg[10]);
> +       if (len > ATA_SCSI_RBUF_SIZE)
> +               len = ATA_SCSI_RBUF_SIZE;
>
> -       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
> +       r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>         spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +
> +       return r;
>  }
>
>  /**
> @@ -3371,11 +3388,13 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
>         struct scsi_cmnd *scmd = qc->scsicmd;
> +       struct scsi_device *sdp = scmd->device;
> +       size_t len = sdp->sector_size;
>         struct ata_device *dev = qc->dev;
>         const u8 *cdb = scmd->cmnd;
>         u64 block;
>         u32 n_block;
> -       const u32 trmax = ATA_MAX_TRIM_RNUM;
> +       const u32 trmax = len >> 3;
>         u32 size;
>         u16 fp;
>         u8 bp = 0xff;
> @@ -3420,8 +3439,16 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> +       /*
> +        * size must match sector size in bytes
> +        * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
> +        * is defined as number of 512 byte blocks to be transferred.
> +        */
>         if (unmap) {
>                 size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> +               if (size != len)
> +                       goto invalid_param_len;
> +
>                 if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
>                         /* Newer devices support queued TRIM commands */
>                         tf->protocol = ATA_PROT_NCQ;
> @@ -3441,7 +3468,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>                         tf->command = ATA_CMD_DSM;
>                 }
>         } else {
> -               ata_format_sct_write_same(scmd, block, n_block);
> +               size = ata_format_sct_write_same(scmd, block, n_block);
> +               if (size != len)
> +                       goto invalid_param_len;
>
>                 tf->hob_feature = 0;
>                 tf->feature = 0;
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22  8:31   ` [PATCH v6 3/4] " Tom Yan
@ 2016-08-22  8:33     ` Tom Yan
  2016-08-22 15:04       ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22  8:33 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
> As mentioned before, as of the latest draft of ACS-4, nothing about a
> larger payload size is mentioned. Conservatively speaking, it sort of

*payload block size

> means that we are allowing four 512-byte block payload on 4Kn device

*eight 512-byte-block payload

> regardless of the reported limit in the IDENTIFY DEVICE data. I am
> really not sure if it's a good thing to do. Doesn't seem necessary
> anyway, especially when our block layer does not support such a large
> bio size (well, yet), so each request will end up using a payload of
> two 512-byte blocks at max anyway.
>
> Also, it's IMHO better to do it in a seperate patch (series) after the
> SCT Write Same support has entered libata's repo too, because this has
> nothing to with it but TRIM translation. In case the future ACS
> standards has clearer/better instruction on this, it will be easier
> for us to revert/follow up too.
>
> And you'll need to fix the Block Limits VPD simulation
> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
> Same Length dynamically as per the logical sector size, otherwise your
> effort will be completely in vain, even if our block layer is
> overhauled in the future.
>
> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
> all, the reported Maximum Write Same Length will be:
>
> On device with TRIM support:
> - 4194240 LOGICAL sector per request split / command
> -- ~=2G on non-4Kn drives
> -- ~=16G on non-4Kn drives
>
> On device without TRIM support:
> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
> -- ~= 32M on non-4Kn drives
> -- ~=256M on non-4Kn drives
>
> Even if we ignore the upper limit(s) of the block layer, do we want
> such inconsistencies?
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22  8:33     ` Tom Yan
@ 2016-08-22 15:04       ` Shaun Tancheff
  2016-08-22 17:02         ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 15:04 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
>> As mentioned before, as of the latest draft of ACS-4, nothing about a
>> larger payload size is mentioned. Conservatively speaking, it sort of
>
> *payload block size
>
>> means that we are allowing four 512-byte block payload on 4Kn device
>
> *eight 512-byte-block payload
>
>> regardless of the reported limit in the IDENTIFY DEVICE data. I am
>> really not sure if it's a good thing to do. Doesn't seem necessary
>> anyway, especially when our block layer does not support such a large
>> bio size (well, yet), so each request will end up using a payload of
>> two 512-byte blocks at max anyway.
>>
>> Also, it's IMHO better to do it in a seperate patch (series) after the
>> SCT Write Same support has entered libata's repo too, because this has
>> nothing to with it but TRIM translation. In case the future ACS
>> standards has clearer/better instruction on this, it will be easier
>> for us to revert/follow up too.

I am certainly fine with dropping this patch as it is not critical to
the reset of the series.

Nothing will break if we stick with the 512 byte fixed limit. This
is at most a prep patch for handling increased limits should
they be reported.

All it really is doing is acknowledging that any write same
must have a payload of sector_size which can be something
larger than 512 bytes.

>> And you'll need to fix the Block Limits VPD simulation
>> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
>> Same Length dynamically as per the logical sector size, otherwise your
>> effort will be completely in vain, even if our block layer is
>> overhauled in the future.

Martin had earlier suggested that I leave the write same defaults
as is due to concerns with misbehaving hardware.

I think your patch adjusting the reported limits is reasonable
enough. It seems to me we should have the hardware
report it's actual limits, for example, report what the spec
allows.

Of course there are lots of reasons to limit the absolute
maximums.

So in this case we are just enabling the limit to be
increased but not changing the current black-listing
that distrusts DSM Trim. Once we have 4Kn devices
to test then we can start white-listing and see if there
is an overall increase in performance.

>> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
>> all, the reported Maximum Write Same Length will be:
>>
>> On device with TRIM support:
>> - 4194240 LOGICAL sector per request split / command
>> -- ~=2G on non-4Kn drives
>> -- ~=16G on non-4Kn drives
>>
>> On device without TRIM support:
>> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
>> -- ~= 32M on non-4Kn drives
>> -- ~=256M on non-4Kn drives
>>
>> Even if we ignore the upper limit(s) of the block layer, do we want
>> such inconsistencies?

Hmm. Overall I think it is still okay if a bit confusing.
It is possible that for devices which support SCT Write Same
and DSM TRIM will still Trim faster than they can Write Same,
However the internal implementation is opaque so I can't
say if Write Same is often implemented in terms of TRIM
or not. I mean that's how _I_ do it [Write 1 block and map
N blocks to it], But not every FTL will have come to the
same conclusion.

I also suspect that given the choice for most use casess that
TRIM is preferred over WS when TRIM supports returning
zeroed blocks.

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22 15:04       ` Shaun Tancheff
@ 2016-08-22 17:02         ` Tom Yan
  2016-08-22 18:00           ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 17:02 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
>>> As mentioned before, as of the latest draft of ACS-4, nothing about a
>>> larger payload size is mentioned. Conservatively speaking, it sort of
>>
>> *payload block size
>>
>>> means that we are allowing four 512-byte block payload on 4Kn device
>>
>> *eight 512-byte-block payload
>>
>>> regardless of the reported limit in the IDENTIFY DEVICE data. I am
>>> really not sure if it's a good thing to do. Doesn't seem necessary
>>> anyway, especially when our block layer does not support such a large
>>> bio size (well, yet), so each request will end up using a payload of
>>> two 512-byte blocks at max anyway.
>>>
>>> Also, it's IMHO better to do it in a seperate patch (series) after the
>>> SCT Write Same support has entered libata's repo too, because this has
>>> nothing to with it but TRIM translation. In case the future ACS
>>> standards has clearer/better instruction on this, it will be easier
>>> for us to revert/follow up too.
>
> I am certainly fine with dropping this patch as it is not critical to
> the reset of the series.
>
> Nothing will break if we stick with the 512 byte fixed limit. This
> is at most a prep patch for handling increased limits should
> they be reported.
>
> All it really is doing is acknowledging that any write same
> must have a payload of sector_size which can be something
> larger than 512 bytes.

Actually I am not sure if we should hard code the limit
ata_format_dsm_trim_descr() / ata_set_lba_range_entries() at all. The
current implementation (with or without your patch) seems redundant
and unnecessary to me.

All we need to do should be: making sure that the block limits VPD
advertises a safe Maximum Write Same Length, and reject Write Same
(16) commands that have "number of blocks" that exceeds the limit
(which is what I introduced in commit 5c79097a28c2, "libata-scsi:
reject WRITE SAME (16) with n_block that exceeds limit").

In that case, we don't need to hard code the limit in the
while-condition again; instead we should just make it end with the
request size, since the accepted request could never be larger than
the limit we advertise.

>
>>> And you'll need to fix the Block Limits VPD simulation
>>> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
>>> Same Length dynamically as per the logical sector size, otherwise your
>>> effort will be completely in vain, even if our block layer is
>>> overhauled in the future.
>
> Martin had earlier suggested that I leave the write same defaults
> as is due to concerns with misbehaving hardware.

It doesn't really apply in libata's anyway. SD_MAX_WS10_BLOCKS means
nothing to ATA drives, except from coincidentally being the same value
as ATA_MAX_SECTORS_LBA48 (which technically should have been 65536
instead).

>
> I think your patch adjusting the reported limits is reasonable
> enough. It seems to me we should have the hardware
> report it's actual limits, for example, report what the spec
> allows.

As you mentioned yourself before, technically SCT Write Same does not
have a limit. The only practical limit is the timeout in the SCSI
layer, so the actual bytes being (over)written is probably our only
concern.

For the case of TRIM, devices do report a limit in their IDENTIFY
DEVICE data. However, as Martin always said, it is not an always-safe
piece of data for us to refer to, that's why we have been statically
allowing only 1-block payload.

Therefore, it seems convenient (and consistent) that we make SCT Write
Same always use the same limit as TRIM, no matter if it is supported
on a certain device. And to make sure the actual bytes being written /
time required per command does not increase enormously as per the
sector size, we decrease the limit accordingly. Certainly that's not
necessary if 16G per command is fine on most devices.

Also, does SCT Write Same commands that write 32M/256M per command
make any sense? I mean would we benefit from such small SCT Write Same
commands at all?

>
> Of course there are lots of reasons to limit the absolute
> maximums.
>
> So in this case we are just enabling the limit to be
> increased but not changing the current black-listing
> that distrusts DSM Trim. Once we have 4Kn devices
> to test then we can start white-listing and see if there
> is an overall increase in performance.
>
>>> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
>>> all, the reported Maximum Write Same Length will be:
>>>
>>> On device with TRIM support:
>>> - 4194240 LOGICAL sector per request split / command
>>> -- ~=2G on non-4Kn drives
>>> -- ~=16G on non-4Kn drives
>>>
>>> On device without TRIM support:
>>> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
>>> -- ~= 32M on non-4Kn drives
>>> -- ~=256M on non-4Kn drives
>>>
>>> Even if we ignore the upper limit(s) of the block layer, do we want
>>> such inconsistencies?
>
> Hmm. Overall I think it is still okay if a bit confusing.
> It is possible that for devices which support SCT Write Same
> and DSM TRIM will still Trim faster than they can Write Same,
> However the internal implementation is opaque so I can't
> say if Write Same is often implemented in terms of TRIM
> or not. I mean that's how _I_ do it [Write 1 block and map
> N blocks to it], But not every FTL will have come to the
> same conclusion.

Why would SCT Write Same be implemented in terms of TRIM? Neither
would we need to care about that anyway. Considering we will unlikely
allow multi-block payload TRIM, and we probably have no reason to
touch the SCSI Write Same timeout, the only thing we need to consider
is whether we want to decrease the advertised limit base on the
typical SCT Write Same speed on traditional HDDs and the timeout,
especially in the 4Kn case.

Since I have no experience with SCT Write Same at all, and neither do
I own any spinning HDD at all, I cannot firmly suggest what to do. All
I can suggest is: should we decrease it per sector size? Or would 2G
per command still be too large to avoid timeout?

>
> I also suspect that given the choice for most use casess that
> TRIM is preferred over WS when TRIM supports returning
> zeroed blocks.

Well, for devices with discard_zeroes_data = 1, the block layer will
not issue write same requests (see blkdev_issue_zeroout in
block/blk-lib.c). However, libata only consider the RZAT support bit
from a white list of devices (see ata_scsiop_read_cap in libata-scsi
and the white list in libata-core).

>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22 17:02         ` Tom Yan
@ 2016-08-22 18:00           ` Shaun Tancheff
  2016-08-22 18:52             ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 18:00 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:

> Since I have no experience with SCT Write Same at all, and neither do
> I own any spinning HDD at all, I cannot firmly suggest what to do. All
> I can suggest is: should we decrease it per sector size? Or would 2G
> per command still be too large to avoid timeout?

Timeout for WS is 120 seconds so we should be fine there.

The number to look for is the:
   Max. Sustained Transfer Rate OD (MB/s): 190 8TB (180 5TB)

Which means the above drives should complete a 2G write in
about 10 to 11 seconds.

If these were 4Kn drives and we allowed a 16G max then it
would be 80-90 seconds, assuming the write speed didn't
get any better.

So holding the maximum to around 2G is probably the best
overall, in my opinion.

-- 
Shaun Tancheff

On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
>>>> As mentioned before, as of the latest draft of ACS-4, nothing about a
>>>> larger payload size is mentioned. Conservatively speaking, it sort of
>>>
>>> *payload block size
>>>
>>>> means that we are allowing four 512-byte block payload on 4Kn device
>>>
>>> *eight 512-byte-block payload
>>>
>>>> regardless of the reported limit in the IDENTIFY DEVICE data. I am
>>>> really not sure if it's a good thing to do. Doesn't seem necessary
>>>> anyway, especially when our block layer does not support such a large
>>>> bio size (well, yet), so each request will end up using a payload of
>>>> two 512-byte blocks at max anyway.
>>>>
>>>> Also, it's IMHO better to do it in a seperate patch (series) after the
>>>> SCT Write Same support has entered libata's repo too, because this has
>>>> nothing to with it but TRIM translation. In case the future ACS
>>>> standards has clearer/better instruction on this, it will be easier
>>>> for us to revert/follow up too.
>>
>> I am certainly fine with dropping this patch as it is not critical to
>> the reset of the series.
>>
>> Nothing will break if we stick with the 512 byte fixed limit. This
>> is at most a prep patch for handling increased limits should
>> they be reported.
>>
>> All it really is doing is acknowledging that any write same
>> must have a payload of sector_size which can be something
>> larger than 512 bytes.
>
> Actually I am not sure if we should hard code the limit
> ata_format_dsm_trim_descr() / ata_set_lba_range_entries() at all. The
> current implementation (with or without your patch) seems redundant
> and unnecessary to me.
>
> All we need to do should be: making sure that the block limits VPD
> advertises a safe Maximum Write Same Length, and reject Write Same
> (16) commands that have "number of blocks" that exceeds the limit
> (which is what I introduced in commit 5c79097a28c2, "libata-scsi:
> reject WRITE SAME (16) with n_block that exceeds limit").
>
> In that case, we don't need to hard code the limit in the
> while-condition again; instead we should just make it end with the
> request size, since the accepted request could never be larger than
> the limit we advertise.
>
>>
>>>> And you'll need to fix the Block Limits VPD simulation
>>>> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
>>>> Same Length dynamically as per the logical sector size, otherwise your
>>>> effort will be completely in vain, even if our block layer is
>>>> overhauled in the future.
>>
>> Martin had earlier suggested that I leave the write same defaults
>> as is due to concerns with misbehaving hardware.
>
> It doesn't really apply in libata's anyway. SD_MAX_WS10_BLOCKS means
> nothing to ATA drives, except from coincidentally being the same value
> as ATA_MAX_SECTORS_LBA48 (which technically should have been 65536
> instead).
>
>>
>> I think your patch adjusting the reported limits is reasonable
>> enough. It seems to me we should have the hardware
>> report it's actual limits, for example, report what the spec
>> allows.
>
> As you mentioned yourself before, technically SCT Write Same does not
> have a limit. The only practical limit is the timeout in the SCSI
> layer, so the actual bytes being (over)written is probably our only
> concern.
>
> For the case of TRIM, devices do report a limit in their IDENTIFY
> DEVICE data. However, as Martin always said, it is not an always-safe
> piece of data for us to refer to, that's why we have been statically
> allowing only 1-block payload.
>
> Therefore, it seems convenient (and consistent) that we make SCT Write
> Same always use the same limit as TRIM, no matter if it is supported
> on a certain device. And to make sure the actual bytes being written /
> time required per command does not increase enormously as per the
> sector size, we decrease the limit accordingly. Certainly that's not
> necessary if 16G per command is fine on most devices.
>
> Also, does SCT Write Same commands that write 32M/256M per command
> make any sense? I mean would we benefit from such small SCT Write Same
> commands at all?
>
>>
>> Of course there are lots of reasons to limit the absolute
>> maximums.
>>
>> So in this case we are just enabling the limit to be
>> increased but not changing the current black-listing
>> that distrusts DSM Trim. Once we have 4Kn devices
>> to test then we can start white-listing and see if there
>> is an overall increase in performance.
>>
>>>> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
>>>> all, the reported Maximum Write Same Length will be:
>>>>
>>>> On device with TRIM support:
>>>> - 4194240 LOGICAL sector per request split / command
>>>> -- ~=2G on non-4Kn drives
>>>> -- ~=16G on non-4Kn drives
>>>>
>>>> On device without TRIM support:
>>>> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
>>>> -- ~= 32M on non-4Kn drives
>>>> -- ~=256M on non-4Kn drives
>>>>
>>>> Even if we ignore the upper limit(s) of the block layer, do we want
>>>> such inconsistencies?
>>
>> Hmm. Overall I think it is still okay if a bit confusing.
>> It is possible that for devices which support SCT Write Same
>> and DSM TRIM will still Trim faster than they can Write Same,
>> However the internal implementation is opaque so I can't
>> say if Write Same is often implemented in terms of TRIM
>> or not. I mean that's how _I_ do it [Write 1 block and map
>> N blocks to it], But not every FTL will have come to the
>> same conclusion.
>
> Why would SCT Write Same be implemented in terms of TRIM? Neither
> would we need to care about that anyway. Considering we will unlikely
> allow multi-block payload TRIM, and we probably have no reason to
> touch the SCSI Write Same timeout, the only thing we need to consider
> is whether we want to decrease the advertised limit base on the
> typical SCT Write Same speed on traditional HDDs and the timeout,
> especially in the 4Kn case.
>
> Since I have no experience with SCT Write Same at all, and neither do
> I own any spinning HDD at all, I cannot firmly suggest what to do. All
> I can suggest is: should we decrease it per sector size? Or would 2G
> per command still be too large to avoid timeout?
>
>>
>> I also suspect that given the choice for most use casess that
>> TRIM is preferred over WS when TRIM supports returning
>> zeroed blocks.
>
> Well, for devices with discard_zeroes_data = 1, the block layer will
> not issue write same requests (see blkdev_issue_zeroout in
> block/blk-lib.c). However, libata only consider the RZAT support bit
> from a white list of devices (see ata_scsiop_read_cap in libata-scsi
> and the white list in libata-core).
>
>>
>> --
>> Shaun Tancheff



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22 18:00           ` Shaun Tancheff
@ 2016-08-22 18:52             ` Tom Yan
  2016-08-22 20:57               ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 18:52 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

In that case I see no reason that my suggestion should not be adopted.
Currently speaking (as I mentioned in the commit message) it is
reasonable to decrease it per logical sector size in the TRIM-only
sense as well because of the block layer / bio size limit.

FWIW, as of ACS-4, RANGE LENGTH field in DSM(XL)/TRIM range entry
should be "number of logical sectors". Therefore, if there will be any
4Kn SSDs, _bytes_ TRIM'd per command is expected to be increased with
the sector size as well, just like SCT Write Same.

For the "payload block size" that is "always" 512-byte as per the same
spec, I don't think we need to concern about it. I think it only
matters if we want to enable multi-block TRIM payload according to the
reported limit in IDENTIFY DEVICE data. Probably it merely means that
the "each of the blocks" in the reported limit will always mean 64
TRIM entries, even on 4Kn drives, instead of 512.

On 23 August 2016 at 02:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
>
>> Since I have no experience with SCT Write Same at all, and neither do
>> I own any spinning HDD at all, I cannot firmly suggest what to do. All
>> I can suggest is: should we decrease it per sector size? Or would 2G
>> per command still be too large to avoid timeout?
>
> Timeout for WS is 120 seconds so we should be fine there.
>
> The number to look for is the:
>    Max. Sustained Transfer Rate OD (MB/s): 190 8TB (180 5TB)
>
> Which means the above drives should complete a 2G write in
> about 10 to 11 seconds.
>
> If these were 4Kn drives and we allowed a 16G max then it
> would be 80-90 seconds, assuming the write speed didn't
> get any better.
>
> So holding the maximum to around 2G is probably the best
> overall, in my opinion.
>
> --
> Shaun Tancheff
>
> On Mon, Aug 22, 2016 at 12:02 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 22 August 2016 at 15:04, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> On Mon, Aug 22, 2016 at 3:33 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>> On 22 August 2016 at 08:31, Tom Yan <tom.ty89@gmail.com> wrote:
>>>>> As mentioned before, as of the latest draft of ACS-4, nothing about a
>>>>> larger payload size is mentioned. Conservatively speaking, it sort of
>>>>
>>>> *payload block size
>>>>
>>>>> means that we are allowing four 512-byte block payload on 4Kn device
>>>>
>>>> *eight 512-byte-block payload
>>>>
>>>>> regardless of the reported limit in the IDENTIFY DEVICE data. I am
>>>>> really not sure if it's a good thing to do. Doesn't seem necessary
>>>>> anyway, especially when our block layer does not support such a large
>>>>> bio size (well, yet), so each request will end up using a payload of
>>>>> two 512-byte blocks at max anyway.
>>>>>
>>>>> Also, it's IMHO better to do it in a seperate patch (series) after the
>>>>> SCT Write Same support has entered libata's repo too, because this has
>>>>> nothing to with it but TRIM translation. In case the future ACS
>>>>> standards has clearer/better instruction on this, it will be easier
>>>>> for us to revert/follow up too.
>>>
>>> I am certainly fine with dropping this patch as it is not critical to
>>> the reset of the series.
>>>
>>> Nothing will break if we stick with the 512 byte fixed limit. This
>>> is at most a prep patch for handling increased limits should
>>> they be reported.
>>>
>>> All it really is doing is acknowledging that any write same
>>> must have a payload of sector_size which can be something
>>> larger than 512 bytes.
>>
>> Actually I am not sure if we should hard code the limit
>> ata_format_dsm_trim_descr() / ata_set_lba_range_entries() at all. The
>> current implementation (with or without your patch) seems redundant
>> and unnecessary to me.
>>
>> All we need to do should be: making sure that the block limits VPD
>> advertises a safe Maximum Write Same Length, and reject Write Same
>> (16) commands that have "number of blocks" that exceeds the limit
>> (which is what I introduced in commit 5c79097a28c2, "libata-scsi:
>> reject WRITE SAME (16) with n_block that exceeds limit").
>>
>> In that case, we don't need to hard code the limit in the
>> while-condition again; instead we should just make it end with the
>> request size, since the accepted request could never be larger than
>> the limit we advertise.
>>
>>>
>>>>> And you'll need to fix the Block Limits VPD simulation
>>>>> (ata_scsiop_inq_b0) too, so that it will advertise the Maximum Write
>>>>> Same Length dynamically as per the logical sector size, otherwise your
>>>>> effort will be completely in vain, even if our block layer is
>>>>> overhauled in the future.
>>>
>>> Martin had earlier suggested that I leave the write same defaults
>>> as is due to concerns with misbehaving hardware.
>>
>> It doesn't really apply in libata's anyway. SD_MAX_WS10_BLOCKS means
>> nothing to ATA drives, except from coincidentally being the same value
>> as ATA_MAX_SECTORS_LBA48 (which technically should have been 65536
>> instead).
>>
>>>
>>> I think your patch adjusting the reported limits is reasonable
>>> enough. It seems to me we should have the hardware
>>> report it's actual limits, for example, report what the spec
>>> allows.
>>
>> As you mentioned yourself before, technically SCT Write Same does not
>> have a limit. The only practical limit is the timeout in the SCSI
>> layer, so the actual bytes being (over)written is probably our only
>> concern.
>>
>> For the case of TRIM, devices do report a limit in their IDENTIFY
>> DEVICE data. However, as Martin always said, it is not an always-safe
>> piece of data for us to refer to, that's why we have been statically
>> allowing only 1-block payload.
>>
>> Therefore, it seems convenient (and consistent) that we make SCT Write
>> Same always use the same limit as TRIM, no matter if it is supported
>> on a certain device. And to make sure the actual bytes being written /
>> time required per command does not increase enormously as per the
>> sector size, we decrease the limit accordingly. Certainly that's not
>> necessary if 16G per command is fine on most devices.
>>
>> Also, does SCT Write Same commands that write 32M/256M per command
>> make any sense? I mean would we benefit from such small SCT Write Same
>> commands at all?
>>
>>>
>>> Of course there are lots of reasons to limit the absolute
>>> maximums.
>>>
>>> So in this case we are just enabling the limit to be
>>> increased but not changing the current black-listing
>>> that distrusts DSM Trim. Once we have 4Kn devices
>>> to test then we can start white-listing and see if there
>>> is an overall increase in performance.
>>>
>>>>> Please be noted that, since your haven't touched ata_scsiop_inq_b0 at
>>>>> all, the reported Maximum Write Same Length will be:
>>>>>
>>>>> On device with TRIM support:
>>>>> - 4194240 LOGICAL sector per request split / command
>>>>> -- ~=2G on non-4Kn drives
>>>>> -- ~=16G on non-4Kn drives
>>>>>
>>>>> On device without TRIM support:
>>>>> - 0 --> SD_MAX_WS10_BLOCKS (65535) per request split / command
>>>>> -- ~= 32M on non-4Kn drives
>>>>> -- ~=256M on non-4Kn drives
>>>>>
>>>>> Even if we ignore the upper limit(s) of the block layer, do we want
>>>>> such inconsistencies?
>>>
>>> Hmm. Overall I think it is still okay if a bit confusing.
>>> It is possible that for devices which support SCT Write Same
>>> and DSM TRIM will still Trim faster than they can Write Same,
>>> However the internal implementation is opaque so I can't
>>> say if Write Same is often implemented in terms of TRIM
>>> or not. I mean that's how _I_ do it [Write 1 block and map
>>> N blocks to it], But not every FTL will have come to the
>>> same conclusion.
>>
>> Why would SCT Write Same be implemented in terms of TRIM? Neither
>> would we need to care about that anyway. Considering we will unlikely
>> allow multi-block payload TRIM, and we probably have no reason to
>> touch the SCSI Write Same timeout, the only thing we need to consider
>> is whether we want to decrease the advertised limit base on the
>> typical SCT Write Same speed on traditional HDDs and the timeout,
>> especially in the 4Kn case.
>>
>> Since I have no experience with SCT Write Same at all, and neither do
>> I own any spinning HDD at all, I cannot firmly suggest what to do. All
>> I can suggest is: should we decrease it per sector size? Or would 2G
>> per command still be too large to avoid timeout?
>>
>>>
>>> I also suspect that given the choice for most use casess that
>>> TRIM is preferred over WS when TRIM supports returning
>>> zeroed blocks.
>>
>> Well, for devices with discard_zeroes_data = 1, the block layer will
>> not issue write same requests (see blkdev_issue_zeroout in
>> block/blk-lib.c). However, libata only consider the RZAT support bit
>> from a white list of devices (see ata_scsiop_read_cap in libata-scsi
>> and the white list in libata-core).
>>
>>>
>>> --
>>> Shaun Tancheff
>
>
>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
@ 2016-08-22 19:20   ` Tom Yan
  2016-08-22 19:43     ` Shaun Tancheff
  2016-08-23 10:37   ` Tom Yan
  2 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 19:20 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 22 August 2016 at 12:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> SATA drives may support write same via SCT. This is useful
> for setting the drive contents to a specific pattern (0's).
>
> Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
> command or an SCT Write Same command.
>
> Based on the UNMAP flag:
>   - When set translate to DSM TRIM
>   - When not set translate to SCT Write Same
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Addressed review comments
>  - Report support for ZBC only for zoned devices.
>  - kmap page during rewrite
>  - Fix unmap set to require trim or error, if not unmap then sct write
>    same or error.
> v4:
>  - Added partial MAINTENANCE_IN opcode simulation
>  - Dropped all changes in drivers/scsi/*
>  - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
>  - Demux UNMAP/TRIM from WRITE SAME
> v2:
>  - Remove fugly ata hacking from sd.c
>
>  drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
>  include/linux/ata.h       |  43 ++++++++++
>  2 files changed, 213 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 7990cb2..ebf1a04 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
>  {
>         sdev->use_10_for_rw = 1;
>         sdev->use_10_for_ms = 1;
> -       sdev->no_report_opcodes = 1;
> -       sdev->no_write_same = 1;
>
>         /* Schedule policy is determined by ->qc_defer() callback and
>          * it needs to see every deferred qc.  Set dev_blocked to 1 to
> @@ -3287,7 +3285,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>   * @cmd: SCSI command being translated
>   * @num: Maximum number of entries (nominally 64).
>   * @sector: Starting sector
> - * @count: Total Range of request
> + * @count: Total Range of request in logical sectors
>   *
>   * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>   * descriptor.
> @@ -3330,6 +3328,45 @@ static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>         return used_bytes;
>  }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
> + * @cmd: SCSI command being translated
> + * @lba: Starting sector
> + * @num: Number of logical sectors to be zero'd.
> + *
> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * descriptor.
> + * NOTE: Writes a pattern (0's) in the foreground.
> + *       Large write-same requents can timeout.
> + */
> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +{
> +       u16 *sctpg;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> +       sctpg = ((void *)ata_scsi_rbuf);
> +
> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> +       put_unaligned_le64(lba,     &sctpg[2]);
> +       put_unaligned_le64(num,     &sctpg[6]);
> +       put_unaligned_le32(0u,      &sctpg[10]);
> +
> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +}
> +
> +/**
> + * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
> + * @qc: Command to be translated
> + *
> + * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
> + * an SCT Write Same command.
> + * Based on WRITE SAME has the UNMAP flag
> + *   When set translate to DSM TRIM
> + *   When clear translate to SCT Write Same
> + */
>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
> @@ -3342,6 +3379,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         u32 size;
>         u16 fp;
>         u8 bp = 0xff;
> +       u8 unmap = cdb[1] & 0x8;
>
>         /* we may not issue DMA commands if no DMA mode is set */
>         if (unlikely(!dev->dma_mode))
> @@ -3353,11 +3391,26 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         }
>         scsi_16_lba_len(cdb, &block, &n_block);
>
> -       /* for now we only support WRITE SAME with the unmap bit set */
> -       if (unlikely(!(cdb[1] & 0x8))) {
> -               fp = 1;
> -               bp = 3;
> -               goto invalid_fld;
> +       if (unmap) {
> +               /* If trim is not enabled the cmd is invalid. */
> +               if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
> +                   !ata_id_has_trim(dev->id)) {
> +                       fp = 1;
> +                       bp = 3;
> +                       goto invalid_fld;
> +               }
> +               /* If the request is too large the cmd is invalid */
> +               if (n_block > 0xffff * trmax) {
> +                       fp = 2;
> +                       goto invalid_fld;
> +               }

This response should be generally applied to the Write Same (16)
translation, since it is required by SBC,

> +       } else {
> +               /* If write same is not available the cmd is invalid */
> +               if (!ata_id_sct_write_same(dev->id)) {
> +                       fp = 1;
> +                       bp = 3;
> +                       goto invalid_fld;
> +               }

therefore, you should add an n_block check here as well, if you are
going to advertise an Maximum Write Same Length even when the device
supports only SCT Write Same but not TRIM. Most likely you would want
to simply move the existing check one-level up (if the same limit is
advertised no matter TRIM is supported not or not).

>         }
>
>         /*
> @@ -3367,30 +3420,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> -       if (n_block <= 0xffff * trmax) {
> +       if (unmap) {
>                 size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> +               if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> +                       /* Newer devices support queued TRIM commands */
> +                       tf->protocol = ATA_PROT_NCQ;
> +                       tf->command = ATA_CMD_FPDMA_SEND;
> +                       tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> +                       tf->nsect = qc->tag << 3;
> +                       tf->hob_feature = (size / 512) >> 8;
> +                       tf->feature = size / 512;
> +
> +                       tf->auxiliary = 1;
> +               } else {
> +                       tf->protocol = ATA_PROT_DMA;
> +                       tf->hob_feature = 0;
> +                       tf->feature = ATA_DSM_TRIM;
> +                       tf->hob_nsect = (size / 512) >> 8;
> +                       tf->nsect = size / 512;
> +                       tf->command = ATA_CMD_DSM;
> +               }
>         } else {
> -               fp = 2;
> -               goto invalid_fld;
> -       }
> -
> -       if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> -               /* Newer devices support queued TRIM commands */
> -               tf->protocol = ATA_PROT_NCQ;
> -               tf->command = ATA_CMD_FPDMA_SEND;
> -               tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> -               tf->nsect = qc->tag << 3;
> -               tf->hob_feature = (size / 512) >> 8;
> -               tf->feature = size / 512;
> +               ata_format_sct_write_same(scmd, block, n_block);
>
> -               tf->auxiliary = 1;
> -       } else {
> -               tf->protocol = ATA_PROT_DMA;
>                 tf->hob_feature = 0;
> -               tf->feature = ATA_DSM_TRIM;
> -               tf->hob_nsect = (size / 512) >> 8;
> -               tf->nsect = size / 512;
> -               tf->command = ATA_CMD_DSM;
> +               tf->feature = 0;
> +               tf->hob_nsect = 0;
> +               tf->nsect = 1;
> +               tf->lbah = 0;
> +               tf->lbam = 0;
> +               tf->lbal = ATA_CMD_STANDBYNOW1;
> +               tf->hob_lbah = 0;
> +               tf->hob_lbam = 0;
> +               tf->hob_lbal = 0;
> +               tf->device = ATA_CMD_STANDBYNOW1;
> +               tf->protocol = ATA_PROT_DMA;
> +               tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
>         }
>
>         tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> @@ -3414,6 +3479,76 @@ invalid_opcode:
>  }
>
>  /**
> + *     ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
> + *     @args: device MAINTENANCE_IN data / SCSI command of interest.
> + *     @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
> + *
> + *     Yields a subset to satisfy scsi_report_opcode()
> + *
> + *     LOCKING:
> + *     spin_lock_irqsave(host lock)
> + */
> +static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
> +{
> +       struct ata_device *dev = args->dev;
> +       u8 *cdb = args->cmd->cmnd;
> +       u8 supported = 0;
> +       unsigned int err = 0;
> +
> +       if (cdb[2] != 1) {
> +               ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
> +               err = 2;
> +               goto out;
> +       }
> +       switch (cdb[3]) {
> +       case INQUIRY:
> +       case MODE_SENSE:
> +       case MODE_SENSE_10:
> +       case READ_CAPACITY:
> +       case SERVICE_ACTION_IN_16:
> +       case REPORT_LUNS:
> +       case REQUEST_SENSE:
> +       case SYNCHRONIZE_CACHE:
> +       case REZERO_UNIT:
> +       case SEEK_6:
> +       case SEEK_10:
> +       case TEST_UNIT_READY:
> +       case SEND_DIAGNOSTIC:
> +       case MAINTENANCE_IN:
> +       case READ_6:
> +       case READ_10:
> +       case READ_16:
> +       case WRITE_6:
> +       case WRITE_10:
> +       case WRITE_16:
> +       case ATA_12:
> +       case ATA_16:
> +       case VERIFY:
> +       case VERIFY_16:
> +       case MODE_SELECT:
> +       case MODE_SELECT_10:
> +       case START_STOP:
> +               supported = 3;
> +               break;
> +       case WRITE_SAME_16:
> +               if (ata_id_sct_write_same(dev->id))
> +                       supported = 3;
> +               break;
> +       case ZBC_IN:
> +       case ZBC_OUT:
> +               if (ata_id_zoned_cap(dev->id) ||
> +                   dev->class == ATA_DEV_ZAC)
> +                       supported = 3;
> +               break;
> +       default:
> +               break;
> +       }
> +out:
> +       rbuf[1] = supported; /* supported */
> +       return err;
> +}
> +
> +/**
>   *     ata_scsi_report_zones_complete - convert ATA output
>   *     @qc: command structure returning the data
>   *
> @@ -4193,6 +4328,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
>                         ata_scsi_invalid_field(dev, cmd, 1);
>                 break;
>
> +       case MAINTENANCE_IN:
> +               if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
> +                       ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
> +               else
> +                       ata_scsi_invalid_field(dev, cmd, 1);
> +               break;
> +
>         /* all other commands */
>         default:
>                 ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
> @@ -4225,7 +4367,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
>                 shost->max_lun = 1;
>                 shost->max_channel = 1;
>                 shost->max_cmd_len = 16;
> -               shost->no_write_same = 1;
>
>                 /* Schedule policy is determined by ->qc_defer()
>                  * callback and it needs to see every deferred qc.
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 45a1d71..fdb1803 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -105,6 +105,7 @@ enum {
>         ATA_ID_CFA_KEY_MGMT     = 162,
>         ATA_ID_CFA_MODES        = 163,
>         ATA_ID_DATA_SET_MGMT    = 169,
> +       ATA_ID_SCT_CMD_XPORT    = 206,
>         ATA_ID_ROT_SPEED        = 217,
>         ATA_ID_PIO4             = (1 << 1),
>
> @@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
>  }
>
>  /**
> + *
> + * Word: 206 - SCT Command Transport
> + *    15:12 - Vendor Specific
> + *     11:6 - Reserved
> + *        5 - SCT Command Transport Data Tables supported
> + *        4 - SCT Command Transport Features Control supported
> + *        3 - SCT Command Transport Error Recovery Control supported
> + *        2 - SCT Command Transport Write Same supported
> + *        1 - SCT Command Transport Long Sector Access supported
> + *        0 - SCT Command Transport supported
> + */
> +static inline bool ata_id_sct_data_tables(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_features_ctrl(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_write_same(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_long_sector_access(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_supported(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
> +}
> +
> +/**
>   *     ata_id_major_version    -       get ATA level of drive
>   *     @id: Identify data
>   *
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
@ 2016-08-22 19:27   ` Tom Yan
  2016-08-22 19:51     ` Shaun Tancheff
  2016-08-22 21:20   ` Tom Yan
  2 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 19:27 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 22 August 2016 at 12:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> Safely overwriting the attached page to ATA format from the SCSI formatted
> variant.
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Added prep patch to work with non-page aligned scatterlist pages
>    and use kmap_atomic() to lock page during modification.
>
>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/ata.h       | 26 ----------------------
>  2 files changed, 51 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e207b33..7990cb2 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>         return 1;
>  }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
> + * @cmd: SCSI command being translated
> + * @num: Maximum number of entries (nominally 64).
> + * @sector: Starting sector
> + * @count: Total Range of request
> + *
> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
> + * descriptor.
> + *
> + * Upto 64 entries of the format:
> + *   63:48 Range Length
> + *   47:0  LBA
> + *
> + *  Range Length of 0 is ignored.
> + *  LBA's should be sorted order and not overlap.
> + *
> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
> + */
> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> +                                             u64 sector, u32 count)
> +{
> +       __le64 *buffer;
> +       u32 i = 0, used_bytes;
> +       unsigned long flags;
> +
> +       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
> +
> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> +       buffer = ((void *)ata_scsi_rbuf);
> +       while (i < num) {
> +               u64 entry = sector |
> +                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
> +               buffer[i++] = __cpu_to_le64(entry);
> +               if (count <= 0xffff)
> +                       break;
> +               count -= 0xffff;
> +               sector += 0xffff;
> +       }
> +
> +       used_bytes = ALIGN(i * 8, 512);
> +       memset(buffer + i, 0, used_bytes - i * 8);
> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +
> +       return used_bytes;
> +}
> +
>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         const u8 *cdb = scmd->cmnd;
>         u64 block;
>         u32 n_block;
> +       const u32 trmax = ATA_MAX_TRIM_RNUM;

This does not seem like a good idea.

>         u32 size;
> -       void *buf;
>         u16 fp;
>         u8 bp = 0xff;
>
> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> -       buf = page_address(sg_page(scsi_sglist(scmd)));
> -
> -       if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
> -               size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
> +       if (n_block <= 0xffff * trmax) {

Note that the limit here would always be the advertised Maximum Write
Same Length (ata_scsiop_inq_b0). It would be best for them to look the
same. Besides, it doesn't seem necessary to create this trmax anyway.

> +               size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>         } else {
>                 fp = 2;
>                 goto invalid_fld;
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index adbc812..45a1d71 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>  #endif
>  }
>
> -/*
> - * Write LBA Range Entries to the buffer that will cover the extent from
> - * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
> - * TO NV CACHE PINNED SET.
> - */
> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
> -               unsigned num, u64 sector, unsigned long count)
> -{
> -       __le64 *buffer = _buffer;
> -       unsigned i = 0, used_bytes;
> -
> -       while (i < num) {
> -               u64 entry = sector |
> -                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
> -               buffer[i++] = __cpu_to_le64(entry);
> -               if (count <= 0xffff)
> -                       break;
> -               count -= 0xffff;
> -               sector += 0xffff;
> -       }
> -
> -       used_bytes = ALIGN(i * 8, 512);
> -       memset(buffer + i, 0, used_bytes - i * 8);
> -       return used_bytes;
> -}
> -
>  static inline bool ata_ok(u8 status)
>  {
>         return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22 19:20   ` Tom Yan
@ 2016-08-22 19:43     ` Shaun Tancheff
  2016-08-22 20:14       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 19:43 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 2:20 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 12:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> SATA drives may support write same via SCT. This is useful
>> for setting the drive contents to a specific pattern (0's).
>>
>> Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
>> command or an SCT Write Same command.
>>
>> Based on the UNMAP flag:
>>   - When set translate to DSM TRIM
>>   - When not set translate to SCT Write Same
>>
>> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>> ---
>> v6:
>>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>> v5:
>>  - Addressed review comments
>>  - Report support for ZBC only for zoned devices.
>>  - kmap page during rewrite
>>  - Fix unmap set to require trim or error, if not unmap then sct write
>>    same or error.
>> v4:
>>  - Added partial MAINTENANCE_IN opcode simulation
>>  - Dropped all changes in drivers/scsi/*
>>  - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
>> v3:
>>  - Demux UNMAP/TRIM from WRITE SAME
>> v2:
>>  - Remove fugly ata hacking from sd.c
>>
>>  drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
>>  include/linux/ata.h       |  43 ++++++++++
>>  2 files changed, 213 insertions(+), 29 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index 7990cb2..ebf1a04 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
>>  {
>>         sdev->use_10_for_rw = 1;
>>         sdev->use_10_for_ms = 1;
>> -       sdev->no_report_opcodes = 1;
>> -       sdev->no_write_same = 1;
>>
>>         /* Schedule policy is determined by ->qc_defer() callback and
>>          * it needs to see every deferred qc.  Set dev_blocked to 1 to
>> @@ -3287,7 +3285,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>>   * @cmd: SCSI command being translated
>>   * @num: Maximum number of entries (nominally 64).
>>   * @sector: Starting sector
>> - * @count: Total Range of request
>> + * @count: Total Range of request in logical sectors
>>   *
>>   * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>>   * descriptor.
>> @@ -3330,6 +3328,45 @@ static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>>         return used_bytes;
>>  }
>>
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>> + * @cmd: SCSI command being translated
>> + * @lba: Starting sector
>> + * @num: Number of logical sectors to be zero'd.
>> + *
>> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
>> + * descriptor.
>> + * NOTE: Writes a pattern (0's) in the foreground.
>> + *       Large write-same requents can timeout.
>> + */
>> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>> +{
>> +       u16 *sctpg;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> +       sctpg = ((void *)ata_scsi_rbuf);
>> +
>> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
>> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
>> +       put_unaligned_le64(lba,     &sctpg[2]);
>> +       put_unaligned_le64(num,     &sctpg[6]);
>> +       put_unaligned_le32(0u,      &sctpg[10]);
>> +
>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +}
>> +
>> +/**
>> + * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
>> + * @qc: Command to be translated
>> + *
>> + * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
>> + * an SCT Write Same command.
>> + * Based on WRITE SAME has the UNMAP flag
>> + *   When set translate to DSM TRIM
>> + *   When clear translate to SCT Write Same
>> + */
>>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>  {
>>         struct ata_taskfile *tf = &qc->tf;
>> @@ -3342,6 +3379,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         u32 size;
>>         u16 fp;
>>         u8 bp = 0xff;
>> +       u8 unmap = cdb[1] & 0x8;
>>
>>         /* we may not issue DMA commands if no DMA mode is set */
>>         if (unlikely(!dev->dma_mode))
>> @@ -3353,11 +3391,26 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         }
>>         scsi_16_lba_len(cdb, &block, &n_block);
>>
>> -       /* for now we only support WRITE SAME with the unmap bit set */
>> -       if (unlikely(!(cdb[1] & 0x8))) {
>> -               fp = 1;
>> -               bp = 3;
>> -               goto invalid_fld;
>> +       if (unmap) {
>> +               /* If trim is not enabled the cmd is invalid. */
>> +               if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
>> +                   !ata_id_has_trim(dev->id)) {
>> +                       fp = 1;
>> +                       bp = 3;
>> +                       goto invalid_fld;
>> +               }
>> +               /* If the request is too large the cmd is invalid */
>> +               if (n_block > 0xffff * trmax) {
>> +                       fp = 2;
>> +                       goto invalid_fld;
>> +               }
>
> This response should be generally applied to the Write Same (16)
> translation, since it is required by SBC,
>
>> +       } else {
>> +               /* If write same is not available the cmd is invalid */
>> +               if (!ata_id_sct_write_same(dev->id)) {
>> +                       fp = 1;
>> +                       bp = 3;
>> +                       goto invalid_fld;
>> +               }
>
> therefore, you should add an n_block check here as well, if you are
> going to advertise an Maximum Write Same Length even when the device
> supports only SCT Write Same but not TRIM. Most likely you would want
> to simply move the existing check one-level up (if the same limit is
> advertised no matter TRIM is supported not or not).

Why would we enforce upper level limits on something that doesn't
have any?

If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
wipe a whole disk it should be free to do so.

>>         }
>>
>>         /*
>> @@ -3367,30 +3420,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         if (!scsi_sg_count(scmd))
>>                 goto invalid_param_len;
>>
>> -       if (n_block <= 0xffff * trmax) {
>> +       if (unmap) {
>>                 size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>> +               if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
>> +                       /* Newer devices support queued TRIM commands */
>> +                       tf->protocol = ATA_PROT_NCQ;
>> +                       tf->command = ATA_CMD_FPDMA_SEND;
>> +                       tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
>> +                       tf->nsect = qc->tag << 3;
>> +                       tf->hob_feature = (size / 512) >> 8;
>> +                       tf->feature = size / 512;
>> +
>> +                       tf->auxiliary = 1;
>> +               } else {
>> +                       tf->protocol = ATA_PROT_DMA;
>> +                       tf->hob_feature = 0;
>> +                       tf->feature = ATA_DSM_TRIM;
>> +                       tf->hob_nsect = (size / 512) >> 8;
>> +                       tf->nsect = size / 512;
>> +                       tf->command = ATA_CMD_DSM;
>> +               }
>>         } else {
>> -               fp = 2;
>> -               goto invalid_fld;
>> -       }
>> -
>> -       if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
>> -               /* Newer devices support queued TRIM commands */
>> -               tf->protocol = ATA_PROT_NCQ;
>> -               tf->command = ATA_CMD_FPDMA_SEND;
>> -               tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
>> -               tf->nsect = qc->tag << 3;
>> -               tf->hob_feature = (size / 512) >> 8;
>> -               tf->feature = size / 512;
>> +               ata_format_sct_write_same(scmd, block, n_block);
>>
>> -               tf->auxiliary = 1;
>> -       } else {
>> -               tf->protocol = ATA_PROT_DMA;
>>                 tf->hob_feature = 0;
>> -               tf->feature = ATA_DSM_TRIM;
>> -               tf->hob_nsect = (size / 512) >> 8;
>> -               tf->nsect = size / 512;
>> -               tf->command = ATA_CMD_DSM;
>> +               tf->feature = 0;
>> +               tf->hob_nsect = 0;
>> +               tf->nsect = 1;
>> +               tf->lbah = 0;
>> +               tf->lbam = 0;
>> +               tf->lbal = ATA_CMD_STANDBYNOW1;
>> +               tf->hob_lbah = 0;
>> +               tf->hob_lbam = 0;
>> +               tf->hob_lbal = 0;
>> +               tf->device = ATA_CMD_STANDBYNOW1;
>> +               tf->protocol = ATA_PROT_DMA;
>> +               tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
>>         }
>>
>>         tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
>> @@ -3414,6 +3479,76 @@ invalid_opcode:
>>  }
>>
>>  /**
>> + *     ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
>> + *     @args: device MAINTENANCE_IN data / SCSI command of interest.
>> + *     @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
>> + *
>> + *     Yields a subset to satisfy scsi_report_opcode()
>> + *
>> + *     LOCKING:
>> + *     spin_lock_irqsave(host lock)
>> + */
>> +static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
>> +{
>> +       struct ata_device *dev = args->dev;
>> +       u8 *cdb = args->cmd->cmnd;
>> +       u8 supported = 0;
>> +       unsigned int err = 0;
>> +
>> +       if (cdb[2] != 1) {
>> +               ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
>> +               err = 2;
>> +               goto out;
>> +       }
>> +       switch (cdb[3]) {
>> +       case INQUIRY:
>> +       case MODE_SENSE:
>> +       case MODE_SENSE_10:
>> +       case READ_CAPACITY:
>> +       case SERVICE_ACTION_IN_16:
>> +       case REPORT_LUNS:
>> +       case REQUEST_SENSE:
>> +       case SYNCHRONIZE_CACHE:
>> +       case REZERO_UNIT:
>> +       case SEEK_6:
>> +       case SEEK_10:
>> +       case TEST_UNIT_READY:
>> +       case SEND_DIAGNOSTIC:
>> +       case MAINTENANCE_IN:
>> +       case READ_6:
>> +       case READ_10:
>> +       case READ_16:
>> +       case WRITE_6:
>> +       case WRITE_10:
>> +       case WRITE_16:
>> +       case ATA_12:
>> +       case ATA_16:
>> +       case VERIFY:
>> +       case VERIFY_16:
>> +       case MODE_SELECT:
>> +       case MODE_SELECT_10:
>> +       case START_STOP:
>> +               supported = 3;
>> +               break;
>> +       case WRITE_SAME_16:
>> +               if (ata_id_sct_write_same(dev->id))
>> +                       supported = 3;
>> +               break;
>> +       case ZBC_IN:
>> +       case ZBC_OUT:
>> +               if (ata_id_zoned_cap(dev->id) ||
>> +                   dev->class == ATA_DEV_ZAC)
>> +                       supported = 3;
>> +               break;
>> +       default:
>> +               break;
>> +       }
>> +out:
>> +       rbuf[1] = supported; /* supported */
>> +       return err;
>> +}
>> +
>> +/**
>>   *     ata_scsi_report_zones_complete - convert ATA output
>>   *     @qc: command structure returning the data
>>   *
>> @@ -4193,6 +4328,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
>>                         ata_scsi_invalid_field(dev, cmd, 1);
>>                 break;
>>
>> +       case MAINTENANCE_IN:
>> +               if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
>> +                       ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
>> +               else
>> +                       ata_scsi_invalid_field(dev, cmd, 1);
>> +               break;
>> +
>>         /* all other commands */
>>         default:
>>                 ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
>> @@ -4225,7 +4367,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
>>                 shost->max_lun = 1;
>>                 shost->max_channel = 1;
>>                 shost->max_cmd_len = 16;
>> -               shost->no_write_same = 1;
>>
>>                 /* Schedule policy is determined by ->qc_defer()
>>                  * callback and it needs to see every deferred qc.
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index 45a1d71..fdb1803 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -105,6 +105,7 @@ enum {
>>         ATA_ID_CFA_KEY_MGMT     = 162,
>>         ATA_ID_CFA_MODES        = 163,
>>         ATA_ID_DATA_SET_MGMT    = 169,
>> +       ATA_ID_SCT_CMD_XPORT    = 206,
>>         ATA_ID_ROT_SPEED        = 217,
>>         ATA_ID_PIO4             = (1 << 1),
>>
>> @@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
>>  }
>>
>>  /**
>> + *
>> + * Word: 206 - SCT Command Transport
>> + *    15:12 - Vendor Specific
>> + *     11:6 - Reserved
>> + *        5 - SCT Command Transport Data Tables supported
>> + *        4 - SCT Command Transport Features Control supported
>> + *        3 - SCT Command Transport Error Recovery Control supported
>> + *        2 - SCT Command Transport Write Same supported
>> + *        1 - SCT Command Transport Long Sector Access supported
>> + *        0 - SCT Command Transport supported
>> + */
>> +static inline bool ata_id_sct_data_tables(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
>> +}
>> +
>> +static inline bool ata_id_sct_features_ctrl(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
>> +}
>> +
>> +static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
>> +}
>> +
>> +static inline bool ata_id_sct_write_same(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
>> +}
>> +
>> +static inline bool ata_id_sct_long_sector_access(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
>> +}
>> +
>> +static inline bool ata_id_sct_supported(const u16 *id)
>> +{
>> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
>> +}
>> +
>> +/**
>>   *     ata_id_major_version    -       get ATA level of drive
>>   *     @id: Identify data
>>   *
>> --
>> 2.9.3
>>



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22 19:27   ` Tom Yan
@ 2016-08-22 19:51     ` Shaun Tancheff
  2016-08-22 20:12       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 19:51 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 2:27 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 12:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> Safely overwriting the attached page to ATA format from the SCSI formatted
>> variant.
>>
>> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>> ---
>> v6:
>>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>> v5:
>>  - Added prep patch to work with non-page aligned scatterlist pages
>>    and use kmap_atomic() to lock page during modification.
>>
>>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/ata.h       | 26 ----------------------
>>  2 files changed, 51 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index e207b33..7990cb2 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>>         return 1;
>>  }
>>
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>> + * @cmd: SCSI command being translated
>> + * @num: Maximum number of entries (nominally 64).
>> + * @sector: Starting sector
>> + * @count: Total Range of request
>> + *
>> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>> + * descriptor.
>> + *
>> + * Upto 64 entries of the format:
>> + *   63:48 Range Length
>> + *   47:0  LBA
>> + *
>> + *  Range Length of 0 is ignored.
>> + *  LBA's should be sorted order and not overlap.
>> + *
>> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
>> + */
>> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>> +                                             u64 sector, u32 count)
>> +{
>> +       __le64 *buffer;
>> +       u32 i = 0, used_bytes;
>> +       unsigned long flags;
>> +
>> +       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
>> +
>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> +       buffer = ((void *)ata_scsi_rbuf);
>> +       while (i < num) {
>> +               u64 entry = sector |
>> +                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> +               buffer[i++] = __cpu_to_le64(entry);
>> +               if (count <= 0xffff)
>> +                       break;
>> +               count -= 0xffff;
>> +               sector += 0xffff;
>> +       }
>> +
>> +       used_bytes = ALIGN(i * 8, 512);
>> +       memset(buffer + i, 0, used_bytes - i * 8);
>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +
>> +       return used_bytes;
>> +}
>> +
>>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>  {
>>         struct ata_taskfile *tf = &qc->tf;
>> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         const u8 *cdb = scmd->cmnd;
>>         u64 block;
>>         u32 n_block;
>> +       const u32 trmax = ATA_MAX_TRIM_RNUM;
>
> This does not seem like a good idea.
>
>>         u32 size;
>> -       void *buf;
>>         u16 fp;
>>         u8 bp = 0xff;
>>
>> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         if (!scsi_sg_count(scmd))
>>                 goto invalid_param_len;
>>
>> -       buf = page_address(sg_page(scsi_sglist(scmd)));
>> -
>> -       if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>> -               size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
>> +       if (n_block <= 0xffff * trmax) {
>
> Note that the limit here would always be the advertised Maximum Write
> Same Length (ata_scsiop_inq_b0). It would be best for them to look the
> same. Besides, it doesn't seem necessary to create this trmax anyway.

It is entirely a style thing. I tend to prefer hex when describing an interface
that specifies number of bits. The trmax is just to keep the following line
under 80 chars.
I am not tied to either. I can change it if people really find it confusing.

>> +               size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>>         } else {
>>                 fp = 2;
>>                 goto invalid_fld;
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index adbc812..45a1d71 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>>  #endif
>>  }
>>
>> -/*
>> - * Write LBA Range Entries to the buffer that will cover the extent from
>> - * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
>> - * TO NV CACHE PINNED SET.
>> - */
>> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
>> -               unsigned num, u64 sector, unsigned long count)
>> -{
>> -       __le64 *buffer = _buffer;
>> -       unsigned i = 0, used_bytes;
>> -
>> -       while (i < num) {
>> -               u64 entry = sector |
>> -                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> -               buffer[i++] = __cpu_to_le64(entry);
>> -               if (count <= 0xffff)
>> -                       break;
>> -               count -= 0xffff;
>> -               sector += 0xffff;
>> -       }
>> -
>> -       used_bytes = ALIGN(i * 8, 512);
>> -       memset(buffer + i, 0, used_bytes - i * 8);
>> -       return used_bytes;
>> -}
>> -
>>  static inline bool ata_ok(u8 status)
>>  {
>>         return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
>> --
>> 2.9.3
>>



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22 19:51     ` Shaun Tancheff
@ 2016-08-22 20:12       ` Tom Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-22 20:12 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

Oh, 0xffff is fine for me, as long as you change it in
ata_scsiop_inq_b0 as well. Actually I think replacing it with a macro
(as suggested by Martin in another thread) is a good idea as well. A
line split would be better than introducing an unnecessary variable
that costs readability in the logic sense.

On 23 August 2016 at 03:51, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 2:27 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 22 August 2016 at 12:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>>> Safely overwriting the attached page to ATA format from the SCSI formatted
>>> variant.
>>>
>>> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>>> ---
>>> v6:
>>>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>>>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>>> v5:
>>>  - Added prep patch to work with non-page aligned scatterlist pages
>>>    and use kmap_atomic() to lock page during modification.
>>>
>>>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>>>  include/linux/ata.h       | 26 ----------------------
>>>  2 files changed, 51 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index e207b33..7990cb2 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>>>         return 1;
>>>  }
>>>
>>> +/**
>>> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>>> + * @cmd: SCSI command being translated
>>> + * @num: Maximum number of entries (nominally 64).
>>> + * @sector: Starting sector
>>> + * @count: Total Range of request
>>> + *
>>> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>>> + * descriptor.
>>> + *
>>> + * Upto 64 entries of the format:
>>> + *   63:48 Range Length
>>> + *   47:0  LBA
>>> + *
>>> + *  Range Length of 0 is ignored.
>>> + *  LBA's should be sorted order and not overlap.
>>> + *
>>> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
>>> + */
>>> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>>> +                                             u64 sector, u32 count)
>>> +{
>>> +       __le64 *buffer;
>>> +       u32 i = 0, used_bytes;
>>> +       unsigned long flags;
>>> +
>>> +       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
>>> +
>>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>>> +       buffer = ((void *)ata_scsi_rbuf);
>>> +       while (i < num) {
>>> +               u64 entry = sector |
>>> +                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>>> +               buffer[i++] = __cpu_to_le64(entry);
>>> +               if (count <= 0xffff)
>>> +                       break;
>>> +               count -= 0xffff;
>>> +               sector += 0xffff;
>>> +       }
>>> +
>>> +       used_bytes = ALIGN(i * 8, 512);
>>> +       memset(buffer + i, 0, used_bytes - i * 8);
>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>> +
>>> +       return used_bytes;
>>> +}
>>> +
>>>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>>  {
>>>         struct ata_taskfile *tf = &qc->tf;
>>> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>>         const u8 *cdb = scmd->cmnd;
>>>         u64 block;
>>>         u32 n_block;
>>> +       const u32 trmax = ATA_MAX_TRIM_RNUM;
>>
>> This does not seem like a good idea.
>>
>>>         u32 size;
>>> -       void *buf;
>>>         u16 fp;
>>>         u8 bp = 0xff;
>>>
>>> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>>         if (!scsi_sg_count(scmd))
>>>                 goto invalid_param_len;
>>>
>>> -       buf = page_address(sg_page(scsi_sglist(scmd)));
>>> -
>>> -       if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>>> -               size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
>>> +       if (n_block <= 0xffff * trmax) {
>>
>> Note that the limit here would always be the advertised Maximum Write
>> Same Length (ata_scsiop_inq_b0). It would be best for them to look the
>> same. Besides, it doesn't seem necessary to create this trmax anyway.
>
> It is entirely a style thing. I tend to prefer hex when describing an interface
> that specifies number of bits. The trmax is just to keep the following line
> under 80 chars.
> I am not tied to either. I can change it if people really find it confusing.
>
>>> +               size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>>>         } else {
>>>                 fp = 2;
>>>                 goto invalid_fld;
>>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>>> index adbc812..45a1d71 100644
>>> --- a/include/linux/ata.h
>>> +++ b/include/linux/ata.h
>>> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>>>  #endif
>>>  }
>>>
>>> -/*
>>> - * Write LBA Range Entries to the buffer that will cover the extent from
>>> - * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
>>> - * TO NV CACHE PINNED SET.
>>> - */
>>> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
>>> -               unsigned num, u64 sector, unsigned long count)
>>> -{
>>> -       __le64 *buffer = _buffer;
>>> -       unsigned i = 0, used_bytes;
>>> -
>>> -       while (i < num) {
>>> -               u64 entry = sector |
>>> -                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>>> -               buffer[i++] = __cpu_to_le64(entry);
>>> -               if (count <= 0xffff)
>>> -                       break;
>>> -               count -= 0xffff;
>>> -               sector += 0xffff;
>>> -       }
>>> -
>>> -       used_bytes = ALIGN(i * 8, 512);
>>> -       memset(buffer + i, 0, used_bytes - i * 8);
>>> -       return used_bytes;
>>> -}
>>> -
>>>  static inline bool ata_ok(u8 status)
>>>  {
>>>         return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
>>> --
>>> 2.9.3
>>>
>
>
>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22 19:43     ` Shaun Tancheff
@ 2016-08-22 20:14       ` Tom Yan
  2016-08-22 22:07         ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 20:14 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 23 August 2016 at 03:43, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> +       if (unmap) {
>>> +               /* If trim is not enabled the cmd is invalid. */
>>> +               if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
>>> +                   !ata_id_has_trim(dev->id)) {
>>> +                       fp = 1;
>>> +                       bp = 3;
>>> +                       goto invalid_fld;
>>> +               }
>>> +               /* If the request is too large the cmd is invalid */
>>> +               if (n_block > 0xffff * trmax) {
>>> +                       fp = 2;
>>> +                       goto invalid_fld;
>>> +               }
>>
>> This response should be generally applied to the Write Same (16)
>> translation, since it is required by SBC,
>>
>>> +       } else {
>>> +               /* If write same is not available the cmd is invalid */
>>> +               if (!ata_id_sct_write_same(dev->id)) {
>>> +                       fp = 1;
>>> +                       bp = 3;
>>> +                       goto invalid_fld;
>>> +               }
>>
>> therefore, you should add an n_block check here as well, if you are
>> going to advertise an Maximum Write Same Length even when the device
>> supports only SCT Write Same but not TRIM. Most likely you would want
>> to simply move the existing check one-level up (if the same limit is
>> advertised no matter TRIM is supported not or not).
>
> Why would we enforce upper level limits on something that doesn't
> have any?

If we advertise a limit in our SATL, it makes sense that we should
make sure the behaviour is consistent when we issue a write same
through the block layer / ioctl and when we issue a SCSI Write Same
command directly (e.g. with sg_write_same). IMHO that's pretty much
why SBC would mandate such behaviour as well.

>
> If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
> wipe a whole disk it should be free to do so.
>

That's why I said, "if you are going to advertise an Maximum Write Same Length".

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4] SCT Write Same / DSM Trim
  2016-08-22 18:52             ` Tom Yan
@ 2016-08-22 20:57               ` Tom Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-22 20:57 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 22 August 2016 at 18:52, Tom Yan <tom.ty89@gmail.com> wrote:
>
> For the "payload block size" that is "always" 512-byte as per the same
> spec, I don't think we need to concern about it. I think it only
> matters if we want to enable multi-block TRIM payload according to the
> reported limit in IDENTIFY DEVICE data. Probably it merely means that
> the "each of the blocks" in the reported limit will always mean 64
> TRIM entries, even on 4Kn drives, instead of 512.
>

Actually that's what we assumed in our current code that forms the DSM
command as well. See how (hob_)feature for queued TRIM and how
(hob_)nsect for non-queued TRIM are set in ata_scsi_write_same_xlat in
libata-scsi ('size / 512').

So if you are going to increase the payload block size per logical
sector size (in other word, you assume that the payload block size is
_expected_ to be the logical block size, in spite of what ACS told,
you'll need to adjust how the aforementioned fields in the ATA
taskfile are set as well.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
  2016-08-22 19:27   ` Tom Yan
@ 2016-08-22 21:20   ` Tom Yan
  2016-08-22 22:00     ` Shaun Tancheff
  2 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 21:20 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> Safely overwriting the attached page to ATA format from the SCSI formatted
> variant.
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Added prep patch to work with non-page aligned scatterlist pages
>    and use kmap_atomic() to lock page during modification.
>
>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>  include/linux/ata.h       | 26 ----------------------
>  2 files changed, 51 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e207b33..7990cb2 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>         return 1;
>  }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
> + * @cmd: SCSI command being translated
> + * @num: Maximum number of entries (nominally 64).
> + * @sector: Starting sector
> + * @count: Total Range of request
> + *
> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
> + * descriptor.
> + *
> + * Upto 64 entries of the format:
> + *   63:48 Range Length
> + *   47:0  LBA
> + *
> + *  Range Length of 0 is ignored.
> + *  LBA's should be sorted order and not overlap.
> + *
> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
> + */
> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> +                                             u64 sector, u32 count)
> +{
> +       __le64 *buffer;
> +       u32 i = 0, used_bytes;
> +       unsigned long flags;
> +
> +       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);

Why do we want to check "512" (a literal number) against
ATA_SCSI_RBUF_SIZE here?

> +
> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> +       buffer = ((void *)ata_scsi_rbuf);
> +       while (i < num) {
> +               u64 entry = sector |
> +                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
> +               buffer[i++] = __cpu_to_le64(entry);
> +               if (count <= 0xffff)
> +                       break;
> +               count -= 0xffff;
> +               sector += 0xffff;
> +       }
> +
> +       used_bytes = ALIGN(i * 8, 512);
> +       memset(buffer + i, 0, used_bytes - i * 8);
> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +
> +       return used_bytes;
> +}
> +
>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         const u8 *cdb = scmd->cmnd;
>         u64 block;
>         u32 n_block;
> +       const u32 trmax = ATA_MAX_TRIM_RNUM;
>         u32 size;
> -       void *buf;
>         u16 fp;
>         u8 bp = 0xff;
>
> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> -       buf = page_address(sg_page(scsi_sglist(scmd)));
> -
> -       if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
> -               size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
> +       if (n_block <= 0xffff * trmax) {
> +               size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>         } else {
>                 fp = 2;
>                 goto invalid_fld;
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index adbc812..45a1d71 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>  #endif
>  }
>
> -/*
> - * Write LBA Range Entries to the buffer that will cover the extent from
> - * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
> - * TO NV CACHE PINNED SET.
> - */
> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
> -               unsigned num, u64 sector, unsigned long count)
> -{
> -       __le64 *buffer = _buffer;
> -       unsigned i = 0, used_bytes;
> -
> -       while (i < num) {
> -               u64 entry = sector |
> -                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
> -               buffer[i++] = __cpu_to_le64(entry);
> -               if (count <= 0xffff)
> -                       break;
> -               count -= 0xffff;
> -               sector += 0xffff;
> -       }
> -
> -       used_bytes = ALIGN(i * 8, 512);
> -       memset(buffer + i, 0, used_bytes - i * 8);
> -       return used_bytes;
> -}
> -
>  static inline bool ata_ok(u8 status)
>  {
>         return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22 21:20   ` Tom Yan
@ 2016-08-22 22:00     ` Shaun Tancheff
  2016-08-22 23:49       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 22:00 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 4:20 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> Safely overwriting the attached page to ATA format from the SCSI formatted
>> variant.
>>
>> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
>> ---
>> v6:
>>  - Fix bisect bug reported by Tom Yan <tom.ty89@gmail.com>
>>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
>> v5:
>>  - Added prep patch to work with non-page aligned scatterlist pages
>>    and use kmap_atomic() to lock page during modification.
>>
>>  drivers/ata/libata-scsi.c | 56 ++++++++++++++++++++++++++++++++++++++++++-----
>>  include/linux/ata.h       | 26 ----------------------
>>  2 files changed, 51 insertions(+), 31 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index e207b33..7990cb2 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -3282,6 +3282,54 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>>         return 1;
>>  }
>>
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>> + * @cmd: SCSI command being translated
>> + * @num: Maximum number of entries (nominally 64).
>> + * @sector: Starting sector
>> + * @count: Total Range of request
>> + *
>> + * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>> + * descriptor.
>> + *
>> + * Upto 64 entries of the format:
>> + *   63:48 Range Length
>> + *   47:0  LBA
>> + *
>> + *  Range Length of 0 is ignored.
>> + *  LBA's should be sorted order and not overlap.
>> + *
>> + * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
>> + */
>> +static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>> +                                             u64 sector, u32 count)
>> +{
>> +       __le64 *buffer;
>> +       u32 i = 0, used_bytes;
>> +       unsigned long flags;
>> +
>> +       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
>
> Why do we want to check "512" (a literal number) against
> ATA_SCSI_RBUF_SIZE here?

Because this is re-using the response buffer in a new way.
Such reuse could be a surprise to someone refactoring that
code, although it's pretty unlikely. The build bug on
gives some level of confidence that it won't go unnoticed.
It also codifies the assumption that we can write 512 bytes
to the global ata_scsi_rbuf buffer.

As to using a literal 512, I was just following what was here
before.

It looks like there is a ATA_SECT_SIZE that can replace most
of the literal 512's in here though. That might be a nice cleanup
overall. Not sure it belongs here though.

>> +
>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> +       buffer = ((void *)ata_scsi_rbuf);
>> +       while (i < num) {
>> +               u64 entry = sector |
>> +                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> +               buffer[i++] = __cpu_to_le64(entry);
>> +               if (count <= 0xffff)
>> +                       break;
>> +               count -= 0xffff;
>> +               sector += 0xffff;
>> +       }
>> +
>> +       used_bytes = ALIGN(i * 8, 512);
>> +       memset(buffer + i, 0, used_bytes - i * 8);
>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +
>> +       return used_bytes;
>> +}
>> +
>>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>  {
>>         struct ata_taskfile *tf = &qc->tf;
>> @@ -3290,8 +3338,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         const u8 *cdb = scmd->cmnd;
>>         u64 block;
>>         u32 n_block;
>> +       const u32 trmax = ATA_MAX_TRIM_RNUM;
>>         u32 size;
>> -       void *buf;
>>         u16 fp;
>>         u8 bp = 0xff;
>>
>> @@ -3319,10 +3367,8 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>>         if (!scsi_sg_count(scmd))
>>                 goto invalid_param_len;
>>
>> -       buf = page_address(sg_page(scsi_sglist(scmd)));
>> -
>> -       if (n_block <= 65535 * ATA_MAX_TRIM_RNUM) {
>> -               size = ata_set_lba_range_entries(buf, ATA_MAX_TRIM_RNUM, block, n_block);
>> +       if (n_block <= 0xffff * trmax) {
>> +               size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
>>         } else {
>>                 fp = 2;
>>                 goto invalid_fld;
>> diff --git a/include/linux/ata.h b/include/linux/ata.h
>> index adbc812..45a1d71 100644
>> --- a/include/linux/ata.h
>> +++ b/include/linux/ata.h
>> @@ -1071,32 +1071,6 @@ static inline void ata_id_to_hd_driveid(u16 *id)
>>  #endif
>>  }
>>
>> -/*
>> - * Write LBA Range Entries to the buffer that will cover the extent from
>> - * sector to sector + count.  This is used for TRIM and for ADD LBA(S)
>> - * TO NV CACHE PINNED SET.
>> - */
>> -static inline unsigned ata_set_lba_range_entries(void *_buffer,
>> -               unsigned num, u64 sector, unsigned long count)
>> -{
>> -       __le64 *buffer = _buffer;
>> -       unsigned i = 0, used_bytes;
>> -
>> -       while (i < num) {
>> -               u64 entry = sector |
>> -                       ((u64)(count > 0xffff ? 0xffff : count) << 48);
>> -               buffer[i++] = __cpu_to_le64(entry);
>> -               if (count <= 0xffff)
>> -                       break;
>> -               count -= 0xffff;
>> -               sector += 0xffff;
>> -       }
>> -
>> -       used_bytes = ALIGN(i * 8, 512);
>> -       memset(buffer + i, 0, used_bytes - i * 8);
>> -       return used_bytes;
>> -}
>> -
>>  static inline bool ata_ok(u8 status)
>>  {
>>         return ((status & (ATA_BUSY | ATA_DRDY | ATA_DF | ATA_DRQ | ATA_ERR))
>> --
>> 2.9.3
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-ide" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html



-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22 20:14       ` Tom Yan
@ 2016-08-22 22:07         ` Shaun Tancheff
  2016-08-22 23:09           ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-22 22:07 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 3:14 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 03:43, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>
>> Why would we enforce upper level limits on something that doesn't
>> have any?
>
> If we advertise a limit in our SATL, it makes sense that we should
> make sure the behaviour is consistent when we issue a write same
> through the block layer / ioctl and when we issue a SCSI Write Same
> command directly (e.g. with sg_write_same). IMHO that's pretty much
> why SBC would mandate such behaviour as well.

Breaking would be advertising a limit that is too high and failing.
Advertising a lower limit and succeeding may not be ideal for all
possible use cases, but it's not breaking behaviour.

>>
>> If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
>> wipe a whole disk it should be free to do so.
>>
>
> That's why I said, "if you are going to advertise an Maximum Write Same Length".

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22 22:07         ` Shaun Tancheff
@ 2016-08-22 23:09           ` Tom Yan
  2016-08-23  0:36             ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 23:09 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

I am not sure about what you mean here. Rejecting SCSI Write Same
commands that has its "number of blocks" field set to a value higher
than the device's reported Maximum Write Same Length is only natural
and mandated by SBC. We have no reason (even if it is practically not
a must) not to do it while we are implementing a SCSI-ATA Translation
Layer here as long as we advertise Maximum Write Same Length. It does
not matter here whether the command ends up being translated to SCT
Write Same or TRIM.

How high or how lower the limit should be advertised has nothing to do
with the checking.

FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
does NOT count as advertising Maximum Write Same Length, that's why we
may or may not (in terms of SBC) check n_block against it if we are
really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.

On 22 August 2016 at 22:07, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 3:14 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 03:43, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>>
>>> Why would we enforce upper level limits on something that doesn't
>>> have any?
>>
>> If we advertise a limit in our SATL, it makes sense that we should
>> make sure the behaviour is consistent when we issue a write same
>> through the block layer / ioctl and when we issue a SCSI Write Same
>> command directly (e.g. with sg_write_same). IMHO that's pretty much
>> why SBC would mandate such behaviour as well.
>
> Breaking would be advertising a limit that is too high and failing.
> Advertising a lower limit and succeeding may not be ideal for all
> possible use cases, but it's not breaking behaviour.
>
>>>
>>> If the upper level, or SG_IO, chooses to set a timeout of 10 hours and
>>> wipe a whole disk it should be free to do so.
>>>
>>
>> That's why I said, "if you are going to advertise an Maximum Write Same Length".
>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22 22:00     ` Shaun Tancheff
@ 2016-08-22 23:49       ` Tom Yan
  2016-08-23  1:01         ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-22 23:49 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

The only 512 I can see in the old code is the one in:

> -       used_bytes = ALIGN(i * 8, 512);

where the alignment is necessary because 'used_bytes' will be returned
as 'size', which will be used for setting the number of 512-byte block
payload when we construct the ATA taskfile / command. It may NOT be a
good idea to replace it with ATA_SECT_SIZE. Some comment could be
nice.

So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
against 512 here.

On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> Because this is re-using the response buffer in a new way.
> Such reuse could be a surprise to someone refactoring that
> code, although it's pretty unlikely. The build bug on
> gives some level of confidence that it won't go unnoticed.
> It also codifies the assumption that we can write 512 bytes
> to the global ata_scsi_rbuf buffer.
>
> As to using a literal 512, I was just following what was here
> before.
>
> It looks like there is a ATA_SECT_SIZE that can replace most
> of the literal 512's in here though. That might be a nice cleanup
> overall. Not sure it belongs here though.
>

>>> +       used_bytes = ALIGN(i * 8, 512);
>>> +       memset(buffer + i, 0, used_bytes - i * 8);
>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);

And I don't think you have a reason to use 512 here either. It appears
to me that you should use ATA_SCSI_RBUF_SIZE instead (see
ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
_derived_ value (e.g. from `i`) that tells the _actual_ size of
`buffer`.

Again, note that even when we prefer to stick with _one_ 512-byte
block TRIM payload, we should probably NEVER _hard code_ such limit
(for it's really ugly and unnecessary) in functions like this. All we
need to do is to advertise the limit (or lower) as Maximum Write
Length, and make sure our SATL works as per SBC that it will reject
SCSI Write Same commands that is "larger" than that.

>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>> +
>>> +       return used_bytes;
>>> +}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22 23:09           ` Tom Yan
@ 2016-08-23  0:36             ` Shaun Tancheff
  2016-08-23  5:55               ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23  0:36 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 6:09 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> I am not sure about what you mean here. Rejecting SCSI Write Same
> commands that has its "number of blocks" field set to a value higher
> than the device's reported Maximum Write Same Length is only natural
> and mandated by SBC. We have no reason (even if it is practically not
> a must) not to do it while we are implementing a SCSI-ATA Translation
> Layer here as long as we advertise Maximum Write Same Length. It does
> not matter here whether the command ends up being translated to SCT
> Write Same or TRIM.
>
> How high or how lower the limit should be advertised has nothing to do
> with the checking.
>
> FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
> does NOT count as advertising Maximum Write Same Length, that's why we
> may or may not (in terms of SBC) check n_block against it if we are
> really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.

Sorry I'm still a bit confused.

SCT Write Same does not have a limit ... it's a u64 of logical sectors.
Any limit specified is smaller based on other parts of the stack.
The SATL code being used to emulating SCSI Write Same which does
have a limited number of sectors .. so falling back to the SCSI limit
seems reasonable.

So the limit that is being applied is either the current TRIM limit,
or the SCSI Write Same limit.
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-22 23:49       ` Tom Yan
@ 2016-08-23  1:01         ` Shaun Tancheff
  2016-08-23  5:26           ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23  1:01 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Mon, Aug 22, 2016 at 6:49 PM, Tom Yan <tom.ty89@gmail.com> wrote:
> The only 512 I can see in the old code is the one in:
>
>> -       used_bytes = ALIGN(i * 8, 512);
>
> where the alignment is necessary because 'used_bytes' will be returned
> as 'size', which will be used for setting the number of 512-byte block
> payload when we construct the ATA taskfile / command. It may NOT be a
> good idea to replace it with ATA_SECT_SIZE. Some comment could be
> nice.

Not sure I agree with that analysis. Could just as well assign it directly,
as ALIGN() is just superfluous here.
Later the count is always 512/512 -> 1. Always.

"i" is used here to limit the number of bytes that need to be memset()
after filling to payload. Personally I think memset is fast enough that
it is better to do before rather than later but I figure if the code works
let it be.

> So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
> against 512 here.

Again, not sure I agree, but I don't really care on that point. Just many
years of defensive coding.

> On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> Because this is re-using the response buffer in a new way.
>> Such reuse could be a surprise to someone refactoring that
>> code, although it's pretty unlikely. The build bug on
>> gives some level of confidence that it won't go unnoticed.
>> It also codifies the assumption that we can write 512 bytes
>> to the global ata_scsi_rbuf buffer.
>>
>> As to using a literal 512, I was just following what was here
>> before.
>>
>> It looks like there is a ATA_SECT_SIZE that can replace most
>> of the literal 512's in here though. That might be a nice cleanup
>> overall. Not sure it belongs here though.
>>
>
>>>> +       used_bytes = ALIGN(i * 8, 512);
>>>> +       memset(buffer + i, 0, used_bytes - i * 8);
>>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>
> And I don't think you have a reason to use 512 here either. It appears
> to me that you should use ATA_SCSI_RBUF_SIZE instead (see
> ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
> _derived_ value (e.g. from `i`) that tells the _actual_ size of
> `buffer`.

Nope. We *must* copy the whole 512 byte payload into the sglist.
Otherwise what was the point of the memset to clear out an cruft?
Failing to move the whole payload into place could leave whatever
garbage is in the buffer to be interpreted as an actual trim and do
real damage. I certainly can't use ATA_SCSI_RBUF_SIZE because
the payload attached to the cmd need only be 512 bytes. Trying
to copy in 4k is going to cause bad things when you check
the return from sg_copy_from_buffer() and notice you failed to
copy in you payload.

> Again, note that even when we prefer to stick with _one_ 512-byte
> block TRIM payload, we should probably NEVER _hard code_ such limit
> (for it's really ugly and unnecessary) in functions like this. All we

The interface requires well formed 512 byte chunks so we have to
at least have to do enough to ensure that we work in multiples of
512. Since 512 is all over the spec for this type of thing I think
it would be reasonable to have a define or enum if you don't think
reusing ATA_SECT_SIZE is good maybe something
like ATA_CMD_XFER_SIZE ?

> need to do is to advertise the limit (or lower) as Maximum Write
> Length, and make sure our SATL works as per SBC that it will reject
> SCSI Write Same commands that is "larger" than that.
>>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>>> +
>>>> +       return used_bytes;
>>>> +}


-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  1:01         ` Shaun Tancheff
@ 2016-08-23  5:26           ` Tom Yan
  2016-08-23  6:20             ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-23  5:26 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 23 August 2016 at 01:01, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 6:49 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> The only 512 I can see in the old code is the one in:
>>
>>> -       used_bytes = ALIGN(i * 8, 512);
>>
>> where the alignment is necessary because 'used_bytes' will be returned
>> as 'size', which will be used for setting the number of 512-byte block
>> payload when we construct the ATA taskfile / command. It may NOT be a
>> good idea to replace it with ATA_SECT_SIZE. Some comment could be
>> nice.
>
> Not sure I agree with that analysis. Could just as well assign it directly,
> as ALIGN() is just superfluous here.
> Later the count is always 512/512 -> 1. Always.

It is always 1 merely because we prefer sticking to that. Say we want
to enable multi-block payload now, it won't be 1 anymore.

Also note that the payload is NOT always fully-filled. Say, I issue a
discard request of 1M or a SCSI Write Same (16) commands of 2048
blocks. If you do not round up used_bytes, the result of the division
will be 0 in those cases.

Basically that's the mathematical version of the story of why we need
the following memset().

>
> "i" is used here to limit the number of bytes that need to be memset()
> after filling to payload. Personally I think memset is fast enough that
> it is better to do before rather than later but I figure if the code works
> let it be.

Not sure what you mean by "fast enough"/"do before". What memset() do
here is to locate the offset/location to starting filling the memory
(buffer + i) and fill zero until the payload is 512-byte aligned
(used_bytes - i * 8, where used_bytes is an aligned/rounded-up-to-512
version of i * 8). Again, it does NOT and should NOT limit the
allocated memory to _one_ 512-byte block.

>
>> So I don't think it makes any sense to check ATA_SCSI_RBUF_SIZE
>> against 512 here.
>
> Again, not sure I agree, but I don't really care on that point. Just many
> years of defensive coding.

The thing is you can't even tell what exactly you are defending here.
This would only make the perhaps already obfuscated code more
obfuscated, without actually preventing any bad thing to happen.

The only case you would want to check ATA_SCSI_RBUF_SIZE here is, you
somehow want to use it as buflen parameter in the following
sg_copy_from_buffer call. Then you may say "I want to make sure that
we wouldn't carelessly decrease the currently 4096 ATA_SCSI_RBUF_SIZE
to a value smaller than 512 someday, which cannot even accommodate a
single full-block payload." And even that's a weak reason.

>
>> On 22 August 2016 at 22:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> Because this is re-using the response buffer in a new way.
>>> Such reuse could be a surprise to someone refactoring that
>>> code, although it's pretty unlikely. The build bug on
>>> gives some level of confidence that it won't go unnoticed.
>>> It also codifies the assumption that we can write 512 bytes
>>> to the global ata_scsi_rbuf buffer.
>>>
>>> As to using a literal 512, I was just following what was here
>>> before.
>>>
>>> It looks like there is a ATA_SECT_SIZE that can replace most
>>> of the literal 512's in here though. That might be a nice cleanup
>>> overall. Not sure it belongs here though.
>>>
>>
>>>>> +       used_bytes = ALIGN(i * 8, 512);
>>>>> +       memset(buffer + i, 0, used_bytes - i * 8);
>>>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
>>
>> And I don't think you have a reason to use 512 here either. It appears
>> to me that you should use ATA_SCSI_RBUF_SIZE instead (see
>> ata_scsi_rbuf_put in libata-scsi). If not, it should probably be a
>> _derived_ value (e.g. from `i`) that tells the _actual_ size of
>> `buffer`.
>
> Nope. We *must* copy the whole 512 byte payload into the sglist.
> Otherwise what was the point of the memset to clear out an cruft?
> Failing to move the whole payload into place could leave whatever
> garbage is in the buffer to be interpreted as an actual trim and do
> real damage. I certainly can't use ATA_SCSI_RBUF_SIZE because
> the payload attached to the cmd need only be 512 bytes. Trying
> to copy in 4k is going to cause bad things when you check
> the return from sg_copy_from_buffer() and notice you failed to
> copy in you payload.

I don't really know how scatter/gather list and the related functions
work to be honest. You should probably use used_bytes then, since
used_bytes is exactly the final size of buffer (after memset, and it
is 512-byte aligned, but not always 512 in terms of logic).

>
>> Again, note that even when we prefer to stick with _one_ 512-byte
>> block TRIM payload, we should probably NEVER _hard code_ such limit
>> (for it's really ugly and unnecessary) in functions like this. All we
>
> The interface requires well formed 512 byte chunks so we have to
> at least have to do enough to ensure that we work in multiples of
> 512. Since 512 is all over the spec for this type of thing I think
> it would be reasonable to have a define or enum if you don't think
> reusing ATA_SECT_SIZE is good maybe something
> like ATA_CMD_XFER_SIZE ?

I have no idea. Perhaps it is fine to use ATA_SECT_SIZE if what you
described ("512 is all over the spec for this type of thing") is true.
It's just personally I don't see any benefit in terms of readability
introducing macro like that. If anything should be done, I would say
it is to add comment above ALIGN() and the ATA taskfiles pointing out
512 is used because ACS specifically pointed out that "TRIM payload
blocks are always 512-byte, but not the logical sector size or so".

>
>> need to do is to advertise the limit (or lower) as Maximum Write
>> Length, and make sure our SATL works as per SBC that it will reject
>> SCSI Write Same commands that is "larger" than that.
>>>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>>>> +
>>>>> +       return used_bytes;
>>>>> +}
>
>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-23  0:36             ` Shaun Tancheff
@ 2016-08-23  5:55               ` Tom Yan
  2016-08-23  6:11                 ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-23  5:55 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 23 August 2016 at 00:36, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Mon, Aug 22, 2016 at 6:09 PM, Tom Yan <tom.ty89@gmail.com> wrote:
>> I am not sure about what you mean here. Rejecting SCSI Write Same
>> commands that has its "number of blocks" field set to a value higher
>> than the device's reported Maximum Write Same Length is only natural
>> and mandated by SBC. We have no reason (even if it is practically not
>> a must) not to do it while we are implementing a SCSI-ATA Translation
>> Layer here as long as we advertise Maximum Write Same Length. It does
>> not matter here whether the command ends up being translated to SCT
>> Write Same or TRIM.
>>
>> How high or how lower the limit should be advertised has nothing to do
>> with the checking.
>>
>> FWIW, letting the SCSI/block layer fall back with SD_MAX_WS10_BLOCKS
>> does NOT count as advertising Maximum Write Same Length, that's why we
>> may or may not (in terms of SBC) check n_block against it if we are
>> really gonna leave ata_scsiop_inq_b0 in libata-scsi untouched.
>
> Sorry I'm still a bit confused.

It's alright. I guess I am not good at expressing my ideas. (And poor English)

>
> SCT Write Same does not have a limit ... it's a u64 of logical sectors.

As I've said, it is not really about SCT Write Same or TRIM. The check
here is added just to make our SATL (which is in some sense a SCSI
device) behave as how we advertised it (Maximum Write Same Length /
rbuf[36] set in ata_scsiop_inq_b0, that's why the value of the check
should always be the same as that).

So for both cases (TRIM and SCT Write Same), _if_ we advertise Maximum
Write Same Length, we should make our SATL (or in other word, our
"SCSI device") reject SCSI Write Same commands as per what we "told"
the users (and the SCSI/block layer).

The SCSI disk driver or so would NOT reject Write Same commands for us
as per the Maximum Write Same Length we reported, because it is the
responsibility of the SCSI device itself (as per SBC). The limit will
only be used to tell the block layer how large at max should each
split of the discard / write same request be.

> Any limit specified is smaller based on other parts of the stack.
> The SATL code being used to emulating SCSI Write Same which does
> have a limited number of sectors .. so falling back to the SCSI limit
> seems reasonable.

I am really not going to tell the whole story here again. We have a
really long discussion on whether we should advertise Maximum Write
Same Length for SCT Write Same, and the value we should advertise.
(Didn't we come to an conclusion on that as well?)

The check should be added here is just a knee-jerk reaction for the
Maximum Write Same Length we advertise. If you end up not advertising
one for SCT Write Same, you don't need to add the check here.

>
> So the limit that is being applied is either the current TRIM limit,
> or the SCSI Write Same limit.
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-23  5:55               ` Tom Yan
@ 2016-08-23  6:11                 ` Shaun Tancheff
  2016-08-23  7:57                   ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23  6:11 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Tue, Aug 23, 2016 at 12:55 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> I am really not going to tell the whole story here again. We have a
> really long discussion on whether we should advertise Maximum Write
> Same Length for SCT Write Same, and the value we should advertise.
> (Didn't we come to an conclusion on that as well?)

I had thought the conclusion was leave b0 as is and let the WS10 limit
take effect per Martin.
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  5:26           ` Tom Yan
@ 2016-08-23  6:20             ` Shaun Tancheff
  2016-08-23  7:53               ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23  6:20 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:

> It is always 1 merely because we prefer sticking to that. Say we want
> to enable multi-block payload now, it won't be 1 anymore.

Sorry, I though that DSM TRIM is using 512 bytes here because
WRITE_SAME_16 has a payload of a single logical sector.

> Also note that the payload is NOT always fully-filled.

Oh, I was considering filling the remainder of the buffer
with 0's using memset() as you described to be "filly-filled"
in this case. Sorry for the confusion.

-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  6:20             ` Shaun Tancheff
@ 2016-08-23  7:53               ` Tom Yan
  2016-08-23  8:42                 ` Shaun Tancheff
  2016-08-24  3:33                 ` Martin K. Petersen
  0 siblings, 2 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-23  7:53 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>
>> It is always 1 merely because we prefer sticking to that. Say we want
>> to enable multi-block payload now, it won't be 1 anymore.
>
> Sorry, I though that DSM TRIM is using 512 bytes here because
> WRITE_SAME_16 has a payload of a single logical sector.

Nope, SCSI Write Same commands does not have payload (or in SCSI
terms, parameter list / data-out buffer).

>
>> Also note that the payload is NOT always fully-filled.
>
> Oh, I was considering filling the remainder of the buffer
> with 0's using memset() as you described to be "filly-filled"
> in this case. Sorry for the confusion.

Well yeah _after_ memset() it should always be "fully-filled" (i.e.
512-byte aligned).

>
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-23  6:11                 ` Shaun Tancheff
@ 2016-08-23  7:57                   ` Tom Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-23  7:57 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

Hmm,

On 22 August 2016 at 18:00, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>
> Timeout for WS is 120 seconds so we should be fine there.
>
> The number to look for is the:
>    Max. Sustained Transfer Rate OD (MB/s): 190 8TB (180 5TB)
>
> Which means the above drives should complete a 2G write in
> about 10 to 11 seconds.
>
> If these were 4Kn drives and we allowed a 16G max then it
> would be 80-90 seconds, assuming the write speed didn't
> get any better.
>
> So holding the maximum to around 2G is probably the best
> overall, in my opinion.
>
> --
> Shaun Tancheff

On 23 August 2016 at 06:11, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>
> I had thought the conclusion was leave b0 as is and let the WS10 limit
> take effect per Martin.

If that's your final call then you do not need to add the check for
SCT Write Same. (But please make sure that the check for TRIM does not
go away)

> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  7:53               ` Tom Yan
@ 2016-08-23  8:42                 ` Shaun Tancheff
  2016-08-23  9:04                   ` Tom Yan
  2016-08-24  3:33                 ` Martin K. Petersen
  1 sibling, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23  8:42 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Tue, Aug 23, 2016 at 2:53 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>
>>> It is always 1 merely because we prefer sticking to that. Say we want
>>> to enable multi-block payload now, it won't be 1 anymore.
>>
>> Sorry, I though that DSM TRIM is using 512 bytes here because
>> WRITE_SAME_16 has a payload of a single logical sector.
>
> Nope, SCSI Write Same commands does not have payload (or in SCSI
> terms, parameter list / data-out buffer).

Hmm. Not sure about T10, but that's not how I read
sd_setup_write_same_cmnd(), it always setups up a transfer of
sector_size for scsi_init_io().

As I understand things, this is how the cmd's sglist is populated and why
this should be using sg_copy_from_buffer() rather than open coding
to the page.
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  8:42                 ` Shaun Tancheff
@ 2016-08-23  9:04                   ` Tom Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-23  9:04 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

There's comment about `rq->__data_len` in sd.c (and blkdev.h,
definition of struct `request`). Apparently it is "internal only" (in
terms of the kernel).

As for `cmd->transfersize`, it's probably talking about the CDB
itself. Although different commands have CDB of various lengths (much
shorter than 512-byte), but I guess when the actual command is sent to
the device, we will still transfer it in a full logical block for some
reason.

Anyway these will not have anything to do with how we form the TRIM
payload. The SATL simply read the LBA (starting offset) and NUMBER OF
BLOCK (to TRIM) field in the CDB and pack ranges according to that.
All we care is the actual TRIM limit of the ATA device (and
conservative concern on bogus ones).

On 23 August 2016 at 08:42, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 2:53 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 23 August 2016 at 06:20, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>>> On Tue, Aug 23, 2016 at 12:26 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>>
>>>> It is always 1 merely because we prefer sticking to that. Say we want
>>>> to enable multi-block payload now, it won't be 1 anymore.
>>>
>>> Sorry, I though that DSM TRIM is using 512 bytes here because
>>> WRITE_SAME_16 has a payload of a single logical sector.
>>
>> Nope, SCSI Write Same commands does not have payload (or in SCSI
>> terms, parameter list / data-out buffer).
>
> Hmm. Not sure about T10, but that's not how I read
> sd_setup_write_same_cmnd(), it always setups up a transfer of
> sector_size for scsi_init_io().
>
> As I understand things, this is how the cmd's sglist is populated and why
> this should be using sg_copy_from_buffer() rather than open coding
> to the page.
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
  2016-08-22  6:27   ` Hannes Reinecke
  2016-08-22 19:20   ` Tom Yan
@ 2016-08-23 10:37   ` Tom Yan
  2016-08-23 10:56     ` Shaun Tancheff
  2 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-23 10:37 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke, Shaun Tancheff

On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
> SATA drives may support write same via SCT. This is useful
> for setting the drive contents to a specific pattern (0's).
>
> Translate a SCSI WRITE SAME 16 command to be either a DSM TRIM
> command or an SCT Write Same command.
>
> Based on the UNMAP flag:
>   - When set translate to DSM TRIM
>   - When not set translate to SCT Write Same
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v6:
>  - Change to use sg_copy_from_buffer as per Christoph Hellwig <hch@lst.de>
> v5:
>  - Addressed review comments
>  - Report support for ZBC only for zoned devices.
>  - kmap page during rewrite
>  - Fix unmap set to require trim or error, if not unmap then sct write
>    same or error.
> v4:
>  - Added partial MAINTENANCE_IN opcode simulation
>  - Dropped all changes in drivers/scsi/*
>  - Changed to honor the UNMAP flag -> TRIM, no UNMAP -> SCT.
> v3:
>  - Demux UNMAP/TRIM from WRITE SAME
> v2:
>  - Remove fugly ata hacking from sd.c
>
>  drivers/ata/libata-scsi.c | 199 +++++++++++++++++++++++++++++++++++++++-------
>  include/linux/ata.h       |  43 ++++++++++
>  2 files changed, 213 insertions(+), 29 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index 7990cb2..ebf1a04 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -1159,8 +1159,6 @@ static void ata_scsi_sdev_config(struct scsi_device *sdev)
>  {
>         sdev->use_10_for_rw = 1;
>         sdev->use_10_for_ms = 1;
> -       sdev->no_report_opcodes = 1;
> -       sdev->no_write_same = 1;
>
>         /* Schedule policy is determined by ->qc_defer() callback and
>          * it needs to see every deferred qc.  Set dev_blocked to 1 to
> @@ -3287,7 +3285,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>   * @cmd: SCSI command being translated
>   * @num: Maximum number of entries (nominally 64).
>   * @sector: Starting sector
> - * @count: Total Range of request
> + * @count: Total Range of request in logical sectors
>   *
>   * Rewrite the WRITE SAME descriptor to be a DSM TRIM little-endian formatted
>   * descriptor.
> @@ -3330,6 +3328,45 @@ static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
>         return used_bytes;
>  }
>
> +/**
> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
> + * @cmd: SCSI command being translated
> + * @lba: Starting sector
> + * @num: Number of logical sectors to be zero'd.
> + *
> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * descriptor.
> + * NOTE: Writes a pattern (0's) in the foreground.
> + *       Large write-same requents can timeout.
> + */
> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +{
> +       u16 *sctpg;
> +       unsigned long flags;
> +
> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> +       sctpg = ((void *)ata_scsi_rbuf);

Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.

#define ATA_SCSI_RBUF_SIZE      4096
...
static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];

> +
> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> +       put_unaligned_le64(lba,     &sctpg[2]);
> +       put_unaligned_le64(num,     &sctpg[6]);
> +       put_unaligned_le32(0u,      &sctpg[10]);
> +
> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);

You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.

> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +}
> +
> +/**
> + * ata_scsi_write_same_xlat() - SATL Write Same to ATA SCT Write Same
> + * @qc: Command to be translated
> + *
> + * Translate a SCSI WRITE SAME command to be either a DSM TRIM command or
> + * an SCT Write Same command.
> + * Based on WRITE SAME has the UNMAP flag
> + *   When set translate to DSM TRIM
> + *   When clear translate to SCT Write Same
> + */
>  static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
> @@ -3342,6 +3379,7 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         u32 size;
>         u16 fp;
>         u8 bp = 0xff;
> +       u8 unmap = cdb[1] & 0x8;
>
>         /* we may not issue DMA commands if no DMA mode is set */
>         if (unlikely(!dev->dma_mode))
> @@ -3353,11 +3391,26 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         }
>         scsi_16_lba_len(cdb, &block, &n_block);
>
> -       /* for now we only support WRITE SAME with the unmap bit set */
> -       if (unlikely(!(cdb[1] & 0x8))) {
> -               fp = 1;
> -               bp = 3;
> -               goto invalid_fld;
> +       if (unmap) {
> +               /* If trim is not enabled the cmd is invalid. */
> +               if ((dev->horkage & ATA_HORKAGE_NOTRIM) ||
> +                   !ata_id_has_trim(dev->id)) {
> +                       fp = 1;
> +                       bp = 3;
> +                       goto invalid_fld;
> +               }
> +               /* If the request is too large the cmd is invalid */
> +               if (n_block > 0xffff * trmax) {
> +                       fp = 2;
> +                       goto invalid_fld;
> +               }
> +       } else {
> +               /* If write same is not available the cmd is invalid */
> +               if (!ata_id_sct_write_same(dev->id)) {
> +                       fp = 1;
> +                       bp = 3;
> +                       goto invalid_fld;
> +               }
>         }
>
>         /*
> @@ -3367,30 +3420,42 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> -       if (n_block <= 0xffff * trmax) {
> +       if (unmap) {
>                 size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> +               if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> +                       /* Newer devices support queued TRIM commands */
> +                       tf->protocol = ATA_PROT_NCQ;
> +                       tf->command = ATA_CMD_FPDMA_SEND;
> +                       tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> +                       tf->nsect = qc->tag << 3;
> +                       tf->hob_feature = (size / 512) >> 8;
> +                       tf->feature = size / 512;
> +
> +                       tf->auxiliary = 1;
> +               } else {
> +                       tf->protocol = ATA_PROT_DMA;
> +                       tf->hob_feature = 0;
> +                       tf->feature = ATA_DSM_TRIM;
> +                       tf->hob_nsect = (size / 512) >> 8;
> +                       tf->nsect = size / 512;
> +                       tf->command = ATA_CMD_DSM;
> +               }
>         } else {
> -               fp = 2;
> -               goto invalid_fld;
> -       }
> -
> -       if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
> -               /* Newer devices support queued TRIM commands */
> -               tf->protocol = ATA_PROT_NCQ;
> -               tf->command = ATA_CMD_FPDMA_SEND;
> -               tf->hob_nsect = ATA_SUBCMD_FPDMA_SEND_DSM & 0x1f;
> -               tf->nsect = qc->tag << 3;
> -               tf->hob_feature = (size / 512) >> 8;
> -               tf->feature = size / 512;
> +               ata_format_sct_write_same(scmd, block, n_block);
>
> -               tf->auxiliary = 1;
> -       } else {
> -               tf->protocol = ATA_PROT_DMA;
>                 tf->hob_feature = 0;
> -               tf->feature = ATA_DSM_TRIM;
> -               tf->hob_nsect = (size / 512) >> 8;
> -               tf->nsect = size / 512;
> -               tf->command = ATA_CMD_DSM;
> +               tf->feature = 0;
> +               tf->hob_nsect = 0;
> +               tf->nsect = 1;
> +               tf->lbah = 0;
> +               tf->lbam = 0;
> +               tf->lbal = ATA_CMD_STANDBYNOW1;
> +               tf->hob_lbah = 0;
> +               tf->hob_lbam = 0;
> +               tf->hob_lbal = 0;
> +               tf->device = ATA_CMD_STANDBYNOW1;
> +               tf->protocol = ATA_PROT_DMA;
> +               tf->command = ATA_CMD_WRITE_LOG_DMA_EXT;
>         }
>
>         tf->flags |= ATA_TFLAG_ISADDR | ATA_TFLAG_DEVICE | ATA_TFLAG_LBA48 |
> @@ -3414,6 +3479,76 @@ invalid_opcode:
>  }
>
>  /**
> + *     ata_scsiop_maint_in - Simulate a subset of MAINTENANCE_IN
> + *     @args: device MAINTENANCE_IN data / SCSI command of interest.
> + *     @rbuf: Response buffer, to which simulated SCSI cmd output is sent.
> + *
> + *     Yields a subset to satisfy scsi_report_opcode()
> + *
> + *     LOCKING:
> + *     spin_lock_irqsave(host lock)
> + */
> +static unsigned int ata_scsiop_maint_in(struct ata_scsi_args *args, u8 *rbuf)
> +{
> +       struct ata_device *dev = args->dev;
> +       u8 *cdb = args->cmd->cmnd;
> +       u8 supported = 0;
> +       unsigned int err = 0;
> +
> +       if (cdb[2] != 1) {
> +               ata_dev_warn(dev, "invalid command format %d\n", cdb[2]);
> +               err = 2;
> +               goto out;
> +       }
> +       switch (cdb[3]) {
> +       case INQUIRY:
> +       case MODE_SENSE:
> +       case MODE_SENSE_10:
> +       case READ_CAPACITY:
> +       case SERVICE_ACTION_IN_16:
> +       case REPORT_LUNS:
> +       case REQUEST_SENSE:
> +       case SYNCHRONIZE_CACHE:
> +       case REZERO_UNIT:
> +       case SEEK_6:
> +       case SEEK_10:
> +       case TEST_UNIT_READY:
> +       case SEND_DIAGNOSTIC:
> +       case MAINTENANCE_IN:
> +       case READ_6:
> +       case READ_10:
> +       case READ_16:
> +       case WRITE_6:
> +       case WRITE_10:
> +       case WRITE_16:
> +       case ATA_12:
> +       case ATA_16:
> +       case VERIFY:
> +       case VERIFY_16:
> +       case MODE_SELECT:
> +       case MODE_SELECT_10:
> +       case START_STOP:
> +               supported = 3;
> +               break;
> +       case WRITE_SAME_16:
> +               if (ata_id_sct_write_same(dev->id))
> +                       supported = 3;
> +               break;
> +       case ZBC_IN:
> +       case ZBC_OUT:
> +               if (ata_id_zoned_cap(dev->id) ||
> +                   dev->class == ATA_DEV_ZAC)
> +                       supported = 3;
> +               break;
> +       default:
> +               break;
> +       }
> +out:
> +       rbuf[1] = supported; /* supported */
> +       return err;
> +}
> +
> +/**
>   *     ata_scsi_report_zones_complete - convert ATA output
>   *     @qc: command structure returning the data
>   *
> @@ -4193,6 +4328,13 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd)
>                         ata_scsi_invalid_field(dev, cmd, 1);
>                 break;
>
> +       case MAINTENANCE_IN:
> +               if (scsicmd[1] == MI_REPORT_SUPPORTED_OPERATION_CODES)
> +                       ata_scsi_rbuf_fill(&args, ata_scsiop_maint_in);
> +               else
> +                       ata_scsi_invalid_field(dev, cmd, 1);
> +               break;
> +
>         /* all other commands */
>         default:
>                 ata_scsi_set_sense(dev, cmd, ILLEGAL_REQUEST, 0x20, 0x0);
> @@ -4225,7 +4367,6 @@ int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
>                 shost->max_lun = 1;
>                 shost->max_channel = 1;
>                 shost->max_cmd_len = 16;
> -               shost->no_write_same = 1;
>
>                 /* Schedule policy is determined by ->qc_defer()
>                  * callback and it needs to see every deferred qc.
> diff --git a/include/linux/ata.h b/include/linux/ata.h
> index 45a1d71..fdb1803 100644
> --- a/include/linux/ata.h
> +++ b/include/linux/ata.h
> @@ -105,6 +105,7 @@ enum {
>         ATA_ID_CFA_KEY_MGMT     = 162,
>         ATA_ID_CFA_MODES        = 163,
>         ATA_ID_DATA_SET_MGMT    = 169,
> +       ATA_ID_SCT_CMD_XPORT    = 206,
>         ATA_ID_ROT_SPEED        = 217,
>         ATA_ID_PIO4             = (1 << 1),
>
> @@ -789,6 +790,48 @@ static inline bool ata_id_sense_reporting_enabled(const u16 *id)
>  }
>
>  /**
> + *
> + * Word: 206 - SCT Command Transport
> + *    15:12 - Vendor Specific
> + *     11:6 - Reserved
> + *        5 - SCT Command Transport Data Tables supported
> + *        4 - SCT Command Transport Features Control supported
> + *        3 - SCT Command Transport Error Recovery Control supported
> + *        2 - SCT Command Transport Write Same supported
> + *        1 - SCT Command Transport Long Sector Access supported
> + *        0 - SCT Command Transport supported
> + */
> +static inline bool ata_id_sct_data_tables(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 5) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_features_ctrl(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 4) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_error_recovery_ctrl(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 3) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_write_same(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 2) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_long_sector_access(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 1) ? true : false;
> +}
> +
> +static inline bool ata_id_sct_supported(const u16 *id)
> +{
> +       return id[ATA_ID_SCT_CMD_XPORT] & (1 << 0) ? true : false;
> +}
> +
> +/**
>   *     ata_id_major_version    -       get ATA level of drive
>   *     @id: Identify data
>   *
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-23 10:37   ` Tom Yan
@ 2016-08-23 10:56     ` Shaun Tancheff
  2016-08-24  5:57       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-23 10:56 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Tue, Aug 23, 2016 at 5:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>> +/**
>> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>> + * @cmd: SCSI command being translated
>> + * @lba: Starting sector
>> + * @num: Number of logical sectors to be zero'd.
>> + *
>> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
>> + * descriptor.
>> + * NOTE: Writes a pattern (0's) in the foreground.
>> + *       Large write-same requents can timeout.
>> + */
>> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>> +{
>> +       u16 *sctpg;
>> +       unsigned long flags;
>> +
>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>> +       sctpg = ((void *)ata_scsi_rbuf);
>
> Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.
>
> #define ATA_SCSI_RBUF_SIZE      4096
> ...
> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>
>> +
>> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
>> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
>> +       put_unaligned_le64(lba,     &sctpg[2]);
>> +       put_unaligned_le64(num,     &sctpg[6]);
>> +       put_unaligned_le32(0u,      &sctpg[10]);
>> +
>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
>
> You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.

Ah .. because SCT Write Same is a fixed 512 byte transfer?
Ah .. because I only have 512 bytes to copy?

>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>> +}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-23  7:53               ` Tom Yan
  2016-08-23  8:42                 ` Shaun Tancheff
@ 2016-08-24  3:33                 ` Martin K. Petersen
  2016-08-24  5:31                   ` Tom Yan
  1 sibling, 1 reply; 55+ messages in thread
From: Martin K. Petersen @ 2016-08-24  3:33 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, Hannes Reinecke

>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:

Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
Tom> terms, parameter list / data-out buffer).

WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
have had no good reason to support that yet).

UNMAP has a payload that varies based on the number of range
descriptors. The SCSI disk driver only ever issues a single descriptor
but since libata doesn't support UNMAP this doesn't really come into
play.

Ideally there would be a way to distinguish between device limits for
WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
do that would be to transition the libata discard implementation over to
single-range UNMAP, fill out the relevant VPD page B0 fields and leave
the WRITE SAME bits for writing zeroes.

One reason that has not been particularly compelling is that the WRITE
SAME payload buffer does double duty to hold the ATA DSM TRIM range
descriptors and matches the required ATA payload size. Whereas the UNMAP
command would only provide 24 bytes of TRIM range space.

Also, please be careful with transfer lengths, __data_len, etc. As
mentioned, the transfer length WRITE SAME is typically 512 bytes and
that's the number of bytes that need to be DMA'ed and transferred over
the wire by the controller. But from a command completion perspective we
need to complete however many bytes the command acted upon. Unlike reads
and writes there is not a 1:1 mapping between the transfer length and
the affected area. So we do a bit of magic after the buffer has been
mapped to ensure that the completion byte count matches the number of
blocks that were affected by the command rather than the size of the
data buffer in bytes.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-24  3:33                 ` Martin K. Petersen
@ 2016-08-24  5:31                   ` Tom Yan
  2016-08-24 21:28                     ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-24  5:31 UTC (permalink / raw)
  To: Martin K. Petersen
  Cc: Shaun Tancheff, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 24 August 2016 at 11:33, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>
> Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
> Tom> terms, parameter list / data-out buffer).
>
> WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
> have had no good reason to support that yet).

Interesting, I wasn't aware of the bit. I just didn't see any
parameter list defined for any of the Write Same commands. Ah wait, it
carries the pattern (the "same") and so.

Hmm, it doesn't seem like the translation implemented in this patch
series cares about the payload though?

>
> UNMAP has a payload that varies based on the number of range
> descriptors. The SCSI disk driver only ever issues a single descriptor
> but since libata doesn't support UNMAP this doesn't really come into
> play.
>
> Ideally there would be a way to distinguish between device limits for
> WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
> do that would be to transition the libata discard implementation over to
> single-range UNMAP, fill out the relevant VPD page B0 fields and leave
> the WRITE SAME bits for writing zeroes.
>
> One reason that has not been particularly compelling is that the WRITE
> SAME payload buffer does double duty to hold the ATA DSM TRIM range

Huh? Why would the SATL care about its payload buffer for TRIM (i.e.
when the UNMAP bit is set)? Doesn't it just read the LBA and NUMBER OF
BLOCKS field and pack TRIM ranges/payload according to that?

> descriptors and matches the required ATA payload size. Whereas the UNMAP

Why would it need to "matches the required ATA payload size"?

> command would only provide 24 bytes of TRIM range space.

I don't really follow. The UNMAP descriptor has LBA (8 bytes / 64-bit)
and NUMBER OF BLOCKS (4 bytes / 32-bit) field of the same length as
Write Same (16). Even if the SCSI disk driver will only send one
descriptor, it should work as good as Write Same (16).

>
> Also, please be careful with transfer lengths, __data_len, etc. As
> mentioned, the transfer length WRITE SAME is typically 512 bytes and
> that's the number of bytes that need to be DMA'ed and transferred over
> the wire by the controller. But from a command completion perspective we
> need to complete however many bytes the command acted upon. Unlike reads
> and writes there is not a 1:1 mapping between the transfer length and
> the affected area. So we do a bit of magic after the buffer has been
> mapped to ensure that the completion byte count matches the number of
> blocks that were affected by the command rather than the size of the
> data buffer in bytes.
>
> --
> Martin K. Petersen      Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-23 10:56     ` Shaun Tancheff
@ 2016-08-24  5:57       ` Tom Yan
  2016-08-24  6:10         ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-24  5:57 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

Never mind. I was a bit lightheaded.

Anyway I don't think you should use ata_scsi_rbuf. It is a buffer
created and used for ata_scsi_simulate, which interacts with the SCSI
layer but not the ATA device (v.s. ata_scsi_translate). You should
probably create buffer inside ata_format_dsm_trim_descr() and
ata_format_sct_write_same() of size(s) you need.

On 23 August 2016 at 18:56, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Tue, Aug 23, 2016 at 5:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>>> +/**
>>> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>>> + * @cmd: SCSI command being translated
>>> + * @lba: Starting sector
>>> + * @num: Number of logical sectors to be zero'd.
>>> + *
>>> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
>>> + * descriptor.
>>> + * NOTE: Writes a pattern (0's) in the foreground.
>>> + *       Large write-same requents can timeout.
>>> + */
>>> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>>> +{
>>> +       u16 *sctpg;
>>> +       unsigned long flags;
>>> +
>>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>>> +       sctpg = ((void *)ata_scsi_rbuf);
>>
>> Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.
>>
>> #define ATA_SCSI_RBUF_SIZE      4096
>> ...
>> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>>
>>> +
>>> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
>>> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
>>> +       put_unaligned_le64(lba,     &sctpg[2]);
>>> +       put_unaligned_le64(num,     &sctpg[6]);
>>> +       put_unaligned_le32(0u,      &sctpg[10]);
>>> +
>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
>>
>> You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.
>
> Ah .. because SCT Write Same is a fixed 512 byte transfer?
> Ah .. because I only have 512 bytes to copy?
>
>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>> +}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-24  5:57       ` Tom Yan
@ 2016-08-24  6:10         ` Tom Yan
  2016-08-24 22:04           ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-24  6:10 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

Btw, I wonder if you need to memset your buffer with 0 first, like
what is done in ata_scsi_rbuf_get.

On 24 August 2016 at 13:57, Tom Yan <tom.ty89@gmail.com> wrote:
> Never mind. I was a bit lightheaded.
>
> Anyway I don't think you should use ata_scsi_rbuf. It is a buffer
> created and used for ata_scsi_simulate, which interacts with the SCSI
> layer but not the ATA device (v.s. ata_scsi_translate). You should
> probably create buffer inside ata_format_dsm_trim_descr() and
> ata_format_sct_write_same() of size(s) you need.
>
> On 23 August 2016 at 18:56, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Tue, Aug 23, 2016 at 5:37 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 22 August 2016 at 04:23, Shaun Tancheff <shaun@tancheff.com> wrote:
>>>> +/**
>>>> + * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>>>> + * @cmd: SCSI command being translated
>>>> + * @lba: Starting sector
>>>> + * @num: Number of logical sectors to be zero'd.
>>>> + *
>>>> + * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
>>>> + * descriptor.
>>>> + * NOTE: Writes a pattern (0's) in the foreground.
>>>> + *       Large write-same requents can timeout.
>>>> + */
>>>> +static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>>>> +{
>>>> +       u16 *sctpg;
>>>> +       unsigned long flags;
>>>> +
>>>> +       spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
>>>> +       sctpg = ((void *)ata_scsi_rbuf);
>>>
>>> Because ata_scsi_rbuf is of a fixed size of ATA_SCSI_RBUF_SIZE.
>>>
>>> #define ATA_SCSI_RBUF_SIZE      4096
>>> ...
>>> static u8 ata_scsi_rbuf[ATA_SCSI_RBUF_SIZE];
>>>
>>>> +
>>>> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
>>>> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
>>>> +       put_unaligned_le64(lba,     &sctpg[2]);
>>>> +       put_unaligned_le64(num,     &sctpg[6]);
>>>> +       put_unaligned_le32(0u,      &sctpg[10]);
>>>> +
>>>> +       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
>>>
>>> You have no reason to use 512 here instead of ATA_SCSI_RBUF_SIZE this time.
>>
>> Ah .. because SCT Write Same is a fixed 512 byte transfer?
>> Ah .. because I only have 512 bytes to copy?
>>
>>>> +       spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>>>> +}

^ permalink raw reply	[flat|nested] 55+ messages in thread

* [PATCH v6 3/4 RESEND] SCT Write Same / DSM Trim
  2016-08-22  6:30   ` Hannes Reinecke
@ 2016-08-24 18:08     ` Shaun Tancheff
  2016-08-25  7:01       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-24 18:08 UTC (permalink / raw)
  To: Hannes Reinecke, linux-ide, linux-kernel
  Cc: Shaun Tancheff, Tejun Heo, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Josh Bingaman,
	Hannes Reinecke, Shaun Tancheff

Correct handling of devices with sector_size other that 512 bytes.

In the case of a 4Kn device sector_size it is possible to describe a much
larger DSM Trim than the current fixed default of 512 bytes.

This patch assumes the minimum descriptor is sector_size and fills out
the descriptor accordingly.

The ACS-2 specification is quite clear that the DSM command payload is
sized as number of 512 byte transfers so a 4Kn device will operate
correctly without this patch.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v5:
 - Reshuffled descripton.
 - Added support for a sector_size descriptor other than 512 bytes.

 drivers/ata/libata-scsi.c | 85 +++++++++++++++++++++++++++++++----------------
 1 file changed, 57 insertions(+), 28 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index ebf1a04..37f456e 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3283,7 +3283,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
 /**
  * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
  * @cmd: SCSI command being translated
- * @num: Maximum number of entries (nominally 64).
+ * @trmax: Maximum number of entries that will fit in sector_size bytes.
  * @sector: Starting sector
  * @count: Total Range of request in logical sectors
  *
@@ -3298,63 +3298,80 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
  *  LBA's should be sorted order and not overlap.
  *
  * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
+ *
+ * Return: Number of bytes copied into sglist.
  */
-static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
-					      u64 sector, u32 count)
+static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
+					u64 sector, u32 count)
 {
-	__le64 *buffer;
-	u32 i = 0, used_bytes;
+	struct scsi_device *sdp = cmd->device;
+	size_t len = sdp->sector_size;
+	size_t r;
+	__le64 *buf;
+	u32 i = 0;
 	unsigned long flags;
 
-	BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
+	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
+
+	if (len > ATA_SCSI_RBUF_SIZE)
+		len = ATA_SCSI_RBUF_SIZE;
 
 	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-	buffer = ((void *)ata_scsi_rbuf);
-	while (i < num) {
+	buf = ((void *)ata_scsi_rbuf);
+	memset(buf, 0, len);
+	while (i < trmax) {
 		u64 entry = sector |
 			((u64)(count > 0xffff ? 0xffff : count) << 48);
-		buffer[i++] = __cpu_to_le64(entry);
+		buf[i++] = __cpu_to_le64(entry);
 		if (count <= 0xffff)
 			break;
 		count -= 0xffff;
 		sector += 0xffff;
 	}
-
-	used_bytes = ALIGN(i * 8, 512);
-	memset(buffer + i, 0, used_bytes - i * 8);
-	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
+	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
 	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
 
-	return used_bytes;
+	return r;
 }
 
 /**
  * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
  * @cmd: SCSI command being translated
  * @lba: Starting sector
- * @num: Number of logical sectors to be zero'd.
+ * @num: Number of sectors to be zero'd.
  *
- * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
+ * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
  * descriptor.
  * NOTE: Writes a pattern (0's) in the foreground.
- *       Large write-same requents can timeout.
+ *
+ * Return: Number of bytes copied into sglist.
  */
-static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
+static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
 {
-	u16 *sctpg;
+	struct scsi_device *sdp = cmd->device;
+	size_t len = sdp->sector_size;
+	size_t r;
+	u16 *buf;
 	unsigned long flags;
 
 	spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
-	sctpg = ((void *)ata_scsi_rbuf);
+	buf = ((void *)ata_scsi_rbuf);
+
+	put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
+	put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
+	put_unaligned_le64(lba,     &buf[2]);
+	put_unaligned_le64(num,     &buf[6]);
+	put_unaligned_le32(0u,      &buf[10]); /* pattern */
+
+	WARN_ON(len > ATA_SCSI_RBUF_SIZE);
 
-	put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
-	put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
-	put_unaligned_le64(lba,     &sctpg[2]);
-	put_unaligned_le64(num,     &sctpg[6]);
-	put_unaligned_le32(0u,      &sctpg[10]);
+	if (len > ATA_SCSI_RBUF_SIZE)
+		len = ATA_SCSI_RBUF_SIZE;
 
-	sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
+	r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
 	spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
+
+	return r;
 }
 
 /**
@@ -3371,11 +3388,13 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 {
 	struct ata_taskfile *tf = &qc->tf;
 	struct scsi_cmnd *scmd = qc->scsicmd;
+	struct scsi_device *sdp = scmd->device;
+	size_t len = sdp->sector_size;
 	struct ata_device *dev = qc->dev;
 	const u8 *cdb = scmd->cmnd;
 	u64 block;
 	u32 n_block;
-	const u32 trmax = ATA_MAX_TRIM_RNUM;
+	const u32 trmax = len >> 3;
 	u32 size;
 	u16 fp;
 	u8 bp = 0xff;
@@ -3420,8 +3439,16 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 	if (!scsi_sg_count(scmd))
 		goto invalid_param_len;
 
+	/*
+	 * size must match sector size in bytes
+	 * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
+	 * is defined as number of 512 byte blocks to be transferred.
+	 */
 	if (unmap) {
 		size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
+		if (size != len)
+			goto invalid_param_len;
+
 		if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
 			/* Newer devices support queued TRIM commands */
 			tf->protocol = ATA_PROT_NCQ;
@@ -3441,7 +3468,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
 			tf->command = ATA_CMD_DSM;
 		}
 	} else {
-		ata_format_sct_write_same(scmd, block, n_block);
+		size = ata_format_sct_write_same(scmd, block, n_block);
+		if (size != len)
+			goto invalid_param_len;
 
 		tf->hob_feature = 0;
 		tf->feature = 0;
-- 
2.9.3


^ permalink raw reply related	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-24  5:31                   ` Tom Yan
@ 2016-08-24 21:28                     ` Shaun Tancheff
  2016-08-25  6:31                       ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-24 21:28 UTC (permalink / raw)
  To: Tom Yan
  Cc: Martin K. Petersen, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Wed, Aug 24, 2016 at 12:31 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 24 August 2016 at 11:33, Martin K. Petersen
> <martin.petersen@oracle.com> wrote:
>>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>>
>> Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
>> Tom> terms, parameter list / data-out buffer).
>>
>> WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
>> have had no good reason to support that yet).
>
> Interesting, I wasn't aware of the bit. I just didn't see any
> parameter list defined for any of the Write Same commands. Ah wait, it
> carries the pattern (the "same") and so.
>
> Hmm, it doesn't seem like the translation implemented in this patch
> series cares about the payload though?

As repeated here and elsewhere the payload is:
    scsi_sglist(cmd)
and it was put there by scsi_init_io() when it called scsi_init_sgtable()

>> UNMAP has a payload that varies based on the number of range
>> descriptors. The SCSI disk driver only ever issues a single descriptor
>> but since libata doesn't support UNMAP this doesn't really come into
>> play.
>>
>> Ideally there would be a way to distinguish between device limits for
>> WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
>> do that would be to transition the libata discard implementation over to
>> single-range UNMAP, fill out the relevant VPD page B0 fields and leave
>> the WRITE SAME bits for writing zeroes.
>>
>> One reason that has not been particularly compelling is that the WRITE
>> SAME payload buffer does double duty to hold the ATA DSM TRIM range
>
> Huh? Why would the SATL care about its payload buffer for TRIM (i.e.
> when the UNMAP bit is set)? Doesn't it just read the LBA and NUMBER OF
> BLOCKS field and pack TRIM ranges/payload according to that?
>
>> descriptors and matches the required ATA payload size. Whereas the UNMAP
>
> Why would it need to "matches the required ATA payload size"?
>
>> command would only provide 24 bytes of TRIM range space.
>
> I don't really follow. The UNMAP descriptor has LBA (8 bytes / 64-bit)
> and NUMBER OF BLOCKS (4 bytes / 32-bit) field of the same length as
> Write Same (16). Even if the SCSI disk driver will only send one
> descriptor, it should work as good as Write Same (16).

The "payload" is the data block transferred with the command.
The "descriptor" is, in this context, the contents of the payload as
it "describes" what the action the command is supposed to perform.

The "payload" contains the "descriptor" that "describes" how
DSM TRIM should determine which logical blocks it should UNMAP.

>> Also, please be careful with transfer lengths, __data_len, etc. As
>> mentioned, the transfer length WRITE SAME is typically 512 bytes and
>> that's the number of bytes that need to be DMA'ed and transferred over
>> the wire by the controller. But from a command completion perspective we
>> need to complete however many bytes the command acted upon. Unlike reads
>> and writes there is not a 1:1 mapping between the transfer length and
>> the affected area. So we do a bit of magic after the buffer has been
>> mapped to ensure that the completion byte count matches the number of
>> blocks that were affected by the command rather than the size of the
>> data buffer in bytes.
>>
>> --
>> Martin K. Petersen      Oracle Linux Engineering
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-24  6:10         ` Tom Yan
@ 2016-08-24 22:04           ` Shaun Tancheff
  2016-08-25  6:23             ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-24 22:04 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Wed, Aug 24, 2016 at 1:10 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> Btw, I wonder if you need to memset your buffer with 0 first, like
> what is done in ata_scsi_rbuf_get.

It is not necessary as the defined buffer is completely filled out here.

Are you thinking as a sort of future proofing?
Ex: In the unlikely event that the SCT Write Same command
descriptor is expanded in a future ACS?

It is more likely to see the command deprecated and replaced with a
new SCT feature.

Regardless of how unlikely I would consider a memset here to clear
the remainder of the payload.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-24 22:04           ` Shaun Tancheff
@ 2016-08-25  6:23             ` Tom Yan
  2016-08-25  7:31               ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-25  6:23 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

You only fill the bytes that you want to to set explicitly:

+       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
+       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
+       put_unaligned_le64(lba,     &sctpg[2]);
+       put_unaligned_le64(num,     &sctpg[6]);
+       put_unaligned_le32(0u,      &sctpg[10]);

What I doubted is, if you don't memset (zero-fill) the buffer first,
will other bytes have indeterministic value that causes random
unexpected behavior?

On 25 August 2016 at 06:04, Shaun Tancheff <shaun@tancheff.com> wrote:
> On Wed, Aug 24, 2016 at 1:10 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> Btw, I wonder if you need to memset your buffer with 0 first, like
>> what is done in ata_scsi_rbuf_get.
>
> It is not necessary as the defined buffer is completely filled out here.
>
> Are you thinking as a sort of future proofing?
> Ex: In the unlikely event that the SCT Write Same command
> descriptor is expanded in a future ACS?
>
> It is more likely to see the command deprecated and replaced with a
> new SCT feature.
>
> Regardless of how unlikely I would consider a memset here to clear
> the remainder of the payload.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-24 21:28                     ` Shaun Tancheff
@ 2016-08-25  6:31                       ` Tom Yan
  2016-08-25  7:18                         ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-25  6:31 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Martin K. Petersen, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On 25 August 2016 at 05:28, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Wed, Aug 24, 2016 at 12:31 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> On 24 August 2016 at 11:33, Martin K. Petersen
>> <martin.petersen@oracle.com> wrote:
>>>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>>>
>>> Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
>>> Tom> terms, parameter list / data-out buffer).
>>>
>>> WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
>>> have had no good reason to support that yet).
>>
>> Interesting, I wasn't aware of the bit. I just didn't see any
>> parameter list defined for any of the Write Same commands. Ah wait, it
>> carries the pattern (the "same") and so.
>>
>> Hmm, it doesn't seem like the translation implemented in this patch
>> series cares about the payload though?
>
> As repeated here and elsewhere the payload is:
>     scsi_sglist(cmd)
> and it was put there by scsi_init_io() when it called scsi_init_sgtable()

What I mean is:

put_unaligned_le32(0u,      &sctpg[10]);

So even if the payload of the SCSI Write Same commands instruct a
non-zero pattern writing, SCT Write Same will conveniently ignore that
do zero-filling anyway. That's what I mean by "doesn't care about the
payload".

Though that would only be case with SG_IO though. SCSI Write Same
issued from block layer (BLKZEROOUT) will be instructing zero-filling
anyway.

>
>>> UNMAP has a payload that varies based on the number of range
>>> descriptors. The SCSI disk driver only ever issues a single descriptor
>>> but since libata doesn't support UNMAP this doesn't really come into
>>> play.
>>>
>>> Ideally there would be a way to distinguish between device limits for
>>> WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
>>> do that would be to transition the libata discard implementation over to
>>> single-range UNMAP, fill out the relevant VPD page B0 fields and leave
>>> the WRITE SAME bits for writing zeroes.
>>>
>>> One reason that has not been particularly compelling is that the WRITE
>>> SAME payload buffer does double duty to hold the ATA DSM TRIM range
>>
>> Huh? Why would the SATL care about its payload buffer for TRIM (i.e.
>> when the UNMAP bit is set)? Doesn't it just read the LBA and NUMBER OF
>> BLOCKS field and pack TRIM ranges/payload according to that?
>>
>>> descriptors and matches the required ATA payload size. Whereas the UNMAP
>>
>> Why would it need to "matches the required ATA payload size"?
>>
>>> command would only provide 24 bytes of TRIM range space.
>>
>> I don't really follow. The UNMAP descriptor has LBA (8 bytes / 64-bit)
>> and NUMBER OF BLOCKS (4 bytes / 32-bit) field of the same length as
>> Write Same (16). Even if the SCSI disk driver will only send one
>> descriptor, it should work as good as Write Same (16).
>
> The "payload" is the data block transferred with the command.
> The "descriptor" is, in this context, the contents of the payload as
> it "describes" what the action the command is supposed to perform.
>

I know right.

> The "payload" contains the "descriptor" that "describes" how
> DSM TRIM should determine which logical blocks it should UNMAP.

This should only be the case of UNMAP command, but not SCSI Write Same
with UNMAP bit set. And either way it should not affect how large the
ATA TRIM payload can be.

>
>>> Also, please be careful with transfer lengths, __data_len, etc. As
>>> mentioned, the transfer length WRITE SAME is typically 512 bytes and
>>> that's the number of bytes that need to be DMA'ed and transferred over
>>> the wire by the controller. But from a command completion perspective we
>>> need to complete however many bytes the command acted upon. Unlike reads
>>> and writes there is not a 1:1 mapping between the transfer length and
>>> the affected area. So we do a bit of magic after the buffer has been
>>> mapped to ensure that the completion byte count matches the number of
>>> blocks that were affected by the command rather than the size of the
>>> data buffer in bytes.
>>>
>>> --
>>> Martin K. Petersen      Oracle Linux Engineering
> --
> Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4 RESEND] SCT Write Same / DSM Trim
  2016-08-24 18:08     ` [PATCH v6 3/4 RESEND] " Shaun Tancheff
@ 2016-08-25  7:01       ` Tom Yan
  2016-08-25  8:03         ` Shaun Tancheff
  0 siblings, 1 reply; 55+ messages in thread
From: Tom Yan @ 2016-08-25  7:01 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Hannes Reinecke, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Josh Bingaman,
	Hannes Reinecke, Shaun Tancheff

Really please just drop this patch. There is no rational reason for
you to associate the maximum payload size to the logical sector size.

And please stop using the ATA SCSI Response Buffer (ata_scsi_rbuf)
that is used for response to the SCSI layer for SCSI commands that
won't really interact with the ATA device (i.e. triggers an ATA
command), while ata_format_sct_write_same() and
ata_scsi_write_same_xlat() are used for constructing payload that is
going to be send to the ATA device. Can't you even see that these are
of different direction to different layer?

On 25 August 2016 at 02:08, Shaun Tancheff <shaun@tancheff.com> wrote:
> Correct handling of devices with sector_size other that 512 bytes.
>
> In the case of a 4Kn device sector_size it is possible to describe a much
> larger DSM Trim than the current fixed default of 512 bytes.
>
> This patch assumes the minimum descriptor is sector_size and fills out
> the descriptor accordingly.
>
> The ACS-2 specification is quite clear that the DSM command payload is
> sized as number of 512 byte transfers so a 4Kn device will operate
> correctly without this patch.
>
> Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
> ---
> v5:
>  - Reshuffled descripton.
>  - Added support for a sector_size descriptor other than 512 bytes.
>
>  drivers/ata/libata-scsi.c | 85 +++++++++++++++++++++++++++++++----------------
>  1 file changed, 57 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index ebf1a04..37f456e 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -3283,7 +3283,7 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to DSM Trim
>   * @cmd: SCSI command being translated
> - * @num: Maximum number of entries (nominally 64).
> + * @trmax: Maximum number of entries that will fit in sector_size bytes.
>   * @sector: Starting sector
>   * @count: Total Range of request in logical sectors
>   *
> @@ -3298,63 +3298,80 @@ static unsigned int ata_scsi_pass_thru(struct ata_queued_cmd *qc)
>   *  LBA's should be sorted order and not overlap.
>   *
>   * NOTE: this is the same format as ADD LBA(S) TO NV CACHE PINNED SET
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static unsigned int ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 num,
> -                                             u64 sector, u32 count)
> +static size_t ata_format_dsm_trim_descr(struct scsi_cmnd *cmd, u32 trmax,
> +                                       u64 sector, u32 count)
>  {
> -       __le64 *buffer;
> -       u32 i = 0, used_bytes;
> +       struct scsi_device *sdp = cmd->device;
> +       size_t len = sdp->sector_size;
> +       size_t r;
> +       __le64 *buf;
> +       u32 i = 0;
>         unsigned long flags;
>
> -       BUILD_BUG_ON(512 > ATA_SCSI_RBUF_SIZE);
> +       WARN_ON(len > ATA_SCSI_RBUF_SIZE);
> +
> +       if (len > ATA_SCSI_RBUF_SIZE)
> +               len = ATA_SCSI_RBUF_SIZE;
>
>         spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -       buffer = ((void *)ata_scsi_rbuf);
> -       while (i < num) {
> +       buf = ((void *)ata_scsi_rbuf);
> +       memset(buf, 0, len);
> +       while (i < trmax) {
>                 u64 entry = sector |
>                         ((u64)(count > 0xffff ? 0xffff : count) << 48);
> -               buffer[i++] = __cpu_to_le64(entry);
> +               buf[i++] = __cpu_to_le64(entry);
>                 if (count <= 0xffff)
>                         break;
>                 count -= 0xffff;
>                 sector += 0xffff;
>         }
> -
> -       used_bytes = ALIGN(i * 8, 512);
> -       memset(buffer + i, 0, used_bytes - i * 8);
> -       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buffer, 512);
> +       r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>         spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
>
> -       return used_bytes;
> +       return r;
>  }
>
>  /**
>   * ata_format_dsm_trim_descr() - SATL Write Same to ATA SCT Write Same
>   * @cmd: SCSI command being translated
>   * @lba: Starting sector
> - * @num: Number of logical sectors to be zero'd.
> + * @num: Number of sectors to be zero'd.
>   *
> - * Rewrite the WRITE SAME descriptor to be an SCT Write Same formatted
> + * Rewrite the WRITE SAME payload to be an SCT Write Same formatted
>   * descriptor.
>   * NOTE: Writes a pattern (0's) in the foreground.
> - *       Large write-same requents can timeout.
> + *
> + * Return: Number of bytes copied into sglist.
>   */
> -static void ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
> +static size_t ata_format_sct_write_same(struct scsi_cmnd *cmd, u64 lba, u64 num)
>  {
> -       u16 *sctpg;
> +       struct scsi_device *sdp = cmd->device;
> +       size_t len = sdp->sector_size;
> +       size_t r;
> +       u16 *buf;
>         unsigned long flags;
>
>         spin_lock_irqsave(&ata_scsi_rbuf_lock, flags);
> -       sctpg = ((void *)ata_scsi_rbuf);
> +       buf = ((void *)ata_scsi_rbuf);
> +
> +       put_unaligned_le16(0x0002,  &buf[0]); /* SCT_ACT_WRITE_SAME */
> +       put_unaligned_le16(0x0101,  &buf[1]); /* WRITE PTRN FG */
> +       put_unaligned_le64(lba,     &buf[2]);
> +       put_unaligned_le64(num,     &buf[6]);
> +       put_unaligned_le32(0u,      &buf[10]); /* pattern */
> +
> +       WARN_ON(len > ATA_SCSI_RBUF_SIZE);
>
> -       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> -       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> -       put_unaligned_le64(lba,     &sctpg[2]);
> -       put_unaligned_le64(num,     &sctpg[6]);
> -       put_unaligned_le32(0u,      &sctpg[10]);
> +       if (len > ATA_SCSI_RBUF_SIZE)
> +               len = ATA_SCSI_RBUF_SIZE;
>
> -       sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), sctpg, 512);
> +       r = sg_copy_from_buffer(scsi_sglist(cmd), scsi_sg_count(cmd), buf, len);
>         spin_unlock_irqrestore(&ata_scsi_rbuf_lock, flags);
> +
> +       return r;
>  }
>
>  /**
> @@ -3371,11 +3388,13 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>  {
>         struct ata_taskfile *tf = &qc->tf;
>         struct scsi_cmnd *scmd = qc->scsicmd;
> +       struct scsi_device *sdp = scmd->device;
> +       size_t len = sdp->sector_size;
>         struct ata_device *dev = qc->dev;
>         const u8 *cdb = scmd->cmnd;
>         u64 block;
>         u32 n_block;
> -       const u32 trmax = ATA_MAX_TRIM_RNUM;
> +       const u32 trmax = len >> 3;
>         u32 size;
>         u16 fp;
>         u8 bp = 0xff;
> @@ -3420,8 +3439,16 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>         if (!scsi_sg_count(scmd))
>                 goto invalid_param_len;
>
> +       /*
> +        * size must match sector size in bytes
> +        * For DATA SET MANAGEMENT TRIM in ACS-2 nsect (aka count)
> +        * is defined as number of 512 byte blocks to be transferred.
> +        */
>         if (unmap) {
>                 size = ata_format_dsm_trim_descr(scmd, trmax, block, n_block);
> +               if (size != len)
> +                       goto invalid_param_len;
> +
>                 if (ata_ncq_enabled(dev) && ata_fpdma_dsm_supported(dev)) {
>                         /* Newer devices support queued TRIM commands */
>                         tf->protocol = ATA_PROT_NCQ;
> @@ -3441,7 +3468,9 @@ static unsigned int ata_scsi_write_same_xlat(struct ata_queued_cmd *qc)
>                         tf->command = ATA_CMD_DSM;
>                 }
>         } else {
> -               ata_format_sct_write_same(scmd, block, n_block);
> +               size = ata_format_sct_write_same(scmd, block, n_block);
> +               if (size != len)
> +                       goto invalid_param_len;
>
>                 tf->hob_feature = 0;
>                 tf->feature = 0;
> --
> 2.9.3
>

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat
  2016-08-25  6:31                       ` Tom Yan
@ 2016-08-25  7:18                         ` Shaun Tancheff
  0 siblings, 0 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-25  7:18 UTC (permalink / raw)
  To: Tom Yan
  Cc: Martin K. Petersen, Shaun Tancheff, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Thu, Aug 25, 2016 at 1:31 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> On 25 August 2016 at 05:28, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
>> On Wed, Aug 24, 2016 at 12:31 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>>> On 24 August 2016 at 11:33, Martin K. Petersen
>>> <martin.petersen@oracle.com> wrote:
>>>>>>>>> "Tom" == Tom Yan <tom.ty89@gmail.com> writes:
>>>>
>>>> Tom> Nope, SCSI Write Same commands does not have payload (or in SCSI
>>>> Tom> terms, parameter list / data-out buffer).
>>>>
>>>> WRITE SAME has a a payload of 1 logical block (unless NDOB is set but we
>>>> have had no good reason to support that yet).
>>>
>>> Interesting, I wasn't aware of the bit. I just didn't see any
>>> parameter list defined for any of the Write Same commands. Ah wait, it
>>> carries the pattern (the "same") and so.
>>>
>>> Hmm, it doesn't seem like the translation implemented in this patch
>>> series cares about the payload though?
>>
>> As repeated here and elsewhere the payload is:
>>     scsi_sglist(cmd)
>> and it was put there by scsi_init_io() when it called scsi_init_sgtable()
>
> What I mean is:
>
> put_unaligned_le32(0u,      &sctpg[10]);
>
> So even if the payload of the SCSI Write Same commands instruct a
> non-zero pattern writing, SCT Write Same will conveniently ignore that
> do zero-filling anyway. That's what I mean by "doesn't care about the
> payload".

If you would like to add support for that it would be nice. I am not
planning to do so here.

> Though that would only be case with SG_IO though. SCSI Write Same
> issued from block layer (BLKZEROOUT) will be instructing zero-filling
> anyway.

>>>> UNMAP has a payload that varies based on the number of range
>>>> descriptors. The SCSI disk driver only ever issues a single descriptor
>>>> but since libata doesn't support UNMAP this doesn't really come into
>>>> play.
>>>>
>>>> Ideally there would be a way to distinguish between device limits for
>>>> WRITE SAME with the UNMAP bit and for "regular" WRITE SAME. One way to
>>>> do that would be to transition the libata discard implementation over to
>>>> single-range UNMAP, fill out the relevant VPD page B0 fields and leave
>>>> the WRITE SAME bits for writing zeroes.
>>>>
>>>> One reason that has not been particularly compelling is that the WRITE
>>>> SAME payload buffer does double duty to hold the ATA DSM TRIM range
>>>
>>> Huh? Why would the SATL care about its payload buffer for TRIM (i.e.
>>> when the UNMAP bit is set)? Doesn't it just read the LBA and NUMBER OF
>>> BLOCKS field and pack TRIM ranges/payload according to that?
>>>
>>>> descriptors and matches the required ATA payload size. Whereas the UNMAP
>>>
>>> Why would it need to "matches the required ATA payload size"?
>>>
>>>> command would only provide 24 bytes of TRIM range space.
>>>
>>> I don't really follow. The UNMAP descriptor has LBA (8 bytes / 64-bit)
>>> and NUMBER OF BLOCKS (4 bytes / 32-bit) field of the same length as
>>> Write Same (16). Even if the SCSI disk driver will only send one
>>> descriptor, it should work as good as Write Same (16).
>>
>> The "payload" is the data block transferred with the command.
>> The "descriptor" is, in this context, the contents of the payload as
>> it "describes" what the action the command is supposed to perform.
>>
>
> I know right.
>
>> The "payload" contains the "descriptor" that "describes" how
>> DSM TRIM should determine which logical blocks it should UNMAP.
>
> This should only be the case of UNMAP command, but not SCSI Write Same
> with UNMAP bit set. And either way it should not affect how large the
> ATA TRIM payload can be.

The current SATL does not report support for UNMAP.
If you think it should be added please submit a patch.

If you would like to extend the current translate to support sending
multiple blocks of trim data please submit a patch.

>>>> Also, please be careful with transfer lengths, __data_len, etc. As
>>>> mentioned, the transfer length WRITE SAME is typically 512 bytes and
>>>> that's the number of bytes that need to be DMA'ed and transferred over
>>>> the wire by the controller. But from a command completion perspective we
>>>> need to complete however many bytes the command acted upon. Unlike reads
>>>> and writes there is not a 1:1 mapping between the transfer length and
>>>> the affected area. So we do a bit of magic after the buffer has been
>>>> mapped to ensure that the completion byte count matches the number of
>>>> blocks that were affected by the command rather than the size of the
>>>> data buffer in bytes.
>>>>
>>>> --
>>>> Martin K. Petersen      Oracle Linux Engineering
>> --
>> Shaun Tancheff
-- 
Shaun Tancheff

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 2/4] Add support for SCT Write Same
  2016-08-25  6:23             ` Tom Yan
@ 2016-08-25  7:31               ` Shaun Tancheff
  0 siblings, 0 replies; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-25  7:31 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, linux-ide, LKML, Tejun Heo, Christoph Hellwig,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

On Thu, Aug 25, 2016 at 1:23 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> You only fill the bytes that you want to to set explicitly:
>
> +       put_unaligned_le16(0x0002,  &sctpg[0]); /* SCT_ACT_WRITE_SAME */
> +       put_unaligned_le16(0x0101,  &sctpg[1]); /* WRITE PTRN FG */
> +       put_unaligned_le64(lba,     &sctpg[2]);
> +       put_unaligned_le64(num,     &sctpg[6]);
> +       put_unaligned_le32(0u,      &sctpg[10]);
>
> What I doubted is, if you don't memset (zero-fill) the buffer first,
> will other bytes have indeterministic value that causes random
> unexpected behavior?

No.
If there is random or unexpected behaviour the device is broken
and some other remedy, such as blacklisting, is required.

---
Shaun

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4 RESEND] SCT Write Same / DSM Trim
  2016-08-25  7:01       ` Tom Yan
@ 2016-08-25  8:03         ` Shaun Tancheff
  2016-08-25  9:30           ` Tom Yan
  0 siblings, 1 reply; 55+ messages in thread
From: Shaun Tancheff @ 2016-08-25  8:03 UTC (permalink / raw)
  To: Tom Yan
  Cc: Shaun Tancheff, Hannes Reinecke, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Martin K . Petersen, Damien Le Moal,
	Josh Bingaman, Hannes Reinecke

On Thu, Aug 25, 2016 at 2:01 AM, Tom Yan <tom.ty89@gmail.com> wrote:
> Really please just drop this patch. There is no rational reason for
> you to associate the maximum payload size to the logical sector size.

Been over this many, many times now. It has to do with the size of
the buffer setup through WRITE SAME in drivers/scsi/sd.c

> And please stop using the ATA SCSI Response Buffer (ata_scsi_rbuf)
> that is used for response to the SCSI layer for SCSI commands that
> won't really interact with the ATA device (i.e. triggers an ATA
> command), while ata_format_sct_write_same() and
> ata_scsi_write_same_xlat() are used for constructing payload that is
> going to be send to the ATA device. Can't you even see that these are
> of different direction to different layer?

Adding a new global buffer where there is one there already is
kind of silly. The buffer already has a perfectly acceptable
spinlock and the time spent copying data around is trivially
small in comparison to the I/O operation so there is not
likely to be any contention over the buffer.

It is memory. Why do you think ata_scsi_rbuf is so special?

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 3/4 RESEND] SCT Write Same / DSM Trim
  2016-08-25  8:03         ` Shaun Tancheff
@ 2016-08-25  9:30           ` Tom Yan
  0 siblings, 0 replies; 55+ messages in thread
From: Tom Yan @ 2016-08-25  9:30 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: Shaun Tancheff, Hannes Reinecke, linux-ide, LKML, Tejun Heo,
	Christoph Hellwig, Martin K . Petersen, Damien Le Moal,
	Josh Bingaman, Hannes Reinecke

On 25 August 2016 at 16:03, Shaun Tancheff <shaun.tancheff@seagate.com> wrote:
> On Thu, Aug 25, 2016 at 2:01 AM, Tom Yan <tom.ty89@gmail.com> wrote:
>> Really please just drop this patch. There is no rational reason for
>> you to associate the maximum payload size to the logical sector size.
>
> Been over this many, many times now. It has to do with the size of
> the buffer setup through WRITE SAME in drivers/scsi/sd.c

Is it because we are re-using the scatter-gather list that the the sd
driver used for issuing write same / unmap commands to our SATL for
issuing ATA commands from our SATL to the AHCI host / ATA device? And
that the scatter-gather list cannot be extended on-demand? (Which
sounds pathetically defected)

>
>> And please stop using the ATA SCSI Response Buffer (ata_scsi_rbuf)
>> that is used for response to the SCSI layer for SCSI commands that
>> won't really interact with the ATA device (i.e. triggers an ATA
>> command), while ata_format_sct_write_same() and
>> ata_scsi_write_same_xlat() are used for constructing payload that is
>> going to be send to the ATA device. Can't you even see that these are
>> of different direction to different layer?
>
> Adding a new global buffer where there is one there already is

I never said global. On the contrary I almost explicitly stated that
you should not use a global buffer for that (and let
ata_format_sct_write_same() and ata_scsi_write_same_xlat() share it).

> kind of silly. The buffer already has a perfectly acceptable
> spinlock and the time spent copying data around is trivially
> small in comparison to the I/O operation so there is not
> likely to be any contention over the buffer.
>
> It is memory. Why do you think ata_scsi_rbuf is so special?

Practically it _might_ work just fine. But your are messing up the
whole layer/logic model by doing that.

But seriously, never mind. It's almost hilarious that:

1. Martin has been insisting on that we should stay at 1-block
payload, while he then conveniently ignore the fact that ACS
specifically instructed vendors to report the payload limit in
IDENTIFY DEVICE data in terms of 512-byte block (in other word, always
the maximum number of ranges the device can handle divided by 64), and
suggested that we should statically advertise 8-block payload (in
512-byte / ACS sense) for 4Kn devices.

2. And then you conveniently convert his idea on the Maximum Write
Same Length to the buffer size limit, without even taking care of the
VPD field.

And this is just _one_ example.

I am really done doing this. I'll just let others conveniently review
it and let the series through. Sorry for all the annoyance.

^ permalink raw reply	[flat|nested] 55+ messages in thread

* Re: [PATCH v6 0/4] SCT Write Same
  2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
                   ` (4 preceding siblings ...)
  2016-08-22  6:32 ` [PATCH v6 0/4] SCT Write Same Hannes Reinecke
@ 2016-08-25 15:28 ` Tejun Heo
  5 siblings, 0 replies; 55+ messages in thread
From: Tejun Heo @ 2016-08-25 15:28 UTC (permalink / raw)
  To: Shaun Tancheff
  Cc: linux-ide, linux-kernel, Christoph Hellwig, Tom Yan,
	Martin K . Petersen, Damien Le Moal, Hannes Reinecke,
	Josh Bingaman, Hannes Reinecke

Hello,

On Sun, Aug 21, 2016 at 11:23:17PM -0500, Shaun Tancheff wrote:
> At some point the method of issuing Write Same for ATA drives changed.
> Currently write same is commonly available via SCT so expose the SCT
> capabilities and use SCT Write Same when it is available.
> 
> This is useful for zoned based media that prefers to support discard
> with lbprz set, aka discard zeroes data by mapping discard operations to
> reset write pointer operations. Conventional zones that do not support
> reset write pointer can still honor the discard zeroes data by issuing
> a write same over the zone.
> 
> It may also be nice to know if various controllers that currently
> disable WRITE SAME will work with the SCT Write Same code path:
>   aacraid, arcmsr, megaraid, 3w-9xxx, 3w-sas, 3w-xxxx, gdth, hpsa, ips,
>   megaraid, pmcraid, storvsc_drv

Applied 1-4 to libata/for-4.9.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 55+ messages in thread

end of thread, other threads:[~2016-08-25 15:28 UTC | newest]

Thread overview: 55+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-22  4:23 [PATCH v6 0/4] SCT Write Same Shaun Tancheff
2016-08-22  4:23 ` [PATCH v6 1/4] libata: Safely overwrite attached page in WRITE SAME xlat Shaun Tancheff
2016-08-22  6:27   ` Hannes Reinecke
2016-08-22 19:27   ` Tom Yan
2016-08-22 19:51     ` Shaun Tancheff
2016-08-22 20:12       ` Tom Yan
2016-08-22 21:20   ` Tom Yan
2016-08-22 22:00     ` Shaun Tancheff
2016-08-22 23:49       ` Tom Yan
2016-08-23  1:01         ` Shaun Tancheff
2016-08-23  5:26           ` Tom Yan
2016-08-23  6:20             ` Shaun Tancheff
2016-08-23  7:53               ` Tom Yan
2016-08-23  8:42                 ` Shaun Tancheff
2016-08-23  9:04                   ` Tom Yan
2016-08-24  3:33                 ` Martin K. Petersen
2016-08-24  5:31                   ` Tom Yan
2016-08-24 21:28                     ` Shaun Tancheff
2016-08-25  6:31                       ` Tom Yan
2016-08-25  7:18                         ` Shaun Tancheff
2016-08-22  4:23 ` [PATCH v6 2/4] Add support for SCT Write Same Shaun Tancheff
2016-08-22  6:27   ` Hannes Reinecke
2016-08-22 19:20   ` Tom Yan
2016-08-22 19:43     ` Shaun Tancheff
2016-08-22 20:14       ` Tom Yan
2016-08-22 22:07         ` Shaun Tancheff
2016-08-22 23:09           ` Tom Yan
2016-08-23  0:36             ` Shaun Tancheff
2016-08-23  5:55               ` Tom Yan
2016-08-23  6:11                 ` Shaun Tancheff
2016-08-23  7:57                   ` Tom Yan
2016-08-23 10:37   ` Tom Yan
2016-08-23 10:56     ` Shaun Tancheff
2016-08-24  5:57       ` Tom Yan
2016-08-24  6:10         ` Tom Yan
2016-08-24 22:04           ` Shaun Tancheff
2016-08-25  6:23             ` Tom Yan
2016-08-25  7:31               ` Shaun Tancheff
2016-08-22  4:23 ` [PATCH v6 3/4] SCT Write Same / DSM Trim Shaun Tancheff
2016-08-22  6:30   ` Hannes Reinecke
2016-08-24 18:08     ` [PATCH v6 3/4 RESEND] " Shaun Tancheff
2016-08-25  7:01       ` Tom Yan
2016-08-25  8:03         ` Shaun Tancheff
2016-08-25  9:30           ` Tom Yan
2016-08-22  8:31   ` [PATCH v6 3/4] " Tom Yan
2016-08-22  8:33     ` Tom Yan
2016-08-22 15:04       ` Shaun Tancheff
2016-08-22 17:02         ` Tom Yan
2016-08-22 18:00           ` Shaun Tancheff
2016-08-22 18:52             ` Tom Yan
2016-08-22 20:57               ` Tom Yan
2016-08-22  4:23 ` [PATCH v6 4/4] SCT Write Same handle ATA_DFLAG_PIO Shaun Tancheff
2016-08-22  6:31   ` Hannes Reinecke
2016-08-22  6:32 ` [PATCH v6 0/4] SCT Write Same Hannes Reinecke
2016-08-25 15:28 ` Tejun Heo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.