All of lore.kernel.org
 help / color / mirror / Atom feed
* Discard/trim/thin provisioning update
@ 2010-08-19 15:48 Martin K. Petersen
  2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
                   ` (5 more replies)
  0 siblings, 6 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:48 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch

This patchkit updates a few things in the discard/trim/thin provisioning
department.  First of all we add support for the newly minted Thin
Provisioning VPD.  The other main feature is support for bigger TRIM
payloads on ATA devices that support it.  And finally there's a few
bugfixes for our SCSI VPD wrapper (will we ever get this right?).

[PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
[PATCH 2/6] libata: Report supported TRIM payload size
[PATCH 3/6] block: Make max_discard_sectors sector_t
[PATCH 4/6] scsi: Fix VPD page wrapper
[PATCH 5/6] scsi_debug: Update thin provisioning support
[PATCH 6/6] sd: Update thin provisioning support

-- 
Martin K. Petersen	Oracle Linux Engineering


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
@ 2010-08-19 15:48 ` Martin K. Petersen
  2010-08-19 16:15   ` James Bottomley
  2010-08-23  8:32   ` Tejun Heo
  2010-08-19 15:48 ` [PATCH 2/6] libata: Report supported TRIM payload size Martin K. Petersen
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:48 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

Until now identifying that a device supports WRITE SAME(16) with the
UNMAP bit set has been black magic.  Implement support for the new (SBC3
r24) Thin Provisioning VPD page and the TPWS bit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-scsi.c |   13 +++++++++++++
 1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index a54273d..e280ae6 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2001,6 +2001,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
 		0x89,	/* page 0x89, ata info page */
 		0xb0,	/* page 0xb0, block limits page */
 		0xb1,	/* page 0xb1, block device characteristics page */
+		0xb2,	/* page 0xb2, thin provisioning page */
 	};
 
 	rbuf[3] = sizeof(pages);	/* number of supported VPD pages */
@@ -2172,6 +2173,15 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
 	return 0;
 }
 
+static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8 *rbuf)
+{
+	rbuf[1] = 0xb1;
+	rbuf[3] = 0x3c;
+	rbuf[5] = 1 << 6;	/* TPWS */
+
+	return 0;
+}
+
 /**
  *	ata_scsiop_noop - Command handler that simply returns success.
  *	@args: device IDENTIFY data / SCSI command of interest.
@@ -3250,6 +3260,9 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd,
 		case 0xb1:
 			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b1);
 			break;
+		case 0xb2:
+			ata_scsi_rbuf_fill(&args, ata_scsiop_inq_b2);
+			break;
 		default:
 			ata_scsi_invalid_field(cmd, done);
 			break;
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
  2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
@ 2010-08-19 15:48 ` Martin K. Petersen
  2010-08-19 17:27   ` Greg Freemyer
  2010-08-23 13:50   ` Christoph Hellwig
  2010-08-19 15:48 ` [PATCH 3/6] block: Make max_discard_sectors sector_t Martin K. Petersen
                   ` (3 subsequent siblings)
  5 siblings, 2 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:48 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

ATA IDENTIFY DEVICE word 105 contains the number of 512-byte blocks of
TRIM payload information the device can accept in one command.  Use this
value to enable payloads > 512 bytes.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/ata/libata-scsi.c |    7 ++++++-
 include/linux/ata.h       |    9 +++++++++
 2 files changed, 15 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index e280ae6..145f099 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -2152,7 +2152,12 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
 	 * with the unmap bit set.
 	 */
 	if (ata_id_has_trim(args->id)) {
-		put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
+		unsigned int blocks;
+
+		/* Default to 1 if unspecified in word 105.  Cap at 1 page. */
+		blocks = clamp(ata_id_trim_range_blocks(args->id), 1U, 8U);
+
+		put_unaligned_be32(65535 * 512 / 8 * blocks, &rbuf[20]);
 		put_unaligned_be32(1, &rbuf[28]);
 	}
 
diff --git a/include/linux/ata.h b/include/linux/ata.h
index fe6e681..5584356 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -88,6 +88,7 @@ enum {
 	ATA_ID_HW_CONFIG	= 93,
 	ATA_ID_SPG		= 98,
 	ATA_ID_LBA_CAPACITY_2	= 100,
+	ATA_ID_TRIM_RANGE_BLKS	= 105,
 	ATA_ID_SECTOR_SIZE	= 106,
 	ATA_ID_LAST_LUN		= 126,
 	ATA_ID_DLF		= 128,
@@ -827,6 +828,14 @@ static inline int ata_id_has_zero_after_trim(const u16 *id)
 	return 0;
 }
 
+static inline unsigned int ata_id_trim_range_blocks(const u16 *id)
+{
+	if (ata_id_has_trim(id))
+		return id[ATA_ID_TRIM_RANGE_BLKS];
+
+	return 0;
+}
+
 static inline int ata_id_current_chs_valid(const u16 *id)
 {
 	/* For ATA-1 devices, if the INITIALIZE DEVICE PARAMETERS command
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/6] block: Make max_discard_sectors sector_t
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
  2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
  2010-08-19 15:48 ` [PATCH 2/6] libata: Report supported TRIM payload size Martin K. Petersen
@ 2010-08-19 15:48 ` Martin K. Petersen
  2010-08-20  8:59   ` Christoph Hellwig
  2010-08-19 15:48 ` [PATCH 4/6] scsi: Fix VPD page wrapper Martin K. Petersen
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:48 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

With 4KB ATA TRIM payloads we can express ranges above the 32 bits
allowed by SCSI.  Convert max_discard_sectors to sector_t.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 block/blk-lib.c        |    5 +++--
 block/blk-settings.c   |    2 +-
 include/linux/blkdev.h |    4 ++--
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/block/blk-lib.c b/block/blk-lib.c
index d0216b9..dbed1e9 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -54,8 +54,9 @@ int blkdev_issue_discard(struct block_device *bdev, sector_t sector,
 
 	while (nr_sects && !ret) {
 		unsigned int sector_size = q->limits.logical_block_size;
-		unsigned int max_discard_sectors =
-			min(q->limits.max_discard_sectors, UINT_MAX >> 9);
+		sector_t max_discard_sectors =
+			min(q->limits.max_discard_sectors,
+			    (sector_t)UINT_MAX >> 9);
 
 		bio = bio_alloc(gfp_mask, 1);
 		if (!bio)
diff --git a/block/blk-settings.c b/block/blk-settings.c
index f5ed5a1..a4a2f42 100644
--- a/block/blk-settings.c
+++ b/block/blk-settings.c
@@ -246,7 +246,7 @@ EXPORT_SYMBOL(blk_queue_max_hw_sectors);
  * @max_discard_sectors: maximum number of sectors to discard
  **/
 void blk_queue_max_discard_sectors(struct request_queue *q,
-		unsigned int max_discard_sectors)
+		sector_t max_discard_sectors)
 {
 	q->limits.max_discard_sectors = max_discard_sectors;
 }
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 09a8402..d083146 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -316,7 +316,7 @@ struct queue_limits {
 	unsigned int		alignment_offset;
 	unsigned int		io_min;
 	unsigned int		io_opt;
-	unsigned int		max_discard_sectors;
+	sector_t		max_discard_sectors;
 	unsigned int		discard_granularity;
 	unsigned int		discard_alignment;
 
@@ -934,7 +934,7 @@ extern void blk_queue_max_hw_sectors(struct request_queue *, unsigned int);
 extern void blk_queue_max_segments(struct request_queue *, unsigned short);
 extern void blk_queue_max_segment_size(struct request_queue *, unsigned int);
 extern void blk_queue_max_discard_sectors(struct request_queue *q,
-		unsigned int max_discard_sectors);
+		sector_t max_discard_sectors);
 extern void blk_queue_logical_block_size(struct request_queue *, unsigned short);
 extern void blk_queue_physical_block_size(struct request_queue *, unsigned short);
 extern void blk_queue_alignment_offset(struct request_queue *q,
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/6] scsi: Fix VPD page wrapper
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
                   ` (2 preceding siblings ...)
  2010-08-19 15:48 ` [PATCH 3/6] block: Make max_discard_sectors sector_t Martin K. Petersen
@ 2010-08-19 15:48 ` Martin K. Petersen
  2010-08-19 15:49 ` [PATCH 5/6] scsi_debug: Update thin provisioning support Martin K. Petersen
  2010-08-19 15:49 ` [PATCH 6/6] sd: " Martin K. Petersen
  5 siblings, 0 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:48 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

Fix two bugs in the VPD page wrapper:

    - Don't return failure if the user asked for page 0

    - The end of buffer check failed to account for the page header size
      and consequently didn't work

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/scsi.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi.c b/drivers/scsi/scsi.c
index ad0ed21..348fba0 100644
--- a/drivers/scsi/scsi.c
+++ b/drivers/scsi/scsi.c
@@ -1046,13 +1046,13 @@ int scsi_get_vpd_page(struct scsi_device *sdev, u8 page, unsigned char *buf,
 
 	/* If the user actually wanted this page, we can skip the rest */
 	if (page == 0)
-		return -EINVAL;
+		return 0;
 
 	for (i = 0; i < min((int)buf[3], buf_len - 4); i++)
 		if (buf[i + 4] == page)
 			goto found;
 
-	if (i < buf[3] && i > buf_len)
+	if (i < buf[3] && i >= buf_len - 4)
 		/* ran off the end of the buffer, give us benefit of doubt */
 		goto found;
 	/* The device claims it doesn't support the requested page */
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/6] scsi_debug: Update thin provisioning support
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
                   ` (3 preceding siblings ...)
  2010-08-19 15:48 ` [PATCH 4/6] scsi: Fix VPD page wrapper Martin K. Petersen
@ 2010-08-19 15:49 ` Martin K. Petersen
  2010-08-23 16:37   ` Douglas Gilbert
  2010-08-19 15:49 ` [PATCH 6/6] sd: " Martin K. Petersen
  5 siblings, 1 reply; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:49 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

The previous thin provisioning support was not very user friendly
because it depended on all the relevant options being set on the command
line.

Implement support for the Thin Provisioning VPD page from SBC3 r24 and
add module options for TPU (UNMAP) and TPWS (WRITE SAME (16) with UNMAP
bit).  This allows us to have sane default and to enable thin
provisioning with a simple tpu=1 or tpws=1 on the command line depending
on whether we want UNMAP or WRITE SAME behavior.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/scsi_debug.c |  100 +++++++++++++++++++++++++++++++--------------
 1 files changed, 69 insertions(+), 31 deletions(-)

diff --git a/drivers/scsi/scsi_debug.c b/drivers/scsi/scsi_debug.c
index 136329b..dbade50 100644
--- a/drivers/scsi/scsi_debug.c
+++ b/drivers/scsi/scsi_debug.c
@@ -109,10 +109,12 @@ static const char * scsi_debug_version_date = "20100324";
 #define DEF_PHYSBLK_EXP 0
 #define DEF_LOWEST_ALIGNED 0
 #define DEF_OPT_BLKS 64
-#define DEF_UNMAP_MAX_BLOCKS 0
-#define DEF_UNMAP_MAX_DESC 0
-#define DEF_UNMAP_GRANULARITY 0
+#define DEF_UNMAP_MAX_BLOCKS 0xFFFFFFFF
+#define DEF_UNMAP_MAX_DESC 256
+#define DEF_UNMAP_GRANULARITY 1
 #define DEF_UNMAP_ALIGNMENT 0
+#define DEF_TPWS 0
+#define DEF_TPU 0
 
 /* bit mask values for scsi_debug_opts */
 #define SCSI_DEBUG_OPT_NOISE   1
@@ -177,10 +179,12 @@ static int scsi_debug_ato = DEF_ATO;
 static int scsi_debug_physblk_exp = DEF_PHYSBLK_EXP;
 static int scsi_debug_lowest_aligned = DEF_LOWEST_ALIGNED;
 static int scsi_debug_opt_blks = DEF_OPT_BLKS;
-static int scsi_debug_unmap_max_desc = DEF_UNMAP_MAX_DESC;
-static int scsi_debug_unmap_max_blocks = DEF_UNMAP_MAX_BLOCKS;
-static int scsi_debug_unmap_granularity = DEF_UNMAP_GRANULARITY;
-static int scsi_debug_unmap_alignment = DEF_UNMAP_ALIGNMENT;
+static unsigned int scsi_debug_unmap_max_desc = DEF_UNMAP_MAX_DESC;
+static unsigned int scsi_debug_unmap_max_blocks = DEF_UNMAP_MAX_BLOCKS;
+static unsigned int scsi_debug_unmap_granularity = DEF_UNMAP_GRANULARITY;
+static unsigned int scsi_debug_unmap_alignment = DEF_UNMAP_ALIGNMENT;
+static unsigned int scsi_debug_tpws = DEF_TPWS;
+static unsigned int scsi_debug_tpu = DEF_TPU;
 
 static int scsi_debug_cmnd_count = 0;
 
@@ -723,16 +727,9 @@ static int inquiry_evpd_b0(unsigned char * arr)
 	/* Optimal Transfer Length */
 	put_unaligned_be32(scsi_debug_opt_blks, &arr[8]);
 
-	if (scsi_debug_unmap_max_desc) {
-		unsigned int blocks;
-
-		if (scsi_debug_unmap_max_blocks)
-			blocks = scsi_debug_unmap_max_blocks;
-		else
-			blocks = 0xffffffff;
-
+	if (scsi_debug_tpu) {
 		/* Maximum Unmap LBA Count */
-		put_unaligned_be32(blocks, &arr[16]);
+		put_unaligned_be32(scsi_debug_unmap_max_blocks, &arr[16]);
 
 		/* Maximum Unmap Block Descriptor Count */
 		put_unaligned_be32(scsi_debug_unmap_max_desc, &arr[20]);
@@ -745,10 +742,9 @@ static int inquiry_evpd_b0(unsigned char * arr)
 	}
 
 	/* Optimal Unmap Granularity */
-	if (scsi_debug_unmap_granularity) {
-		put_unaligned_be32(scsi_debug_unmap_granularity, &arr[24]);
-		return 0x3c; /* Mandatory page length for thin provisioning */
-	}
+	put_unaligned_be32(scsi_debug_unmap_granularity, &arr[24]);
+
+	return 0x3c; /* Mandatory page length for thin provisioning */
 
 	return sizeof(vpdb0_data);
 }
@@ -765,6 +761,21 @@ static int inquiry_evpd_b1(unsigned char *arr)
 	return 0x3c;
 }
 
+/* Thin provisioning VPD page (SBC-3) */
+static int inquiry_evpd_b2(unsigned char *arr)
+{
+	memset(arr, 0, 0x8);
+	arr[0] = 0;			/* threshold exponent */
+
+	if (scsi_debug_tpu)
+		arr[1] = 1 << 7;
+
+	if (scsi_debug_tpws)
+		arr[1] |= 1 << 6;
+
+	return 0x8;
+}
+
 #define SDEBUG_LONG_INQ_SZ 96
 #define SDEBUG_MAX_INQ_ARR_SZ 584
 
@@ -820,6 +831,7 @@ static int resp_inquiry(struct scsi_cmnd * scp, int target,
 			arr[n++] = 0x89;  /* ATA information */
 			arr[n++] = 0xb0;  /* Block limits (SBC) */
 			arr[n++] = 0xb1;  /* Block characteristics (SBC) */
+			arr[n++] = 0xb2;  /* Thin provisioning (SBC) */
 			arr[3] = n - 4;	  /* number of supported VPD pages */
 		} else if (0x80 == cmd[2]) { /* unit serial number */
 			arr[1] = cmd[2];	/*sanity */
@@ -867,6 +879,9 @@ static int resp_inquiry(struct scsi_cmnd * scp, int target,
 		} else if (0xb1 == cmd[2]) { /* Block characteristics (SBC) */
 			arr[1] = cmd[2];        /*sanity */
 			arr[3] = inquiry_evpd_b1(&arr[4]);
+		} else if (0xb2 == cmd[2]) { /* Thin provisioning (SBC) */
+			arr[1] = cmd[2];        /*sanity */
+			arr[3] = inquiry_evpd_b2(&arr[4]);
 		} else {
 			/* Illegal request, invalid field in cdb */
 			mk_sense_buffer(devip, ILLEGAL_REQUEST,
@@ -1038,7 +1053,7 @@ static int resp_readcap16(struct scsi_cmnd * scp,
 	arr[13] = scsi_debug_physblk_exp & 0xf;
 	arr[14] = (scsi_debug_lowest_aligned >> 8) & 0x3f;
 
-	if (scsi_debug_unmap_granularity)
+	if (scsi_debug_tpu || scsi_debug_tpws)
 		arr[14] |= 0x80; /* TPE */
 
 	arr[15] = scsi_debug_lowest_aligned & 0xff;
@@ -2706,6 +2721,8 @@ module_param_named(unmap_max_blocks, scsi_debug_unmap_max_blocks, int, S_IRUGO);
 module_param_named(unmap_max_desc, scsi_debug_unmap_max_desc, int, S_IRUGO);
 module_param_named(unmap_granularity, scsi_debug_unmap_granularity, int, S_IRUGO);
 module_param_named(unmap_alignment, scsi_debug_unmap_alignment, int, S_IRUGO);
+module_param_named(tpu, scsi_debug_tpu, int, S_IRUGO);
+module_param_named(tpws, scsi_debug_tpws, int, S_IRUGO);
 
 MODULE_AUTHOR("Eric Youngdale + Douglas Gilbert");
 MODULE_DESCRIPTION("SCSI debug adapter driver");
@@ -2737,10 +2754,12 @@ MODULE_PARM_DESC(dix, "data integrity extensions mask (def=0)");
 MODULE_PARM_DESC(dif, "data integrity field type: 0-3 (def=0)");
 MODULE_PARM_DESC(guard, "protection checksum: 0=crc, 1=ip (def=0)");
 MODULE_PARM_DESC(ato, "application tag ownership: 0=disk 1=host (def=1)");
-MODULE_PARM_DESC(unmap_max_blocks, "max # of blocks can be unmapped in one cmd (def=0)");
-MODULE_PARM_DESC(unmap_max_desc, "max # of ranges that can be unmapped in one cmd (def=0)");
-MODULE_PARM_DESC(unmap_granularity, "thin provisioning granularity in blocks (def=0)");
+MODULE_PARM_DESC(unmap_max_blocks, "max # of blocks can be unmapped in one cmd (def=0xffffffff)");
+MODULE_PARM_DESC(unmap_max_desc, "max # of ranges that can be unmapped in one cmd (def=256)");
+MODULE_PARM_DESC(unmap_granularity, "thin provisioning granularity in blocks (def=1)");
 MODULE_PARM_DESC(unmap_alignment, "lowest aligned thin provisioning lba (def=0)");
+MODULE_PARM_DESC(tpu, "enable TP, support UNMAP command (def=0)");
+MODULE_PARM_DESC(tpws, "enable TP, support WRITE SAME(16) with UNMAP bit (def=0)");
 
 static char sdebug_info[256];
 
@@ -3128,7 +3147,7 @@ static ssize_t sdebug_map_show(struct device_driver *ddp, char *buf)
 {
 	ssize_t count;
 
-	if (scsi_debug_unmap_granularity == 0)
+	if (scsi_debug_tpu == 0 && scsi_debug_tpws == 0)
 		return scnprintf(buf, PAGE_SIZE, "0-%u\n",
 				 sdebug_store_sectors);
 
@@ -3320,10 +3339,21 @@ static int __init scsi_debug_init(void)
 		memset(dif_storep, 0xff, dif_size);
 	}
 
-	if (scsi_debug_unmap_granularity) {
+	/* Thin Provisioning */
+	if (scsi_debug_tpu || scsi_debug_tpws) {
 		unsigned int map_bytes;
 
-		if (scsi_debug_unmap_granularity < scsi_debug_unmap_alignment) {
+		scsi_debug_unmap_max_blocks =
+			clamp(scsi_debug_unmap_max_blocks, 0U, 0xffffffffU);
+
+		scsi_debug_unmap_max_desc =
+			clamp(scsi_debug_unmap_max_desc, 0U, 256U);
+
+		scsi_debug_unmap_granularity =
+			clamp(scsi_debug_unmap_granularity, 1U, 0xffffffffU);
+
+		if (scsi_debug_unmap_alignment &&
+		    scsi_debug_unmap_granularity < scsi_debug_unmap_alignment) {
 			printk(KERN_ERR
 			       "%s: ERR: unmap_granularity < unmap_alignment\n",
 			       __func__);
@@ -3640,7 +3670,7 @@ int scsi_debug_queuecommand(struct scsi_cmnd *SCpnt, done_funct_t done)
 			errsts = resp_readcap16(SCpnt, devip);
 		else if (cmd[1] == SAI_GET_LBA_STATUS) {
 
-			if (scsi_debug_unmap_max_desc == 0) {
+			if (scsi_debug_tpu == 0 && scsi_debug_tpws == 0) {
 				mk_sense_buffer(devip, ILLEGAL_REQUEST,
 						INVALID_COMMAND_OPCODE, 0);
 				errsts = check_condition_result;
@@ -3751,8 +3781,16 @@ write:
 		}
 		break;
 	case WRITE_SAME_16:
-		if (cmd[1] & 0x8)
-			unmap = 1;
+		if (cmd[1] & 0x8) {
+			if (scsi_debug_tpws == 0) {
+				mk_sense_buffer(devip, ILLEGAL_REQUEST,
+						INVALID_FIELD_IN_CDB, 0);
+				errsts = check_condition_result;
+			} else
+				unmap = 1;
+		}
+		if (errsts)
+			break;
 		/* fall through */
 	case WRITE_SAME:
 		errsts = check_readiness(SCpnt, 0, devip);
@@ -3766,7 +3804,7 @@ write:
 		if (errsts)
 			break;
 
-		if (scsi_debug_unmap_max_desc == 0) {
+		if (scsi_debug_unmap_max_desc == 0 || scsi_debug_tpu == 0) {
 			mk_sense_buffer(devip, ILLEGAL_REQUEST,
 					INVALID_COMMAND_OPCODE, 0);
 			errsts = check_condition_result;
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/6] sd: Update thin provisioning support
  2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
                   ` (4 preceding siblings ...)
  2010-08-19 15:49 ` [PATCH 5/6] scsi_debug: Update thin provisioning support Martin K. Petersen
@ 2010-08-19 15:49 ` Martin K. Petersen
  2010-08-20  9:00   ` Christoph Hellwig
  5 siblings, 1 reply; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 15:49 UTC (permalink / raw)
  To: linux-ide, linux-scsi, hch; +Cc: Martin K. Petersen

Add support for the Thin Provisioning VPD page and use the TPU and TPWS
bits to switch between UNMAP and WRITE SAME(16) for discards.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
---
 drivers/scsi/sd.c |   40 +++++++++++++++++++++++++++++++++++-----
 drivers/scsi/sd.h |    2 ++
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index 8802e48..7d6a61c 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1991,14 +1991,19 @@ static void sd_read_block_limits(struct scsi_disk *sdkp)
 		lba_count = get_unaligned_be32(&buffer[20]);
 		desc_count = get_unaligned_be32(&buffer[24]);
 
-		if (lba_count) {
+		if (sdkp->tpu && desc_count && lba_count)
+			sdkp->unmap = 1;
+		else if (!sdkp->tpws) {
+			sd_printk(KERN_ERR, sdkp, "Thin provisioning is "  \
+				  "enabled but neither TPU, nor TPWS are " \
+				  "set. Disabling discard!\n");
+			goto out;
+		}
+
+		if (lba_count)
 			q->limits.max_discard_sectors =
 				lba_count * sector_sz >> 9;
 
-			if (desc_count)
-				sdkp->unmap = 1;
-		}
-
 		granularity = get_unaligned_be32(&buffer[28]);
 
 		if (granularity)
@@ -2039,6 +2044,30 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 	kfree(buffer);
 }
 
+/**
+ * sd_read_thin_provisioning - Query thin provisioning VPD page
+ * @disk: disk to query
+ */
+static void sd_read_thin_provisioning(struct scsi_disk *sdkp)
+{
+	unsigned char *buffer;
+	const int vpd_len = 8;
+
+	if (sdkp->thin_provisioning == 0)
+		return;
+
+	buffer = kmalloc(vpd_len, GFP_KERNEL);
+
+	if (!buffer || scsi_get_vpd_page(sdkp->device, 0xb2, buffer, vpd_len))
+		goto out;
+
+	sdkp->tpu   = (buffer[5] >> 7) & 1;	/* UNMAP */
+	sdkp->tpws  = (buffer[5] >> 6) & 1;	/* WRITE SAME(16) with UNMAP */
+
+ out:
+	kfree(buffer);
+}
+
 static int sd_try_extended_inquiry(struct scsi_device *sdp)
 {
 	/*
@@ -2090,6 +2119,7 @@ static int sd_revalidate_disk(struct gendisk *disk)
 		sd_read_capacity(sdkp, buffer);
 
 		if (sd_try_extended_inquiry(sdp)) {
+			sd_read_thin_provisioning(sdkp);
 			sd_read_block_limits(sdkp);
 			sd_read_block_characteristics(sdkp);
 		}
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 43d3caf..9646062 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -62,6 +62,8 @@ struct scsi_disk {
 	unsigned	first_scan : 1;
 	unsigned	thin_provisioning : 1;
 	unsigned	unmap : 1;
+	unsigned	tpws : 1;
+	unsigned	tpu : 1;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
@ 2010-08-19 16:15   ` James Bottomley
  2010-08-19 16:20     ` Martin K. Petersen
  2010-08-23  8:32   ` Tejun Heo
  1 sibling, 1 reply; 27+ messages in thread
From: James Bottomley @ 2010-08-19 16:15 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, 2010-08-19 at 11:48 -0400, Martin K. Petersen wrote:
> Until now identifying that a device supports WRITE SAME(16) with the
> UNMAP bit set has been black magic.  Implement support for the new (SBC3
> r24) Thin Provisioning VPD page and the TPWS bit.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  drivers/ata/libata-scsi.c |   13 +++++++++++++
>  1 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index a54273d..e280ae6 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2001,6 +2001,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>  		0x89,	/* page 0x89, ata info page */
>  		0xb0,	/* page 0xb0, block limits page */
>  		0xb1,	/* page 0xb1, block device characteristics page */
> +		0xb2,	/* page 0xb2, thin provisioning page */

Should this be unconditional?  Shouldn't it be conditioned on the
current supported trim indicator (which is word 169 being non-zero
unless they've changed it yet again)?

James



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-19 16:15   ` James Bottomley
@ 2010-08-19 16:20     ` Martin K. Petersen
  0 siblings, 0 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 16:20 UTC (permalink / raw)
  To: James Bottomley; +Cc: Martin K. Petersen, linux-ide, linux-scsi, hch

>>>>> "James" == James Bottomley <James.Bottomley@suse.de> writes:

James> Should this be unconditional?  Shouldn't it be conditioned on the
James> current supported trim indicator (which is word 169 being
James> non-zero unless they've changed it yet again)?

I thought about it.  But the VPD page isn't gated on thin provisioning
being enabled.

With the SCSI hat on we won't enable discard unless TPE=1 in READ
CAPACITY(16).  Regardless of whether the device reports TPU=1 or TPWS=1
in the TP VPD.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 15:48 ` [PATCH 2/6] libata: Report supported TRIM payload size Martin K. Petersen
@ 2010-08-19 17:27   ` Greg Freemyer
  2010-08-19 18:05     ` Martin K. Petersen
  2010-08-23 13:50   ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Greg Freemyer @ 2010-08-19 17:27 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 11:48 AM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
> ATA IDENTIFY DEVICE word 105 contains the number of 512-byte blocks of
> TRIM payload information the device can accept in one command.  Use this
> value to enable payloads > 512 bytes.
>
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  drivers/ata/libata-scsi.c |    7 ++++++-
>  include/linux/ata.h       |    9 +++++++++
>  2 files changed, 15 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index e280ae6..145f099 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2152,7 +2152,12 @@ static unsigned int ata_scsiop_inq_b0(struct ata_scsi_args *args, u8 *rbuf)
>         * with the unmap bit set.
>         */
>        if (ata_id_has_trim(args->id)) {
> -               put_unaligned_be32(65535 * 512 / 8, &rbuf[20]);
> +               unsigned int blocks;

since blocks is nebulous, a comment would be nice, maybe:

                 unsigned int blocks;    /* the ATA spec specifies
512-byte blocks for trim payload */

> +
> +               /* Default to 1 if unspecified in word 105.  Cap at 1 page. */
> +               blocks = clamp(ata_id_trim_range_blocks(args->id), 1U, 8U);


Should there at least be a todo comment about raising that cap?

Or is there a fundamental reason for it.  ie. I don't think the ATA
spec calls for it, so this is a kernel restriction I assume.

> +
> +               put_unaligned_be32(65535 * 512 / 8 * blocks, &rbuf[20]);

Is this patch actually enabling the block layer to initiate ATA
multi-sector trim payloads, or is this only allowing the max payloads
sectors to be queried?

Are there plans (or existing code) to accumulate trim ranges in the
block layer and create trim commands with multiple sectors?

AFAIK, the only block layer feature exported to the file system layer
only accepts one range.

I'd like to see multiple trim ranges accumulated either by the
filesystem prior to calling into the block layer, of have the block
layer accumulate them.  (I suspect its easier for the filesystem to do
it via the proposed fitrim() ioctl that is on the ext4 list recently.

ie. the current proposed fitrim() ioctl calls into the block layer for
each trim range, but it could easily have the ability to accumulate a
vector of trim ranges and call the block layer only once per trim
vector if the block layer had a interface for that.

Greg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 17:27   ` Greg Freemyer
@ 2010-08-19 18:05     ` Martin K. Petersen
  2010-08-19 21:08       ` Greg Freemyer
  2010-08-19 21:50       ` Mark Lord
  0 siblings, 2 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-19 18:05 UTC (permalink / raw)
  To: Greg Freemyer; +Cc: Martin K. Petersen, linux-ide, linux-scsi, hch

>>>>> "Greg" == Greg Freemyer <greg.freemyer@gmail.com> writes:

>> + /* Default to 1 if unspecified in word 105. Cap at 1 page. */

Greg> Should there at least be a todo comment about raising that cap?

We don't currently plan to.  The payload is a single page which we can
allocate fairly easily.  A bigger payload would be more problematic.

With the 4KB payload you can adress 16GB with one command.


Greg> Or is there a fundamental reason for it.  ie. I don't think the
Greg> ATA spec calls for it, so this is a kernel restriction I assume.

Yes, this is part of the internal SCSI-ATA translation mechanism in
Linux.  But fwiw, there I'm only aware of one drive that currently
supports more than 512 bytes of payload and it also caps at 4KB.


Greg> Is this patch actually enabling the block layer to initiate ATA
Greg> multi-sector trim payloads, or is this only allowing the max
Greg> payloads sectors to be queried?

TRIM only takes 65535 blocks per entry.  So we need many entries to
decribe a single, contiguous space being freed.  That's what's being
bumped here.


Greg> Are there plans (or existing code) to accumulate trim ranges in
Greg> the block layer and create trim commands with multiple sectors?

I have worked on this on and off.  It's not trivial for several reasons.
We discussed the issues at the storage workshop last week and there was
concensus that the changes were too intrusive.  So we're holding off
until we see what's happening with:

      a) the SSD market. Win 7 also issues lots of small, individual
         trims like we do

      b) the TRIM TNG command that's being worked on in T13

This doesn't mean that TRIM coalescing is impossible and that it will
never happen.  But right now it appears to be more hassle than it's
worth...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 18:05     ` Martin K. Petersen
@ 2010-08-19 21:08       ` Greg Freemyer
  2010-08-19 21:50       ` Mark Lord
  1 sibling, 0 replies; 27+ messages in thread
From: Greg Freemyer @ 2010-08-19 21:08 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 2:05 PM, Martin K. Petersen
<martin.petersen@oracle.com> wrote:
>>>>>> "Greg" == Greg Freemyer <greg.freemyer@gmail.com> writes:
>
>>> + /* Default to 1 if unspecified in word 105. Cap at 1 page. */
>
> Greg> Should there at least be a todo comment about raising that cap?
>
> We don't currently plan to.  The payload is a single page which we can
> allocate fairly easily.  A bigger payload would be more problematic.
>
> With the 4KB payload you can adress 16GB with one command.
>
>
> Greg> Or is there a fundamental reason for it.  ie. I don't think the
> Greg> ATA spec calls for it, so this is a kernel restriction I assume.
>
> Yes, this is part of the internal SCSI-ATA translation mechanism in
> Linux.  But fwiw, there I'm only aware of one drive that currently
> supports more than 512 bytes of payload and it also caps at 4KB.
>

If you're curious about harware support for larger payloads, you might
ask Mark Lord.  I understand the way he implemented it in hdparm
accepts many more 512 byte payload blocks than one page worth.  And
the only SSD hardware he's reported problems with are Intel which
apparently only supports one 512 byte payload.

ie. I believe hdparm will typically use up to 1MB of payload.  And the
ranges he collects via his script and dispatches via hdparm are
typically not contiguous.

But I believe he is bypassing much of the kernel stack.  To be honest,
I have not traced a hdparm created trim command through the kernel, so
I don't know the details, and he may be breaking that 1 MB payload
that is coming from userspace into smaller payloads.

The fundamental problem with hdparm is that because it bypasses most
of the kernel i/o stack, it is not compatible with DM/LVM, but I think
its a great prototype of what the kernel could do eventually.

==> from man hdparm

       --trim-sector-ranges
              For Solid State Drives (SSDs).  EXCEPTIONALLY DANGEROUS.
DO NOT USE THIS FLAG!!  Tells the drive firmware to discard unneeded
data sectors, destroying any data that may have been
              present  within  them.   This makes those sectors
available for immediate use by the firmware's garbage collection
mechanism, to improve scheduling for wear-leveling of the flash
              media.  This option expects one or more sector range
pairs immediately after the flag: an LBA starting address, a colon,
and a sector count, with no intervening  spaces.   EXCEP-
              TIONALLY DANGEROUS. DO NOT USE THIS FLAG!!

              Eg.  hdparm --trim-sector-ranges 1000:4 7894:16 /dev/sdz

       --trim-sector-ranges-stdin
              Identical  to  --trim-sector-ranges above, except the
list of lba:count pairs is read from stdin rather than being specified
on the command line.  This can be used to avoid prob-
              lems with excessively long command lines.  It also
permits batching of many more sector ranges into single commands to
the drive, up to the currently  configured  transfer  limit
              (max_sectors_kb).
==>


>
> Greg> Is this patch actually enabling the block layer to initiate ATA
> Greg> multi-sector trim payloads, or is this only allowing the max
> Greg> payloads sectors to be queried?
>
> TRIM only takes 65535 blocks per entry.  So we need many entries to
> decribe a single, contiguous space being freed.  That's what's being
> bumped here.
>

Thanks, I did not appreciate that restriction and thus my confusion
about the patch

>
> Greg> Are there plans (or existing code) to accumulate trim ranges in
> Greg> the block layer and create trim commands with multiple sectors?
>
> I have worked on this on and off.  It's not trivial for several reasons.
> We discussed the issues at the storage workshop last week and there was
> concensus that the changes were too intrusive.  So we're holding off
> until we see what's happening with:
>
>      a) the SSD market. Win 7 also issues lots of small, individual
>         trims like we do
>
>      b) the TRIM TNG command that's being worked on in T13
>
> This doesn't mean that TRIM coalescing is impossible and that it will
> never happen.  But right now it appears to be more hassle than it's
> worth...

I believe it's almost trivial to coalesce non-contiguous ranges into a
single vector of ranges in the proposed fitrim() ioctl because it is
walking the filesystem structures looking for free ranges.  I assume
its just a matter of maintaining the locks long enough to ensure the
blocks being trimmed are not used by other filesystem code while the
ranges are being collected.

The bigger problem for now is that the block layer does not export a
way to pass in a vectorized list of ranges to discard.

> --
> Martin K. Petersen      Oracle Linux Engineering
>

Greg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 18:05     ` Martin K. Petersen
  2010-08-19 21:08       ` Greg Freemyer
@ 2010-08-19 21:50       ` Mark Lord
  2010-08-20  8:58         ` Christoph Hellwig
  1 sibling, 1 reply; 27+ messages in thread
From: Mark Lord @ 2010-08-19 21:50 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: Greg Freemyer, linux-ide, linux-scsi, hch

On 10-08-19 02:05 PM, Martin K. Petersen wrote:
>I'm only aware of one drive that currently
> supports more than 512 bytes of payload and it also caps at 4KB.

SSDs based upon the Indilinx Barefoot controller support
more or less "infinite" payload for TRIM, with no cap.
But it predates idword[105], so just has a zero there.

OCZ Vertex-LE specifies a single sector of payload.

Those are the ones I know about and have on hand here for TRIM.

Cheers

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 21:50       ` Mark Lord
@ 2010-08-20  8:58         ` Christoph Hellwig
  2010-08-20 13:53           ` Mark Lord
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2010-08-20  8:58 UTC (permalink / raw)
  To: Mark Lord; +Cc: Martin K. Petersen, Greg Freemyer, linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 05:50:11PM -0400, Mark Lord wrote:
> On 10-08-19 02:05 PM, Martin K. Petersen wrote:
> >I'm only aware of one drive that currently
> >supports more than 512 bytes of payload and it also caps at 4KB.
> 
> SSDs based upon the Indilinx Barefoot controller support
> more or less "infinite" payload for TRIM, with no cap.
> But it predates idword[105], so just has a zero there.

Is there an easy way to identify them?  If so we could add a quirk
for them if it provides enough benefit.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/6] block: Make max_discard_sectors sector_t
  2010-08-19 15:48 ` [PATCH 3/6] block: Make max_discard_sectors sector_t Martin K. Petersen
@ 2010-08-20  8:59   ` Christoph Hellwig
  0 siblings, 0 replies; 27+ messages in thread
From: Christoph Hellwig @ 2010-08-20  8:59 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 11:48:58AM -0400, Martin K. Petersen wrote:
> With 4KB ATA TRIM payloads we can express ranges above the 32 bits
> allowed by SCSI.  Convert max_discard_sectors to sector_t.

Looks good from a correctnes point of view, but as long as bi_size
is just an unsgined int it's not going to help us much with larger
discard sizes.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] sd: Update thin provisioning support
  2010-08-19 15:49 ` [PATCH 6/6] sd: " Martin K. Petersen
@ 2010-08-20  9:00   ` Christoph Hellwig
  2010-08-23 19:06     ` Martin K. Petersen
  0 siblings, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2010-08-20  9:00 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 11:49:01AM -0400, Martin K. Petersen wrote:
> +		if (sdkp->tpu && desc_count && lba_count)
> +			sdkp->unmap = 1;
> +		else if (!sdkp->tpws) {
> +			sd_printk(KERN_ERR, sdkp, "Thin provisioning is "  \
> +				  "enabled but neither TPU, nor TPWS are " \
> +				  "set. Disabling discard!\n");
> +			goto out;
> +		}
> +

I don't think we can simply break all existing setups with support for
earlier SBC drafts.  I think we should use the TPU and TPWS same bits
if present and else fall back to our current heuristics.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-20  8:58         ` Christoph Hellwig
@ 2010-08-20 13:53           ` Mark Lord
  2010-08-20 17:13             ` Greg Freemyer
  0 siblings, 1 reply; 27+ messages in thread
From: Mark Lord @ 2010-08-20 13:53 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Martin K. Petersen, Greg Freemyer, linux-ide, linux-scsi

On 10-08-20 04:58 AM, Christoph Hellwig wrote:
> On Thu, Aug 19, 2010 at 05:50:11PM -0400, Mark Lord wrote:
>> On 10-08-19 02:05 PM, Martin K. Petersen wrote:
>>> I'm only aware of one drive that currently
>>> supports more than 512 bytes of payload and it also caps at 4KB.
>>
>> SSDs based upon the Indilinx Barefoot controller support
>> more or less "infinite" payload for TRIM, with no cap.
>> But it predates idword[105], so just has a zero there.
>
> Is there an easy way to identify them?  If so we could add a quirk
> for them if it provides enough benefit.


Each brand/model identifies itself differently.
But we could start a whitelist on based on the model name field
from the ATA identify data.

The OCZ VERTEX drives I have here, identify themselves as "OCZ-VERTEX",
and accept very very long payloads (no limit?).

The OCZ VERTEX-LE drives here do have a limit, of 8 sectors,
and identify themselves as "OCZ VERTEX-LE" in that field.

That's what hdparm-9.30 uses to figure out the max TRIM payloads,
in the absence of a value in word 105.

Cheers


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-20 13:53           ` Mark Lord
@ 2010-08-20 17:13             ` Greg Freemyer
  2010-08-20 18:28               ` Mark Lord
  0 siblings, 1 reply; 27+ messages in thread
From: Greg Freemyer @ 2010-08-20 17:13 UTC (permalink / raw)
  To: Mark Lord; +Cc: Christoph Hellwig, Martin K. Petersen, linux-ide, linux-scsi

On Fri, Aug 20, 2010 at 9:53 AM, Mark Lord <kernel@teksavvy.com> wrote:
> On 10-08-20 04:58 AM, Christoph Hellwig wrote:
>>
>> On Thu, Aug 19, 2010 at 05:50:11PM -0400, Mark Lord wrote:
>>>
>>> On 10-08-19 02:05 PM, Martin K. Petersen wrote:
>>>>
>>>> I'm only aware of one drive that currently
>>>> supports more than 512 bytes of payload and it also caps at 4KB.
>>>
>>> SSDs based upon the Indilinx Barefoot controller support
>>> more or less "infinite" payload for TRIM, with no cap.
>>> But it predates idword[105], so just has a zero there.
>>
>> Is there an easy way to identify them?  If so we could add a quirk
>> for them if it provides enough benefit.
>
>
> Each brand/model identifies itself differently.
> But we could start a whitelist on based on the model name field
> from the ATA identify data.
>
> The OCZ VERTEX drives I have here, identify themselves as "OCZ-VERTEX",
> and accept very very long payloads (no limit?).
>
> The OCZ VERTEX-LE drives here do have a limit, of 8 sectors,
> and identify themselves as "OCZ VERTEX-LE" in that field.
>
> That's what hdparm-9.30 uses to figure out the max TRIM payloads,
> in the absence of a value in word 105.
>
> Cheers
>

A whitelist to enable large contiguous range trims via 8 512B-block
payloads seems useful for those devices that don't support word 105.
(ie. 4KB payload)

But, I doubt there is enough observable performance advantage to
justify reworking the internal SCSI-ATA translation mechanism in the
kernel to handle larger payloads.   Especially if only one or two SSDs
will accept greater than 4KB of payload to the trim command.

Greg

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-20 17:13             ` Greg Freemyer
@ 2010-08-20 18:28               ` Mark Lord
  0 siblings, 0 replies; 27+ messages in thread
From: Mark Lord @ 2010-08-20 18:28 UTC (permalink / raw)
  To: Greg Freemyer
  Cc: Christoph Hellwig, Martin K. Petersen, linux-ide, linux-scsi

On 10-08-20 01:13 PM, Greg Freemyer wrote:
> On Fri, Aug 20, 2010 at 9:53 AM, Mark Lord<kernel@teksavvy.com>  wrote:
>> On 10-08-20 04:58 AM, Christoph Hellwig wrote:
>>>
>>> On Thu, Aug 19, 2010 at 05:50:11PM -0400, Mark Lord wrote:
>>>>
>>>> On 10-08-19 02:05 PM, Martin K. Petersen wrote:
>>>>>
>>>>> I'm only aware of one drive that currently
>>>>> supports more than 512 bytes of payload and it also caps at 4KB.
>>>>
>>>> SSDs based upon the Indilinx Barefoot controller support
>>>> more or less "infinite" payload for TRIM, with no cap.
>>>> But it predates idword[105], so just has a zero there.
>>>
>>> Is there an easy way to identify them?  If so we could add a quirk
>>> for them if it provides enough benefit.
..
> A whitelist to enable large contiguous range trims via 8 512B-block
> payloads seems useful for those devices that don't support word 105.
> (ie. 4KB payload)
>
> But, I doubt there is enough observable performance advantage to
> justify reworking the internal SCSI-ATA translation mechanism in the
> kernel to handle larger payloads.   Especially if only one or two SSDs
> will accept greater than 4KB of payload to the trim command.
..

Actually, for in-kernel uses, even just a single 512-byte long list
would likely be adequate.  The biggest gains will come from simply
having more than one range per TRIM command issued.

Once we get that, it's diminishing returns for larger and larger lists,
and in-kernel code is unlikely to ever be able to generate lists that long.

But getting started should be easier than folks are making it out to be.
The first step is to have "file deletion" issue multi-range trims for
all extents of the file at once, rather than one range at a time as at present.

That'll be the biggest gain, and shouldn't be too complex to implement.
Especially not after Christoph/Tejun's current barrier rework stuff is in place.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
  2010-08-19 16:15   ` James Bottomley
@ 2010-08-23  8:32   ` Tejun Heo
  2010-08-23 18:22     ` Douglas Gilbert
  1 sibling, 1 reply; 27+ messages in thread
From: Tejun Heo @ 2010-08-23  8:32 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On 08/19/2010 05:48 PM, Martin K. Petersen wrote:
> Until now identifying that a device supports WRITE SAME(16) with the
> UNMAP bit set has been black magic.  Implement support for the new (SBC3
> r24) Thin Provisioning VPD page and the TPWS bit.
> 
> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
> ---
>  drivers/ata/libata-scsi.c |   13 +++++++++++++
>  1 files changed, 13 insertions(+), 0 deletions(-)
> 
> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
> index a54273d..e280ae6 100644
> --- a/drivers/ata/libata-scsi.c
> +++ b/drivers/ata/libata-scsi.c
> @@ -2001,6 +2001,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>  		0x89,	/* page 0x89, ata info page */
>  		0xb0,	/* page 0xb0, block limits page */
>  		0xb1,	/* page 0xb1, block device characteristics page */
> +		0xb2,	/* page 0xb2, thin provisioning page */
>  	};
>  
>  	rbuf[3] = sizeof(pages);	/* number of supported VPD pages */
> @@ -2172,6 +2173,15 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
>  	return 0;
>  }
>  
> +static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8 *rbuf)
> +{
> +	rbuf[1] = 0xb1;
> +	rbuf[3] = 0x3c;
> +	rbuf[5] = 1 << 6;	/* TPWS */

I would love a bit more documentation here.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-19 15:48 ` [PATCH 2/6] libata: Report supported TRIM payload size Martin K. Petersen
  2010-08-19 17:27   ` Greg Freemyer
@ 2010-08-23 13:50   ` Christoph Hellwig
  2010-08-23 18:56     ` Martin K. Petersen
  1 sibling, 1 reply; 27+ messages in thread
From: Christoph Hellwig @ 2010-08-23 13:50 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On Thu, Aug 19, 2010 at 11:48:57AM -0400, Martin K. Petersen wrote:
> ATA IDENTIFY DEVICE word 105 contains the number of 512-byte blocks of
> TRIM payload information the device can accept in one command.  Use this
> value to enable payloads > 512 bytes.

Currently ata_scsi_write_same_xlat passes uses a constant 512 for the
payload all over the place, so the larger payloads won't actually work yet.

I don't think advertising a larger size than we actually support is a
good idea.


^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/6] scsi_debug: Update thin provisioning support
  2010-08-19 15:49 ` [PATCH 5/6] scsi_debug: Update thin provisioning support Martin K. Petersen
@ 2010-08-23 16:37   ` Douglas Gilbert
  0 siblings, 0 replies; 27+ messages in thread
From: Douglas Gilbert @ 2010-08-23 16:37 UTC (permalink / raw)
  To: Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On 10-08-19 11:49 AM, Martin K. Petersen wrote:
> The previous thin provisioning support was not very user friendly
> because it depended on all the relevant options being set on the command
> line.
>
> Implement support for the Thin Provisioning VPD page from SBC3 r24 and
> add module options for TPU (UNMAP) and TPWS (WRITE SAME (16) with UNMAP
> bit).  This allows us to have sane default and to enable thin
> provisioning with a simple tpu=1 or tpws=1 on the command line depending
> on whether we want UNMAP or WRITE SAME behavior.
>
> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
Acked-by: Douglas Gilbert <dgilbert@interlog.com>

Looked ok when I took it for a spin. That was with
lk 2.6.35 .

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-23  8:32   ` Tejun Heo
@ 2010-08-23 18:22     ` Douglas Gilbert
  2010-08-23 18:29       ` Douglas Gilbert
  0 siblings, 1 reply; 27+ messages in thread
From: Douglas Gilbert @ 2010-08-23 18:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Martin K. Petersen, linux-ide, linux-scsi, hch

On 10-08-23 04:32 AM, Tejun Heo wrote:
> On 08/19/2010 05:48 PM, Martin K. Petersen wrote:
>> Until now identifying that a device supports WRITE SAME(16) with the
>> UNMAP bit set has been black magic.  Implement support for the new (SBC3
>> r24) Thin Provisioning VPD page and the TPWS bit.
>>
>> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
>> ---
>>   drivers/ata/libata-scsi.c |   13 +++++++++++++
>>   1 files changed, 13 insertions(+), 0 deletions(-)
>>
>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>> index a54273d..e280ae6 100644
>> --- a/drivers/ata/libata-scsi.c
>> +++ b/drivers/ata/libata-scsi.c
>> @@ -2001,6 +2001,7 @@ static unsigned int ata_scsiop_inq_00(struct ata_scsi_args *args, u8 *rbuf)
>>   		0x89,	/* page 0x89, ata info page */
>>   		0xb0,	/* page 0xb0, block limits page */
>>   		0xb1,	/* page 0xb1, block device characteristics page */
>> +		0xb2,	/* page 0xb2, thin provisioning page */
>>   	};
>>
>>   	rbuf[3] = sizeof(pages);	/* number of supported VPD pages */
>> @@ -2172,6 +2173,15 @@ static unsigned int ata_scsiop_inq_b1(struct ata_scsi_args *args, u8 *rbuf)
>>   	return 0;
>>   }
>>
>> +static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8 *rbuf)
>> +{
>> +	rbuf[1] = 0xb1;
>> +	rbuf[3] = 0x3c;
>> +	rbuf[5] = 1<<  6;	/* TPWS */
>
> I would love a bit more documentation here.

Yes, that style comes from my code in scsi_debug and
some of my utilities. Recently I added some more
documentation, just prior the the function:

/* SCSI Thin Provisioning VPD page: SBC-3 rev 22 or later */

I see no point in commenting the individual code lines.
Just follow that reference ...

And when I do then I see that rbuf[3] should be 4 (not 0x3c)
unless the DP bit is set.

Doug Gilbert



^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-23 18:22     ` Douglas Gilbert
@ 2010-08-23 18:29       ` Douglas Gilbert
  2010-08-23 19:11         ` Martin K. Petersen
  0 siblings, 1 reply; 27+ messages in thread
From: Douglas Gilbert @ 2010-08-23 18:29 UTC (permalink / raw)
  To: Tejun Heo, Martin K. Petersen; +Cc: linux-ide, linux-scsi, hch

On 10-08-23 02:22 PM, Douglas Gilbert wrote:
> On 10-08-23 04:32 AM, Tejun Heo wrote:
>> On 08/19/2010 05:48 PM, Martin K. Petersen wrote:
>>> Until now identifying that a device supports WRITE SAME(16) with the
>>> UNMAP bit set has been black magic. Implement support for the new (SBC3
>>> r24) Thin Provisioning VPD page and the TPWS bit.
>>>
>>> Signed-off-by: Martin K. Petersen<martin.petersen@oracle.com>
>>> ---
>>> drivers/ata/libata-scsi.c | 13 +++++++++++++
>>> 1 files changed, 13 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
>>> index a54273d..e280ae6 100644
>>> --- a/drivers/ata/libata-scsi.c
>>> +++ b/drivers/ata/libata-scsi.c
>>> @@ -2001,6 +2001,7 @@ static unsigned int ata_scsiop_inq_00(struct
>>> ata_scsi_args *args, u8 *rbuf)
>>> 0x89, /* page 0x89, ata info page */
>>> 0xb0, /* page 0xb0, block limits page */
>>> 0xb1, /* page 0xb1, block device characteristics page */
>>> + 0xb2, /* page 0xb2, thin provisioning page */
>>> };
>>>
>>> rbuf[3] = sizeof(pages); /* number of supported VPD pages */
>>> @@ -2172,6 +2173,15 @@ static unsigned int ata_scsiop_inq_b1(struct
>>> ata_scsi_args *args, u8 *rbuf)
>>> return 0;
>>> }
>>>
>>> +static unsigned int ata_scsiop_inq_b2(struct ata_scsi_args *args, u8
>>> *rbuf)
>>> +{
>>> + rbuf[1] = 0xb1;
>>> + rbuf[3] = 0x3c;
>>> + rbuf[5] = 1<< 6; /* TPWS */
>>
>> I would love a bit more documentation here.
>
> Yes, that style comes from my code in scsi_debug and
> some of my utilities. Recently I added some more
> documentation, just prior the the function:
>
> /* SCSI Thin Provisioning VPD page: SBC-3 rev 22 or later */
>
> I see no point in commenting the individual code lines.
> Just follow that reference ...
>
> And when I do then I see that rbuf[3] should be 4 (not 0x3c)
> unless the DP bit is set.

... and of course rbuf[1] should be 0xb2 . Martin ????
The rbuf[5] line is correct :-)

Doug Gilbert

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/6] libata: Report supported TRIM payload size
  2010-08-23 13:50   ` Christoph Hellwig
@ 2010-08-23 18:56     ` Martin K. Petersen
  0 siblings, 0 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-23 18:56 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Martin K. Petersen, linux-ide, linux-scsi

>>>>> "Christoph" == Christoph Hellwig <hch@lst.de> writes:

Christoph> On Thu, Aug 19, 2010 at 11:48:57AM -0400, Martin K. Petersen wrote:
>> ATA IDENTIFY DEVICE word 105 contains the number of 512-byte blocks
>> of TRIM payload information the device can accept in one command.
>> Use this value to enable payloads > 512 bytes.

Christoph> Currently ata_scsi_write_same_xlat passes uses a constant 512
Christoph> for the payload all over the place, so the larger payloads
Christoph> won't actually work yet.

Christoph> I don't think advertising a larger size than we actually
Christoph> support is a good idea.

I was on crack when I included this in the series last week.  It came
from my discard coalescing tree where things are quite different.

In any case we're also limited by the range of WRITE SAME as long as
we're relying on SAT.  So under all circumstances we're dead in the
water with a payload > 1KB.

Let's just drop this one for now.

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 6/6] sd: Update thin provisioning support
  2010-08-20  9:00   ` Christoph Hellwig
@ 2010-08-23 19:06     ` Martin K. Petersen
  0 siblings, 0 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-23 19:06 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Martin K. Petersen, linux-ide, linux-scsi

>>>>> "hch" == Christoph Hellwig <hch@lst.de> writes:

hch> I don't think we can simply break all existing setups with support
hch> for earlier SBC drafts.  I think we should use the TPU and TPWS
hch> same bits if present and else fall back to our current heuristics.

Originally my patch included support for "legacy" TP devices triggered
by TPE=1 but no TP VPD page listed in page 0.  I also supported the even
older TP approach (WRITE SAME without the UNMAP bit, all zero payload).

However, I talked to a few partners and everybody were going to add the
TP VPD to their firmware builds right away.  So I ripped out the compat
stuff because I felt it was weird to have explicit support for an
intermediate SBC3 release in there.

How would you feel about a sysfs switch to "force" TP on for devices
that don't report it correctly?

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP
  2010-08-23 18:29       ` Douglas Gilbert
@ 2010-08-23 19:11         ` Martin K. Petersen
  0 siblings, 0 replies; 27+ messages in thread
From: Martin K. Petersen @ 2010-08-23 19:11 UTC (permalink / raw)
  To: dgilbert; +Cc: Tejun Heo, Martin K. Petersen, linux-ide, linux-scsi, hch

>>>>> "Doug" == Douglas Gilbert <dgilbert@interlog.com> writes:

>> And when I do then I see that rbuf[3] should be 4 (not 0x3c) unless
>> the DP bit is set.

I totally missed the DP bit when I read the spec.


Doug> ... and of course rbuf[1] should be 0xb2 . Martin ????  The
Doug> rbuf[5] line is correct :-)

Copy and paste error.

I'll get these fixed up right away...

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2010-08-23 19:11 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-19 15:48 Discard/trim/thin provisioning update Martin K. Petersen
2010-08-19 15:48 ` [PATCH 1/6] libata: Signal that our SATL supports WRITE SAME(16) with UNMAP Martin K. Petersen
2010-08-19 16:15   ` James Bottomley
2010-08-19 16:20     ` Martin K. Petersen
2010-08-23  8:32   ` Tejun Heo
2010-08-23 18:22     ` Douglas Gilbert
2010-08-23 18:29       ` Douglas Gilbert
2010-08-23 19:11         ` Martin K. Petersen
2010-08-19 15:48 ` [PATCH 2/6] libata: Report supported TRIM payload size Martin K. Petersen
2010-08-19 17:27   ` Greg Freemyer
2010-08-19 18:05     ` Martin K. Petersen
2010-08-19 21:08       ` Greg Freemyer
2010-08-19 21:50       ` Mark Lord
2010-08-20  8:58         ` Christoph Hellwig
2010-08-20 13:53           ` Mark Lord
2010-08-20 17:13             ` Greg Freemyer
2010-08-20 18:28               ` Mark Lord
2010-08-23 13:50   ` Christoph Hellwig
2010-08-23 18:56     ` Martin K. Petersen
2010-08-19 15:48 ` [PATCH 3/6] block: Make max_discard_sectors sector_t Martin K. Petersen
2010-08-20  8:59   ` Christoph Hellwig
2010-08-19 15:48 ` [PATCH 4/6] scsi: Fix VPD page wrapper Martin K. Petersen
2010-08-19 15:49 ` [PATCH 5/6] scsi_debug: Update thin provisioning support Martin K. Petersen
2010-08-23 16:37   ` Douglas Gilbert
2010-08-19 15:49 ` [PATCH 6/6] sd: " Martin K. Petersen
2010-08-20  9:00   ` Christoph Hellwig
2010-08-23 19:06     ` Martin K. Petersen

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.