linux-kernel.vger.kernel.org archive mirror
* [PATCH v8 0/2] Block layer support ZAC/ZBC commands
@ 2016-08-22  4:20 Shaun Tancheff
  2016-08-22  4:20 ` [PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands Shaun Tancheff
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields

Hi Jens,

This series is based on Linus' v4.8-rc2 branch.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be
suitable for use by Host Managed drives.

ZBC [and ZAC] drives add new commands for discovering and working
with Zones.

Part one of this series expands the bio/request reserved op size from
3 to 4 bits and then adds op codes for each of the ZBC commands:
   Report zones, close zone, finish zone, open zone and reset zone.
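
The need for a fourth bit follows from counting the op codes: the six
existing ops plus the five zone ops make eleven values, which no longer
fit in three bits. A minimal sketch of that arithmetic, assuming the op
list shown in patch 1; the EX_ names and the bits_needed() helper are
hypothetical and exist only for illustration:

```c
#include <assert.h>

/* Illustrative copy of the op list after patch 1 (EX_ names are hypothetical). */
enum ex_req_op {
	EX_OP_READ,
	EX_OP_WRITE,
	EX_OP_ZONE_REPORT,
	EX_OP_ZONE_OPEN,
	EX_OP_ZONE_CLOSE,
	EX_OP_ZONE_FINISH,
	EX_OP_ZONE_RESET,
	EX_OP_DISCARD,
	EX_OP_SECURE_ERASE,
	EX_OP_WRITE_SAME,
	EX_OP_FLUSH,
	EX_OP_COUNT,	/* number of ops listed above: 11 */
};

/* Smallest number of bits able to encode the values 0 .. count - 1. */
static unsigned int bits_needed(unsigned int count)
{
	unsigned int bits = 0;

	assert(count > 0);
	while ((1u << bits) < count)
		bits++;
	return bits;
}
```

With the six pre-existing ops bits_needed() yields 3; with EX_OP_COUNT
it yields 4, matching the REQ_OP_BITS change.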

Part two of this series deals with integrating these new bio/request
op's with Hannes' zone cache.

This extends the ZBC support up to the block layer, allowing direct
control by file systems or device mapper targets. Deferring the zone
handling to the authoritative subsystem also lowers the overall memory
needed to hold the active zone information and clarifies which party
is responsible for maintaining the write pointer for each active zone.

By way of example, a DM target may have several writes in progress. The
sector (or LBA) for each of those writes depends on the previous write.
While the drive's write pointer is updated as writes complete, the DM
target maintains both where the next write should be scheduled from and
where the write pointer is, based on writes completed without errors.
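
The two positions a DM target tracks can be sketched as follows. This is
a simplified illustration, not code from the series: struct
ex_zone_tracker and both helpers are hypothetical, and the sketch
assumes writes complete in submission order.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-zone bookkeeping a DM target might keep. */
struct ex_zone_tracker {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp_sched;	/* where the next write will be scheduled */
	uint64_t wp_done;	/* write pointer implied by completed writes */
};

/* Reserve nr sectors; returns the start sector, or UINT64_MAX if they do not fit. */
static uint64_t ex_schedule_write(struct ex_zone_tracker *z, uint64_t nr)
{
	uint64_t at = z->wp_sched;

	if (nr > z->start + z->len - at)
		return UINT64_MAX;
	z->wp_sched = at + nr;
	return at;
}

/* Record a write that completed without error (in-order completion assumed). */
static void ex_complete_write(struct ex_zone_tracker *z, uint64_t at,
			      uint64_t nr)
{
	assert(at == z->wp_done);
	z->wp_done = at + nr;
}
```

Between scheduling and completion wp_sched runs ahead of wp_done; once
all writes have completed without error the two are equal again.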

Knowing the drive zone topology enables DM targets and file systems to
extend their block allocation schemes and issue write pointer resets (or
discards) that are zone aligned.

A perhaps non-obvious choice is that a conventional drive will
return a zone report descriptor with a single large conventional zone.
This is intended to allow a collection of zoned and non-zoned media to
be stitched together to provide a file system with a zoned device with
conventional space mapped to where it is useful.
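
For illustration, such a single-zone report could be laid out as below.
This is a sketch that works on raw bytes rather than on the structures
from patch 1; the byte offsets are derived from the packed layouts of
struct bdev_zone_report and struct bdev_zone_descriptor, the field
values mirror those written in blkdev_issue_zone_report(), and all EX_
names and helpers are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_ZTYP_CONVENTIONAL	1	/* ZTYP_CONVENTIONAL in patch 1 */
#define EX_ZS_ALL_SAME		1	/* ZS_ALL_SAME in patch 1 */
#define EX_HDR_SIZE		64	/* packed size of the report header */
#define EX_DESC_SIZE		64	/* packed size of one descriptor */

/* Store v in big-endian byte order at p. */
static void ex_put_be64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 0; i < 8; i++)
		p[i] = (uint8_t)(v >> (56 - 8 * i));
}

/* Load a big-endian 64-bit value from p. */
static uint64_t ex_get_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Fill buf with a one-zone report; returns bytes used, or 0 if buf is too small. */
static size_t ex_fill_conventional_report(uint8_t *buf, size_t len,
					  uint64_t nr_sects)
{
	uint8_t *d;

	if (len < EX_HDR_SIZE + EX_DESC_SIZE)
		return 0;
	memset(buf, 0, EX_HDR_SIZE + EX_DESC_SIZE);
	buf[3] = 1;			/* descriptor_count (be32 at offset 0) */
	buf[4] = EX_ZS_ALL_SAME;	/* same_field */
	ex_put_be64(buf + 8, nr_sects);	/* maximum_lba */

	d = buf + EX_HDR_SIZE;
	d[0] = EX_ZTYP_CONVENTIONAL;	/* type */
	d[1] = 0;			/* flags: ZCOND_CONVENTIONAL << 4 */
	ex_put_be64(d + 8, nr_sects);	/* length */
	ex_put_be64(d + 16, 0);		/* lba_start */
	ex_put_be64(d + 24, nr_sects);	/* lba_wptr */
	return EX_HDR_SIZE + EX_DESC_SIZE;
}
```

A consumer that only understands zone reports can then treat the
conventional device as one large zone without a write pointer.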

Patches for util-linux can be found here:
    git@github.com:stancheff/util-linux.git v2.28.1+biof

    https://github.com/stancheff/util-linux/tree/v2.28.1%2Bbiof

This patch is available here:
    https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v8

    git@github.com:stancheff/linux.git v4.8-rc2+biof.v8

v8:
 - Changed zone report to default to reading from zone cache.
 - Changed ioctl for zone commands to support forcing a query or command
   to be sent to media.
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v7:
 - Initial support for Hannes' zone cache.
v6:
 - Fix page alloc to include DMA flag for ioctl.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.


Shaun Tancheff (2):
  Add bio/request flags to issue ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS                       |   9 ++
 block/blk-lib.c                   |  94 +++++++++++++++++
 block/ioctl.c                     | 149 +++++++++++++++++++++++++++
 drivers/scsi/sd.c                 | 121 ++++++++++++++++++++++
 drivers/scsi/sd.h                 |   1 +
 include/linux/bio.h               |   8 +-
 include/linux/blk_types.h         |   7 +-
 include/linux/blkdev.h            |   1 +
 include/linux/blkzoned_api.h      |  25 +++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/blkzoned_api.h | 210 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h           |   1 +
 12 files changed, 625 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.9.3


* [PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands
  2016-08-22  4:20 [PATCH v8 0/2] Block layer support ZAC/ZBC commands Shaun Tancheff
@ 2016-08-22  4:20 ` Shaun Tancheff
  2016-08-24 20:24   ` [PATCH v8 1/2 RESEND] " Shaun Tancheff
  2016-08-22  4:20 ` [PATCH v8 2/2] Add ioctl to issue ZBC/ZAC commands via block layer Shaun Tancheff
  2016-08-24 20:22 ` [PATCH v8 0/2 RESEND] Block layer support ZAC/ZBC commands Shaun Tancheff
  2 siblings, 1 reply; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	Shaun Tancheff

Add op flags to access zone information as well as to open, close,
finish and reset zones:
  - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
  - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
  - REQ_OP_ZONE_CLOSE - Explicitly close a zone
  - REQ_OP_ZONE_FINISH - Explicitly finish a zone
  - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone

These op flags can be used to create bio's to control zoned devices
through the block layer.

This is useful for file systems and device mappers that need explicit
control of zoned devices such as Host Managed and Host Aware SMR drives.

Report zones is a device read that requires a buffer.

Open, Close, Finish and Reset are device commands that have no
associated data transfer.
  Open -   Open a zone for writing.
  Close -  Disallow writing to a zone.
  Finish - Disallow writing to a zone and set the WP to the end
           of the zone.
  Reset -  Discard data in a zone and reset the WP to the start
           of the zone.
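
The effect of the four actions on a zone's write pointer can be pictured
with a toy model. This sketch follows only the one-line descriptions
above and is not the full ZBC zone state machine; struct ex_zone and
ex_zone_apply() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

enum ex_zone_action { EX_OPEN, EX_CLOSE, EX_FINISH, EX_RESET };

/* Toy model of one zone (hypothetical, for illustration only). */
struct ex_zone {
	uint64_t start;		/* first sector of the zone */
	uint64_t len;		/* zone length in sectors */
	uint64_t wp;		/* current write pointer */
	int writable;		/* 1 after Open, 0 after Close or Finish */
};

static void ex_zone_apply(struct ex_zone *z, enum ex_zone_action a)
{
	switch (a) {
	case EX_OPEN:
		z->writable = 1;
		break;
	case EX_CLOSE:
		z->writable = 0;
		break;
	case EX_FINISH:
		z->writable = 0;
		z->wp = z->start + z->len;	/* WP moves to the end of the zone */
		break;
	case EX_RESET:
		z->wp = z->start;		/* data discarded, WP back at the start */
		break;
	}
}
```

None of the four actions carries a data payload, which is why the
corresponding bios are built without pages.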

Sending an LBA of ~0 will attempt to operate on all zones.
This is typically used with Reset to wipe a drive, as a Reset
behaves similarly to TRIM in that all data in the zone(s) is deleted.
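
How the ~0 convention ends up in the command can be sketched in plain C,
following the logic of sd_setup_zone_action_cmnd() in this patch. The
helper name is hypothetical, and the numeric opcode values are
assumptions standing in for the kernel's ZBC_OUT and
ZO_RESET_WRITE_POINTER definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_ZBC_OUT			0x94	/* assumed value of ZBC_OUT */
#define EX_ZO_RESET_WRITE_POINTER	0x04	/* assumed value of ZO_RESET_WRITE_POINTER */

/* Build a 16-byte zone action CDB; an LBA of ~0 selects all zones. */
static void ex_build_zone_action_cdb(uint8_t cdb[16], uint8_t action,
				     uint64_t lba)
{
	uint8_t all = 0;
	int i;

	if (lba == UINT64_MAX) {
		all = 1;	/* set the ALL bit ... */
		lba = 0;	/* ... and send a zero LBA */
	}
	memset(cdb, 0, 16);
	cdb[0] = EX_ZBC_OUT;
	cdb[1] = action;
	for (i = 0; i < 8; i++)		/* big-endian LBA in bytes 2..9 */
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
	cdb[14] = all;
}
```

Using a reserved sector value keeps the bio interface unchanged: no
extra field is needed to carry the "all zones" request.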

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggyback on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but it is not critical.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v8:
 - Added Finish Zone op
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v6:
 - Added GFP_DMA to gfp mask.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS                       |   9 ++
 block/blk-lib.c                   |  94 ++++++++++++++++++++
 drivers/scsi/sd.c                 | 121 +++++++++++++++++++++++++
 drivers/scsi/sd.h                 |   1 +
 include/linux/bio.h               |   8 +-
 include/linux/blk_types.h         |   7 +-
 include/linux/blkdev.h            |   1 +
 include/linux/blkzoned_api.h      |  25 ++++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/blkzoned_api.h | 182 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 447 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a306795..aedf311 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12984,6 +12984,15 @@ F:	Documentation/networking/z8530drv.txt
 F:	drivers/net/hamradio/*scc.c
 F:	drivers/net/hamradio/z8530.h
 
+ZBC AND ZAC BLOCK DEVICES
+M:	Shaun Tancheff <shaun.tancheff@seagate.com>
+W:	http://seagate.com
+W:	https://github.com/Seagate/ZDM-Device-Mapper
+L:	linux-block@vger.kernel.org
+S:	Maintained
+F:	include/linux/blkzoned_api.h
+F:	include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M:	Seth Jennings <sjenning@redhat.com>
 L:	linux-mm@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..e92bd56 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -266,3 +266,97 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:	target blockdev
+ * @op_flags:	extra bio rw flags. If unsure, use 0.
+ * @sector:	starting sector (report will include this sector).
+ * @opt:	See: zone_report_option, default is 0 (all zones).
+ * @page:	one or more contiguous pages.
+ * @pgsz:	up to size of page in bytes, size of report.
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+			     sector_t sector, u8 opt, struct page *page,
+			     size_t pgsz, gfp_t gfp_mask)
+{
+	struct bdev_zone_report *conv = page_address(page);
+	struct bio *bio;
+	unsigned int nr_iovecs = 1;
+	int ret = 0;
+
+	if (pgsz < (sizeof(struct bdev_zone_report) +
+		    sizeof(struct bdev_zone_descriptor)))
+		return -EINVAL;
+
+	bio = bio_alloc(gfp_mask, nr_iovecs);
+	if (!bio)
+		return -ENOMEM;
+
+	conv->descriptor_count = 0;
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_bdev = bdev;
+	bio->bi_vcnt = 0;
+	bio->bi_iter.bi_size = 0;
+
+	bio_add_page(bio, page, pgsz, 0);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, op_flags);
+	ret = submit_bio_wait(bio);
+
+	/*
+	 * When our request is nak'd the underlying device may be conventional
+	 * so ... report a single conventional zone the size of the device.
+	 */
+	if (ret == -EIO && conv->descriptor_count) {
+		/* Adjust the conventional to the size of the partition ... */
+		__be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
+
+		conv->maximum_lba = blksz;
+		conv->descriptors[0].type = ZTYP_CONVENTIONAL;
+		conv->descriptors[0].flags = ZCOND_CONVENTIONAL << 4;
+		conv->descriptors[0].length = blksz;
+		conv->descriptors[0].lba_start = 0;
+		conv->descriptors[0].lba_wptr = blksz;
+		ret = 0;
+	}
+	bio_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_zone_report);
+
+/**
+ * blkdev_issue_zone_action - queue a zone action operation
+ * @bdev:	target blockdev
+ * @op:		One of REQ_OP_ZONE_* op codes.
+ * @op_flags:	extra bio rw flags. If unsure, use 0.
+ * @sector:	starting lba of sector, Use ~0ul for all zones.
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Issue a zone action request for the sector in question.
+ */
+int blkdev_issue_zone_action(struct block_device *bdev, unsigned int op,
+			     unsigned int op_flags, sector_t sector,
+			     gfp_t gfp_mask)
+{
+	int ret;
+	struct bio *bio;
+
+	bio = bio_alloc(gfp_mask, 1);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_bdev = bdev;
+	bio->bi_vcnt = 0;
+	bio->bi_iter.bi_size = 0;
+	bio_set_op_attrs(bio, op, op_flags);
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_zone_action);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..d4d04ed 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1134,6 +1134,118 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	return ret;
 }
 
+static int sd_setup_zone_report_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_device *sdp = cmd->device;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct bio *bio = rq->bio;
+	sector_t sector = blk_rq_pos(rq);
+	struct gendisk *disk = rq->rq_disk;
+	unsigned int nr_bytes = blk_rq_bytes(rq);
+	int ret = BLKPREP_KILL;
+
+	WARN_ON(nr_bytes == 0);
+
+	/*
+	 * For conventional drives generate a report that shows a
+	 * large single conventional zone the size of the block device
+	 */
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC) {
+		void *src;
+		struct bdev_zone_report *conv;
+
+		if (nr_bytes < sizeof(struct bdev_zone_report))
+			goto out;
+
+		src = kmap_atomic(bio->bi_io_vec->bv_page);
+		conv = src + bio->bi_io_vec->bv_offset;
+		conv->descriptor_count = cpu_to_be32(1);
+		conv->same_field = ZS_ALL_SAME;
+		conv->maximum_lba = cpu_to_be64(disk->part0.nr_sects);
+		kunmap_atomic(src);
+		goto out;
+	}
+
+	ret = scsi_init_io(cmd);
+	if (ret != BLKPREP_OK)
+		goto out;
+
+	cmd = rq->special;
+	if (sdp->changed) {
+		pr_err("SCSI disk has been changed or is not present.\n");
+		ret = BLKPREP_KILL;
+		goto out;
+	}
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_IN;
+	cmd->cmnd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(sector, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+	/* FUTURE ... when streamid is available */
+	/* cmd->cmnd[14] = bio_get_streamid(bio); */
+
+	cmd->sc_data_direction = DMA_FROM_DEVICE;
+	cmd->sdb.length = nr_bytes;
+	cmd->transfersize = sdp->sector_size;
+	cmd->underflow = 0;
+	cmd->allowed = SD_MAX_RETRIES;
+	ret = BLKPREP_OK;
+out:
+	return ret;
+}
+
+static int sd_setup_zone_action_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+	int ret = BLKPREP_KILL;
+	u8 allbit = 0;
+
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
+		goto out;
+
+	if (sector == ~0ul) {
+		allbit = 1;
+		sector = 0;
+	}
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
+	cmd->cmnd[0] = ZBC_OUT;
+	switch (req_op(rq)) {
+	case REQ_OP_ZONE_OPEN:
+		cmd->cmnd[1] = ZO_OPEN_ZONE;
+		break;
+	case REQ_OP_ZONE_CLOSE:
+		cmd->cmnd[1] = ZO_CLOSE_ZONE;
+		break;
+	case REQ_OP_ZONE_FINISH:
+		cmd->cmnd[1] = ZO_FINISH_ZONE;
+		break;
+	case REQ_OP_ZONE_RESET:
+		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+		break;
+	default:
+		goto out;
+	}
+	cmd->cmnd[14] = allbit;
+	put_unaligned_be64(sector, &cmd->cmnd[2]);
+
+	cmd->transfersize = 0;
+	cmd->underflow = 0;
+	cmd->allowed = SD_MAX_RETRIES;
+	cmd->sc_data_direction = DMA_NONE;
+
+	ret = BLKPREP_OK;
+out:
+	return ret;
+}
+
 static int sd_init_command(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -1148,6 +1260,13 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
+	case REQ_OP_ZONE_REPORT:
+		return sd_setup_zone_report_cmnd(cmd);
+	case REQ_OP_ZONE_OPEN:
+	case REQ_OP_ZONE_CLOSE:
+	case REQ_OP_ZONE_FINISH:
+	case REQ_OP_ZONE_RESET:
+		return sd_setup_zone_action_cmnd(cmd);
 	default:
 		BUG();
 	}
@@ -2737,6 +2856,8 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
+
  out:
 	kfree(buffer);
 }
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 765a6f1..f782990 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -94,6 +94,7 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	zoned: 2;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 59ffaa6..66b1b33 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -79,7 +79,13 @@ static inline bool bio_has_data(struct bio *bio)
 
 static inline bool bio_no_advance_iter(struct bio *bio)
 {
-	return bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_WRITE_SAME;
+	return bio_op(bio) == REQ_OP_DISCARD ||
+	       bio_op(bio) == REQ_OP_WRITE_SAME ||
+	       bio_op(bio) == REQ_OP_ZONE_REPORT ||
+	       bio_op(bio) == REQ_OP_ZONE_OPEN ||
+	       bio_op(bio) == REQ_OP_ZONE_CLOSE ||
+	       bio_op(bio) == REQ_OP_ZONE_FINISH ||
+	       bio_op(bio) == REQ_OP_ZONE_RESET;
 }
 
 static inline bool bio_is_rw(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 436f43f..97282c6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -232,13 +232,18 @@ enum rq_flag_bits {
 enum req_op {
 	REQ_OP_READ,
 	REQ_OP_WRITE,
+	REQ_OP_ZONE_REPORT,
+	REQ_OP_ZONE_OPEN,
+	REQ_OP_ZONE_CLOSE,
+	REQ_OP_ZONE_FINISH,
+	REQ_OP_ZONE_RESET,
 	REQ_OP_DISCARD,		/* request to discard sectors */
 	REQ_OP_SECURE_ERASE,	/* request to securely erase sectors */
 	REQ_OP_WRITE_SAME,	/* write same block many times */
 	REQ_OP_FLUSH,		/* request for cache flush */
 };
 
-#define REQ_OP_BITS 3
+#define REQ_OP_BITS 4
 
 typedef unsigned int blk_qc_t;
 #define BLK_QC_T_NONE	-1U
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2c210b6..2b2db36 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned_api.h>
 
 struct module;
 struct scsi_ioctl_command;
diff --git a/include/linux/blkzoned_api.h b/include/linux/blkzoned_api.h
new file mode 100644
index 0000000..47c091a
--- /dev/null
+++ b/include/linux/blkzoned_api.h
@@ -0,0 +1,25 @@
+/*
+ * Functions for zone based SMR devices.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by:
+ * Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef _BLKZONED_API_H
+#define _BLKZONED_API_H
+
+#include <uapi/linux/blkzoned_api.h>
+
+extern int blkdev_issue_zone_action(struct block_device *, unsigned int op,
+				    unsigned int op_flags, sector_t, gfp_t);
+extern int blkdev_issue_zone_report(struct block_device *, unsigned int op_flgs,
+				    sector_t, u8 opt, struct page *, size_t,
+				    gfp_t);
+
+#endif /* _BLKZONED_API_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 185f8ea..50ba85a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned_api.h
 header-y += bpf_common.h
 header-y += bpf.h
 header-y += bpqether.h
diff --git a/include/uapi/linux/blkzoned_api.h b/include/uapi/linux/blkzoned_api.h
new file mode 100644
index 0000000..d2bdba5
--- /dev/null
+++ b/include/uapi/linux/blkzoned_api.h
@@ -0,0 +1,182 @@
+/*
+ * Functions for zone based SMR devices.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by:
+ * Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef _UAPI_BLKZONED_API_H
+#define _UAPI_BLKZONED_API_H
+
+#include <linux/types.h>
+
+/**
+ * enum bdev_zone_report_option - Report Zones types to be included.
+ *
+ * @ZOPT_NON_SEQ_AND_RESET: Default (all zones).
+ * @ZOPT_ZC1_EMPTY: Zones which are empty.
+ * @ZOPT_ZC2_OPEN_IMPLICIT: Zones open but not explicitly opened
+ * @ZOPT_ZC3_OPEN_EXPLICIT: Zones opened explicitly
+ * @ZOPT_ZC4_CLOSED: Zones closed for writing.
+ * @ZOPT_ZC5_FULL: Zones that are full.
+ * @ZOPT_ZC6_READ_ONLY: Zones that are read-only
+ * @ZOPT_ZC7_OFFLINE: Zones that are offline
+ * @ZOPT_RESET: Zones with the reset recommended flag set
+ * @ZOPT_NON_SEQ: Zones that have HA media-cache writes pending
+ * @ZOPT_NON_WP_ZONES: Zones that do not have Write Pointers (conventional)
+ * @ZOPT_PARTIAL_FLAG: Modifies the definition of the Zone List Length field.
+ *
+ * Used by Report Zones in bdev_zone_get_report: report_option
+ */
+enum bdev_zone_report_option {
+	ZOPT_NON_SEQ_AND_RESET   = 0x00,
+	ZOPT_ZC1_EMPTY,
+	ZOPT_ZC2_OPEN_IMPLICIT,
+	ZOPT_ZC3_OPEN_EXPLICIT,
+	ZOPT_ZC4_CLOSED,
+	ZOPT_ZC5_FULL,
+	ZOPT_ZC6_READ_ONLY,
+	ZOPT_ZC7_OFFLINE,
+	ZOPT_RESET               = 0x10,
+	ZOPT_NON_SEQ             = 0x11,
+	ZOPT_NON_WP_ZONES        = 0x3f,
+	ZOPT_PARTIAL_FLAG        = 0x80,
+};
+
+/**
+ * enum bdev_zone_type - Type of zone in descriptor
+ *
+ * @ZTYP_RESERVED: Reserved
+ * @ZTYP_CONVENTIONAL: Conventional random write zone (No Write Pointer)
+ * @ZTYP_SEQ_WRITE_REQUIRED: Non-sequential writes are rejected.
+ * @ZTYP_SEQ_WRITE_PREFERRED: Non-sequential writes allowed but discouraged.
+ *
+ * Returned from Report Zones. See bdev_zone_descriptor* type.
+ */
+enum bdev_zone_type {
+	ZTYP_RESERVED            = 0,
+	ZTYP_CONVENTIONAL        = 1,
+	ZTYP_SEQ_WRITE_REQUIRED  = 2,
+	ZTYP_SEQ_WRITE_PREFERRED = 3,
+};
+
+/**
+ * enum bdev_zone_condition - Condition of zone in descriptor
+ *
+ * @ZCOND_CONVENTIONAL: N/A
+ * @ZCOND_ZC1_EMPTY: Empty
+ * @ZCOND_ZC2_OPEN_IMPLICIT: Opened via write to zone.
+ * @ZCOND_ZC3_OPEN_EXPLICIT: Opened via open zone command.
+ * @ZCOND_ZC4_CLOSED: Closed
+ * @ZCOND_ZC6_READ_ONLY: Read only
+ * @ZCOND_ZC5_FULL: No remaining space in zone.
+ * @ZCOND_ZC7_OFFLINE: Offline
+ *
+ * Returned from Report Zones. See bdev_zone_descriptor* flags.
+ */
+enum bdev_zone_condition {
+	ZCOND_CONVENTIONAL       = 0,
+	ZCOND_ZC1_EMPTY          = 1,
+	ZCOND_ZC2_OPEN_IMPLICIT  = 2,
+	ZCOND_ZC3_OPEN_EXPLICIT  = 3,
+	ZCOND_ZC4_CLOSED         = 4,
+	/* 0x5 to 0xC are reserved */
+	ZCOND_ZC6_READ_ONLY      = 0xd,
+	ZCOND_ZC5_FULL           = 0xe,
+	ZCOND_ZC7_OFFLINE        = 0xf,
+};
+
+/**
+ * enum bdev_zone_same - Report Zones same code.
+ *
+ * @ZS_ALL_DIFFERENT: All zones differ in type and size.
+ * @ZS_ALL_SAME: All zones are the same size and type.
+ * @ZS_LAST_DIFFERS: All zones are the same size and type except the last zone.
+ * @ZS_SAME_LEN_DIFF_TYPES: All zones are the same length but types differ.
+ *
+ * Returned from Report Zones. See bdev_zone_report* same_field.
+ */
+enum bdev_zone_same {
+	ZS_ALL_DIFFERENT        = 0,
+	ZS_ALL_SAME             = 1,
+	ZS_LAST_DIFFERS         = 2,
+	ZS_SAME_LEN_DIFF_TYPES  = 3,
+};
+
+/**
+ * struct bdev_zone_get_report - ioctl: Report Zones request
+ *
+ * @zone_locator_lba: starting lba for first [reported] zone
+ * @return_page_count: number of *bytes* allocated for result
+ * @report_option: see: zone_report_option enum
+ *
+ * Used to issue report zones command to connected device
+ */
+struct bdev_zone_get_report {
+	__u64 zone_locator_lba;
+	__u32 return_page_count;
+	__u8  report_option;
+} __packed;
+
+/**
+ * struct bdev_zone_descriptor - A Zone descriptor entry from report zones
+ *
+ * @type: see zone_type enum
+ * @flags: Bits 0:reset, 1:non-seq, 2-3: resv, 4-7: see zone_condition enum
+ * @reserved1: padding
+ * @length: length of zone in sectors
+ * @lba_start: lba where the zone starts.
+ * @lba_wptr: lba of the current write pointer.
+ * @reserved: padding
+ *
+ */
+struct bdev_zone_descriptor {
+	__u8 type;
+	__u8 flags;
+	__u8  reserved1[6];
+	__be64 length;
+	__be64 lba_start;
+	__be64 lba_wptr;
+	__u8 reserved[32];
+} __packed;
+
+/**
+ * struct bdev_zone_report - Report Zones result
+ *
+ * @descriptor_count: Number of descriptor entries that follow
+ * @same_field: bits 0-3: enum zone_same (MASK: 0x0F)
+ * @reserved1: padding
+ * @maximum_lba: LBA of the last logical sector on the device, inclusive
+ *               of all logical sectors in all zones.
+ * @reserved2: padding
+ * @descriptors: array of descriptors follows.
+ */
+struct bdev_zone_report {
+	__be32 descriptor_count;
+	__u8 same_field;
+	__u8 reserved1[3];
+	__be64 maximum_lba;
+	__u8 reserved2[48];
+	struct bdev_zone_descriptor descriptors[0];
+} __packed;
+
+/**
+ * struct bdev_zone_report_io - Report Zones ioctl argument.
+ *
+ * @in: Report Zones inputs
+ * @out: Report Zones output
+ */
+struct bdev_zone_report_io {
+	union {
+		struct bdev_zone_get_report in;
+		struct bdev_zone_report out;
+	} data;
+} __packed;
+
+#endif /* _UAPI_BLKZONED_API_H */
-- 
2.9.3


* [PATCH v8 2/2] Add ioctl to issue ZBC/ZAC commands via block layer
  2016-08-22  4:20 [PATCH v8 0/2] Block layer support ZAC/ZBC commands Shaun Tancheff
  2016-08-22  4:20 ` [PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands Shaun Tancheff
@ 2016-08-22  4:20 ` Shaun Tancheff
  2016-08-24 20:25   ` [PATCH v8 2/2 RESEND] " Shaun Tancheff
  2016-08-24 20:22 ` [PATCH v8 0/2 RESEND] Block layer support ZAC/ZBC commands Shaun Tancheff
  2 siblings, 1 reply; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-22  4:20 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	Shaun Tancheff

Add support for ZBC ioctl's
    BLKREPORT     - Issue Report Zones to device.
    BLKZONEACTION - Issue a Zone Action (Close, Finish, Open, or Reset)
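
From user space, preparing the BLKZONEACTION argument might look like
the sketch below. It uses a local mirror of struct bdev_zone_action so
that it builds without the header added by this series; the mirror
struct, the EX_ constant and the helper are hypothetical, and the check
reproduces the all_zones/LBA rule enforced in blk_zoned_action_ioctl().

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define EX_ZONE_ACTION_RESET	0x04	/* ZONE_ACTION_RESET in this patch */

/* Local mirror of struct bdev_zone_action from this patch. */
struct ex_zone_action_arg {
	uint64_t zone_locator_lba;
	uint32_t action;
	uint8_t  all_zones;
	uint8_t  force_unit_access;
} __attribute__((packed));

/*
 * Fill za for the ioctl. Returns 0 on success, or -1 for the combination
 * the kernel side rejects with -EINVAL (all_zones set with a non-zero LBA).
 */
static int ex_zone_action_prepare(struct ex_zone_action_arg *za,
				  uint32_t action, uint64_t lba,
				  int all_zones, int fua)
{
	if (all_zones && lba != 0)
		return -1;
	memset(za, 0, sizeof(*za));
	za->zone_locator_lba = lba;
	za->action = action;
	za->all_zones = all_zones ? 1 : 0;
	za->force_unit_access = fua ? 1 : 0;
	return 0;
}
```

With the header from this series installed, the filled structure would
be passed as ioctl(fd, BLKZONEACTION, &za) on a whole-disk device opened
for writing; partitions are rejected by the kernel side.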

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v8:
 - Changed ioctl for zone actions to a single ioctl that takes
   a structure including the zone, zone action, all flag, and force option
 - Mapped REQ_META flag to 'force unit access' for zone operations
v6:
 - Added GFP_DMA to gfp mask.
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's

 block/ioctl.c                     | 149 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/blkzoned_api.h |  30 +++++++-
 include/uapi/linux/fs.h           |   1 +
 3 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d760523 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -194,6 +194,151 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+				  void __user *parg)
+{
+	int error = -EFAULT;
+	gfp_t gfp = GFP_KERNEL | GFP_DMA;
+	void *iopg = NULL;
+	struct bdev_zone_report_io *bzrpt = NULL;
+	int order = 0;
+	struct page *pgs = NULL;
+	u32 alloc_size = PAGE_SIZE;
+	unsigned int op_flags = 0;
+	u8 opt = 0;
+
+	if (!(mode & FMODE_READ))
+		return -EBADF;
+
+	iopg = (void *)get_zeroed_page(gfp);
+	if (!iopg) {
+		error = -ENOMEM;
+		goto report_zones_out;
+	}
+	bzrpt = iopg;
+	if (copy_from_user(bzrpt, parg, sizeof(*bzrpt))) {
+		error = -EFAULT;
+		goto report_zones_out;
+	}
+	if (bzrpt->data.in.return_page_count > alloc_size) {
+		int npages;
+
+		alloc_size = bzrpt->data.in.return_page_count;
+		npages = (alloc_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		pgs = alloc_pages(gfp, ilog2(npages));
+		if (pgs) {
+			void *mem = page_address(pgs);
+
+			if (!mem) {
+				error = -ENOMEM;
+				goto report_zones_out;
+			}
+			order = ilog2(npages);
+			memset(mem, 0, alloc_size);
+			memcpy(mem, bzrpt, sizeof(*bzrpt));
+			bzrpt = mem;
+		} else {
+			/* Result requires DMA capable memory */
+			pr_err("Not enough memory available for request.\n");
+			error = -ENOMEM;
+			goto report_zones_out;
+		}
+	} else {
+		alloc_size = bzrpt->data.in.return_page_count;
+	}
+	if (bzrpt->data.in.force_unit_access)
+		op_flags |= REQ_META;
+	opt = bzrpt->data.in.report_option;
+	error = blkdev_issue_zone_report(bdev, op_flags,
+			bzrpt->data.in.zone_locator_lba, opt,
+			pgs ? pgs : virt_to_page(iopg),
+			alloc_size, GFP_KERNEL);
+	if (error)
+		goto report_zones_out;
+
+	if (pgs) {
+		void *src = bzrpt;
+		u32 off = 0;
+
+		/*
+		 * When moving a multi-order page with GFP_DMA
+		 * the copy to user can trap "<spans multiple pages>"
+		 * so instead we copy out 1 page at a time.
+		 */
+		while (off < alloc_size && !error) {
+			u32 len = min_t(u32, PAGE_SIZE, alloc_size - off);
+
+			memcpy(iopg, src + off, len);
+			if (copy_to_user(parg + off, iopg, len))
+				error = -EFAULT;
+			off += len;
+		}
+	} else {
+		if (copy_to_user(parg, iopg, alloc_size))
+			error = -EFAULT;
+	}
+
+report_zones_out:
+	if (pgs)
+		__free_pages(pgs, order);
+	if (iopg)
+		free_page((unsigned long)iopg);
+	return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+				  void __user *parg)
+{
+	unsigned int op = 0;
+	unsigned int op_flags = 0;
+	sector_t lba;
+	struct bdev_zone_action za;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	/* When acting on zones we explicitly disallow using a partition. */
+	if (bdev != bdev->bd_contains) {
+		pr_err("%s: All zone operations disallowed on this device\n",
+			__func__);
+		return -EFAULT;
+	}
+
+	if (copy_from_user(&za, parg, sizeof(za)))
+		return -EFAULT;
+
+	switch (za.action) {
+	case ZONE_ACTION_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case ZONE_ACTION_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case ZONE_ACTION_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case ZONE_ACTION_RESET:
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		pr_err("%s: Unknown action: %u\n", __func__, za.action);
+		return -EINVAL;
+	}
+
+	lba = za.zone_locator_lba;
+	if (za.all_zones) {
+		if (lba) {
+			pr_err("%s: if all_zones, LBA must be 0.\n", __func__);
+			return -EINVAL;
+		}
+		lba = ~0ul;
+	}
+	if (za.force_unit_access || lba == ~0ul)
+		op_flags |= REQ_META;
+
+	return blkdev_issue_zone_action(bdev, op, op_flags, lba, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 		unsigned long arg, unsigned long flags)
 {
@@ -568,6 +713,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKTRACESETUP:
 	case BLKTRACETEARDOWN:
 		return blk_trace_ioctl(bdev, cmd, argp);
+	case BLKREPORT:
+		return blk_zoned_report_ioctl(bdev, mode, argp);
+	case BLKZONEACTION:
+		return blk_zoned_action_ioctl(bdev, mode, argp);
 	case IOC_PR_REGISTER:
 		return blkdev_pr_register(bdev, argp);
 	case IOC_PR_RESERVE:
diff --git a/include/uapi/linux/blkzoned_api.h b/include/uapi/linux/blkzoned_api.h
index d2bdba5..cd81a9f 100644
--- a/include/uapi/linux/blkzoned_api.h
+++ b/include/uapi/linux/blkzoned_api.h
@@ -115,6 +115,7 @@ enum bdev_zone_same {
  * @zone_locator_lba: starting lba for first [reported] zone
  * @return_page_count: number of *bytes* allocated for result
  * @report_option: see: zone_report_option enum
+ * @force_unit_access: Force report from media
  *
  * Used to issue report zones command to connected device
  */
@@ -122,6 +123,25 @@ struct bdev_zone_get_report {
 	__u64 zone_locator_lba;
 	__u32 return_page_count;
 	__u8  report_option;
+	__u8  force_unit_access;
+} __packed;
+
+/**
+ * struct bdev_zone_action - ioctl: Perform Zone Action
+ *
+ * @zone_locator_lba: starting lba of the zone to act on; must be 0 when
+ *                    @all_zones is set
+ * @action: One of the ZONE_ACTION_*'s Close,Finish,Open, or Reset
+ * @all_zones: Flag to indicate if command should apply to all zones.
+ * @force_unit_access: Force command to media and update zone cache on success
+ *
+ * Used to issue a zone action command to connected device
+ */
+struct bdev_zone_action {
+	__u64 zone_locator_lba;
+	__u32 action;
+	__u8  all_zones;
+	__u8  force_unit_access;
 } __packed;
 
 /**
@@ -134,7 +154,6 @@ struct bdev_zone_get_report {
  * @lba_start: lba where the zone starts.
  * @lba_wptr: lba of the current write pointer.
  * @reserved: padding
- *
  */
 struct bdev_zone_descriptor {
 	__u8 type;
@@ -179,4 +198,13 @@ struct bdev_zone_report_io {
 	} data;
 } __packed;
 
+/* continuing from uapi/linux/fs.h: */
+#define BLKREPORT	_IOWR(0x12, 130, struct bdev_zone_report_io)
+#define BLKZONEACTION	_IOW(0x12, 131, struct bdev_zone_action)
+
+#define ZONE_ACTION_CLOSE	0x01
+#define ZONE_ACTION_FINISH	0x02
+#define ZONE_ACTION_OPEN	0x03
+#define ZONE_ACTION_RESET	0x04
+
 #endif /* _UAPI_BLKZONED_API_H */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3b00f7c..350fb3f2 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -222,6 +222,7 @@ struct fsxattr {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+/* A jump here: See blkzoned_api.h, Reserving 130 and 131. */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v8 0/2 RESEND] Block layer support ZAC/ZBC commands
  2016-08-22  4:20 [PATCH v8 0/2] Block layer support ZAC/ZBC commands Shaun Tancheff
  2016-08-22  4:20 ` [PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands Shaun Tancheff
  2016-08-22  4:20 ` [PATCH v8 2/2] Add ioctl to issue ZBC/ZAC commands via block layer Shaun Tancheff
@ 2016-08-24 20:22 ` Shaun Tancheff
  2 siblings, 0 replies; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-24 20:22 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	linux-f2fs-devel, linux-fsdevel, dm-devel, Jaegeuk Kim

(RESENDING to include f2fs, fs-devel and dm-devel)

Hi Jens,

This series is based on linus' v4.8-rc2 branch.

As Host Aware drives are becoming available we would like to be able
to make use of such drives. This series is also intended to be
suitable for use by Host Managed drives.

ZBC [and ZAC] drives add new commands for discovering and working
with Zones.

Part one of this series expands the bio/request reserved op size from
3 to 4 bits and then adds op codes for each of the ZBC commands:
   Report zones, close zone, finish zone, open zone and reset zone.
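
As a rough check of why the op field has to grow: the enum touched by
patch 1 holds six ops before the change and eleven after it, and eleven
values no longer fit in 3 bits. The following is a stand-alone user-space
sketch, not kernel code; the SK_* names and bits_needed() are hypothetical
and only mirror the order used in patch 1.

```c
#include <assert.h>

/* Illustrative copy of enum req_op as it looks after patch 1. */
enum sketch_req_op {
	SK_OP_READ,
	SK_OP_WRITE,
	SK_OP_ZONE_REPORT,
	SK_OP_ZONE_OPEN,
	SK_OP_ZONE_CLOSE,
	SK_OP_ZONE_FINISH,
	SK_OP_ZONE_RESET,
	SK_OP_DISCARD,
	SK_OP_SECURE_ERASE,
	SK_OP_WRITE_SAME,
	SK_OP_FLUSH,
	SK_OP_COUNT,	/* number of op codes, not an op itself */
};

/* Hypothetical helper: smallest bit width that can encode 'count' values. */
static unsigned int bits_needed(unsigned int count)
{
	unsigned int bits = 0;

	assert(count > 0);
	while ((1u << bits) < count)
		bits++;
	return bits;
}
```

With six ops bits_needed() gives 3, with eleven it gives 4, which matches
the REQ_OP_BITS change in patch 1.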

Part two of this series deals with integrating these new bio/request
op's with Hannes' zone cache.

This extends the ZBC support up to the block layer allowing direct
control by file systems or device mapper targets. Also by deferring
the zone handling to the authoritative subsystem there is an overall
lower memory usage for holding the active zone information as well
as clarifying responsible party for maintaining the write pointer
for each active zone.

By way of example a DM target may have several writes in progress. The sector
(or LBA) for each of those writes depends on the previous write. While the
drive's write pointer will be updated as writes are completed, the DM target
will be maintaining both where the next write should be scheduled from and
where the write pointer is, based on writes completed w/o errors.
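
A minimal sketch of that bookkeeping, in stand-alone user-space C. The
struct and function names are hypothetical and not taken from any DM
target; it assumes writes complete in the order they were scheduled, which
a real target could not rely on without extra work.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-zone state a target might keep. */
struct zone_track {
	uint64_t start;      /* first LBA of the zone */
	uint64_t len;        /* zone length in sectors */
	uint64_t next_write; /* where the next write will be scheduled */
	uint64_t wp;         /* write pointer implied by completed writes */
};

/* Reserve 'nr' sectors; returns the LBA to write at, UINT64_MAX if no room. */
static uint64_t zone_schedule_write(struct zone_track *z, uint64_t nr)
{
	uint64_t lba = z->next_write;

	assert(lba >= z->start && lba <= z->start + z->len);
	if (nr > z->start + z->len - lba)
		return UINT64_MAX;
	z->next_write = lba + nr;
	return lba;
}

/*
 * Record a completion. Returns 0 and advances wp on success, -1 if the
 * write failed or did not land at the tracked write pointer.
 */
static int zone_complete_write(struct zone_track *z, uint64_t lba,
			       uint64_t nr, int error)
{
	if (error || lba != z->wp)
		return -1;
	z->wp = lba + nr;
	return 0;
}
```

On a -1 return a target would presumably need to resynchronize both
values, for example from a fresh zone report.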

Knowing the drive zone topology enables DM targets and file systems to
extend their block allocation schemes and issue write pointer resets (or
discards) that are zone aligned.
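
For illustration, a zone alignment check could look like the sketch below.
It is stand-alone user-space C with hypothetical helper names, and it
assumes every zone has the same length (the ZS_ALL_SAME case); devices
with differing zone sizes would need a lookup in the zone report instead.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if [lba, lba + nr) covers whole zones only, otherwise 0. */
static int range_is_zone_aligned(uint64_t lba, uint64_t nr, uint64_t zone_len)
{
	assert(zone_len != 0);
	return nr != 0 && (lba % zone_len) == 0 && (nr % zone_len) == 0;
}

/* Round lba up to a zone boundary (lba itself if already aligned). */
static uint64_t zone_round_up(uint64_t lba, uint64_t zone_len)
{
	assert(zone_len != 0);
	return ((lba + zone_len - 1) / zone_len) * zone_len;
}
```

A file system could use the first helper to decide whether a discard may
be turned into a write pointer reset, and the second to place a new
allocation at the start of the next zone.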

A perhaps non-obvious approach is that a conventional drive will
return a zone report descriptor with a single large conventional zone.
This is intended to allow a collection of zoned and non-zoned media to
be stitched together to provide a file system with a zoned device with
conventional space mapped to where it is useful.
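
The shape of such a report can be sketched as follows. This is stand-alone
user-space C that writes the fields at the byte offsets implied by the
packed structures in blkzoned_api.h (64 byte header, 64 byte descriptor);
the SK_* names and helpers are hypothetical, and the field values combine
what the sd.c and blk-lib.c hunks of patch 1 each set.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_HDR_LEN		64 /* sizeof(struct bdev_zone_report) */
#define SK_DESC_LEN		64 /* sizeof(struct bdev_zone_descriptor) */
#define SK_ZTYP_CONVENTIONAL	1
#define SK_ZS_ALL_SAME		1

static void put_be32(uint8_t *p, uint32_t v)
{
	int i;

	for (i = 3; i >= 0; i--, v >>= 8)
		p[i] = (uint8_t)(v & 0xff);
}

static void put_be64(uint8_t *p, uint64_t v)
{
	int i;

	for (i = 7; i >= 0; i--, v >>= 8)
		p[i] = (uint8_t)(v & 0xff);
}

static uint64_t get_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Describe the whole device as one conventional zone of nr_sects sectors. */
static int fill_conventional_report(uint8_t *buf, size_t len, uint64_t nr_sects)
{
	uint8_t *desc = buf + SK_HDR_LEN;

	if (len < SK_HDR_LEN + SK_DESC_LEN)
		return -1;
	memset(buf, 0, SK_HDR_LEN + SK_DESC_LEN);
	put_be32(buf, 1);		/* descriptor_count */
	buf[4] = SK_ZS_ALL_SAME;	/* same_field */
	put_be64(buf + 8, nr_sects);	/* maximum_lba, as set in the patch */
	desc[0] = SK_ZTYP_CONVENTIONAL;	/* type */
	desc[1] = 0 << 4;		/* flags: ZCOND_CONVENTIONAL */
	put_be64(desc + 8, nr_sects);	/* length */
	put_be64(desc + 16, 0);		/* lba_start */
	put_be64(desc + 24, nr_sects);	/* lba_wptr */
	return 0;
}
```

A consumer that stitches devices together can then treat zoned and
non-zoned members uniformly by walking descriptors in either case.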

Patches for util-linux can be found here:
    git@github.com:stancheff/util-linux.git v2.28.1+biof

    https://github.com/stancheff/util-linux/tree/v2.28.1%2Bbiof

This patch is available here:
    https://github.com/stancheff/linux/tree/v4.8-rc2%2Bbiof.v8

    git@github.com:stancheff/linux.git v4.8-rc2+biof.v8

v8:
 - Changed zone report to default to reading from zone cache.
 - Changed ioctl for zone commands to support forcing a query or command
   to be sent to media.
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v7:
 - Initial support for Hannes' zone cache.
v6:
 - Fix page alloc to include DMA flag for ioctl.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
 - Dropped ata16 hackery
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
 - Use zoned report reserved bit for ata-passthrough flag.

V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Moved include/uapi/linux/fs.h changes to patch 3
 - Fixed commit message for first patch in series.


Shaun Tancheff (2):
  Add bio/request flags to issue ZBC/ZAC commands
  Add ioctl to issue ZBC/ZAC commands via block layer

 MAINTAINERS                       |   9 ++
 block/blk-lib.c                   |  94 +++++++++++++++++
 block/ioctl.c                     | 149 +++++++++++++++++++++++++++
 drivers/scsi/sd.c                 | 121 ++++++++++++++++++++++
 drivers/scsi/sd.h                 |   1 +
 include/linux/bio.h               |   8 +-
 include/linux/blk_types.h         |   7 +-
 include/linux/blkdev.h            |   1 +
 include/linux/blkzoned_api.h      |  25 +++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/blkzoned_api.h | 210 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/fs.h           |   1 +
 12 files changed, 625 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

-- 
2.9.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [PATCH v8 1/2 RESEND] Add bio/request flags to issue ZBC/ZAC commands
  2016-08-22  4:20 ` [PATCH v8 1/2] Add bio/request flags to issue ZBC/ZAC commands Shaun Tancheff
@ 2016-08-24 20:24   ` Shaun Tancheff
  2016-08-26  2:31     ` Damien Le Moal
  0 siblings, 1 reply; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-24 20:24 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	linux-f2fs-devel, linux-fsdevel, dm-devel, Jaegeuk Kim,
	Shaun Tancheff

(RESENDING to include f2fs, fs-devel and dm-devel)

Add op flags to access zone information as well as open, close,
finish and reset zones:
  - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
  - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
  - REQ_OP_ZONE_CLOSE - Explicitly close a zone
  - REQ_OP_ZONE_FINISH - Explicitly finish a zone
  - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone

These op flags can be used to create bio's to control zoned devices
through the block layer.

This is useful for file systems and device mappers that need explicit
control of zoned devices such as Host Managed and Host Aware SMR drives.

Report zones is a device read that requires a buffer.

Open, Close, Finish and Reset are device commands that have no
associated data transfer.
  Open -   Open a zone for writing.
  Close -  Disallow writing to a zone.
  Finish - Disallow writing a zone and set the WP to the end
           of the zone.
  Reset -  Discard data in a zone and reset the WP to the start
           of the zone.

Sending an LBA of ~0 will attempt to operate on all zones.
This is typically used with Reset to wipe a drive, as a Reset
behaves similarly to TRIM in that all data in the zone(s) is deleted.
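
The mapping from the ~0 sentinel to the ALL bit can be sketched as below.
This is stand-alone user-space C that mirrors what
sd_setup_zone_action_cmnd does to the 16 byte CDB; the SK_* constants are
local stand-ins whose values (0x94 for ZBC OUT, 0x04 for Reset Write
Pointer) are given as I understand the ZBC definitions and should be
checked against scsi_proto.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SK_ZBC_OUT			0x94 /* assumed value of ZBC_OUT */
#define SK_ZO_RESET_WRITE_POINTER	0x04 /* assumed service action */
#define SK_ALL_ZONES			(~(uint64_t)0)

/* Build a zone action CDB; an LBA of ~0 selects all zones. */
static void build_zone_action_cdb(uint8_t cdb[16], uint8_t service_action,
				  uint64_t lba)
{
	uint8_t all = 0;
	int i;

	if (lba == SK_ALL_ZONES) {
		all = 1;	/* set the ALL bit ... */
		lba = 0;	/* ... and send a zero zone id */
	}
	memset(cdb, 0, 16);
	cdb[0] = SK_ZBC_OUT;
	cdb[1] = service_action;
	for (i = 0; i < 8; i++)	/* big-endian zone id in bytes 2..9 */
		cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
	cdb[14] = all;
}
```

Using the sentinel keeps the bio interface free of an extra flag, at the
cost of reserving one sector value that can never address a real zone.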

Report zones currently defaults to reporting on all zones. It is expected
that support for the zone option flag will piggy back on streamid
support. The report option flag is useful as it can reduce the number
of zones in each report, but not critical.

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v8:
 - Added Finish Zone op
 - Fixed report zones copy to user to work when HARDENED_USERCOPY is enabled
v6:
 - Added GFP_DMA to gfp mask.
v5:
 - In sd_setup_zone_action_cmnd, remove unused vars and fix switch indent
 - In blk-lib fix documentation
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's
V3:
 - Rebase on Mike Christie's separate bio operations
 - Update blkzoned_api.h to include report zones PARTIAL bit.
V2:
 - Changed bi_rw to op_flags to clarify separation of bio op from flags.
 - Fixed memory leak in blkdev_issue_zone_report failing to put_bio().
 - Documented opt in blkdev_issue_zone_report.
 - Removed include/uapi/linux/fs.h from this patch.

 MAINTAINERS                       |   9 ++
 block/blk-lib.c                   |  94 ++++++++++++++++++++
 drivers/scsi/sd.c                 | 121 +++++++++++++++++++++++++
 drivers/scsi/sd.h                 |   1 +
 include/linux/bio.h               |   8 +-
 include/linux/blk_types.h         |   7 +-
 include/linux/blkdev.h            |   1 +
 include/linux/blkzoned_api.h      |  25 ++++++
 include/uapi/linux/Kbuild         |   1 +
 include/uapi/linux/blkzoned_api.h | 182 ++++++++++++++++++++++++++++++++++++++
 10 files changed, 447 insertions(+), 2 deletions(-)
 create mode 100644 include/linux/blkzoned_api.h
 create mode 100644 include/uapi/linux/blkzoned_api.h

diff --git a/MAINTAINERS b/MAINTAINERS
index a306795..aedf311 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -12984,6 +12984,15 @@ F:	Documentation/networking/z8530drv.txt
 F:	drivers/net/hamradio/*scc.c
 F:	drivers/net/hamradio/z8530.h
 
+ZBC AND ZAC BLOCK DEVICES
+M:	Shaun Tancheff <shaun.tancheff@seagate.com>
+W:	http://seagate.com
+W:	https://github.com/Seagate/ZDM-Device-Mapper
+L:	linux-block@vger.kernel.org
+S:	Maintained
+F:	include/linux/blkzoned_api.h
+F:	include/uapi/linux/blkzoned_api.h
+
 ZBUD COMPRESSED PAGE ALLOCATOR
 M:	Seth Jennings <sjenning@redhat.com>
 L:	linux-mm@kvack.org
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 083e56f..e92bd56 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -266,3 +266,97 @@ int blkdev_issue_zeroout(struct block_device *bdev, sector_t sector,
 	return __blkdev_issue_zeroout(bdev, sector, nr_sects, gfp_mask);
 }
 EXPORT_SYMBOL(blkdev_issue_zeroout);
+
+/**
+ * blkdev_issue_zone_report - queue a report zones operation
+ * @bdev:	target blockdev
+ * @op_flags:	extra bio rw flags. If unsure, use 0.
+ * @sector:	starting sector (report will include this sector).
+ * @opt:	See: zone_report_option, default is 0 (all zones).
+ * @page:	one or more contiguous pages.
+ * @pgsz:	up to size of page in bytes, size of report.
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Issue a zone report request for the sectors in question.
+ */
+int blkdev_issue_zone_report(struct block_device *bdev, unsigned int op_flags,
+			     sector_t sector, u8 opt, struct page *page,
+			     size_t pgsz, gfp_t gfp_mask)
+{
+	struct bdev_zone_report *conv = page_address(page);
+	struct bio *bio;
+	unsigned int nr_iovecs = 1;
+	int ret = 0;
+
+	if (pgsz < (sizeof(struct bdev_zone_report) +
+		    sizeof(struct bdev_zone_descriptor)))
+		return -EINVAL;
+
+	bio = bio_alloc(gfp_mask, nr_iovecs);
+	if (!bio)
+		return -ENOMEM;
+
+	conv->descriptor_count = 0;
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_bdev = bdev;
+	bio->bi_vcnt = 0;
+	bio->bi_iter.bi_size = 0;
+
+	bio_add_page(bio, page, pgsz, 0);
+	bio_set_op_attrs(bio, REQ_OP_ZONE_REPORT, op_flags);
+	ret = submit_bio_wait(bio);
+
+	/*
+	 * When our request is nak'd the underlying device may be conventional
+	 * so ... report a single conventional zone the size of the device.
+	 */
+	if (ret == -EIO && conv->descriptor_count) {
+		/* Adjust the conventional to the size of the partition ... */
+		__be64 blksz = cpu_to_be64(bdev->bd_part->nr_sects);
+
+		conv->maximum_lba = blksz;
+		conv->descriptors[0].type = ZTYP_CONVENTIONAL;
+		conv->descriptors[0].flags = ZCOND_CONVENTIONAL << 4;
+		conv->descriptors[0].length = blksz;
+		conv->descriptors[0].lba_start = 0;
+		conv->descriptors[0].lba_wptr = blksz;
+		ret = 0;
+	}
+	bio_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_zone_report);
+
+/**
+ * blkdev_issue_zone_action - queue a zone action operation
+ * @bdev:	target blockdev
+ * @op:		One of REQ_OP_ZONE_* op codes.
+ * @op_flags:	extra bio rw flags. If unsure, use 0.
+ * @sector:	starting lba of sector, Use ~0ul for all zones.
+ * @gfp_mask:	memory allocation flags (for bio_alloc)
+ *
+ * Description:
+ *    Issue a zone action request for the zone(s) in question.
+ */
+int blkdev_issue_zone_action(struct block_device *bdev, unsigned int op,
+			     unsigned int op_flags, sector_t sector,
+			     gfp_t gfp_mask)
+{
+	int ret;
+	struct bio *bio;
+
+	bio = bio_alloc(gfp_mask, 1);
+	if (!bio)
+		return -ENOMEM;
+
+	bio->bi_iter.bi_sector = sector;
+	bio->bi_bdev = bdev;
+	bio->bi_vcnt = 0;
+	bio->bi_iter.bi_size = 0;
+	bio_set_op_attrs(bio, op, op_flags);
+	ret = submit_bio_wait(bio);
+	bio_put(bio);
+	return ret;
+}
+EXPORT_SYMBOL(blkdev_issue_zone_action);
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index d3e852a..d4d04ed 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -1134,6 +1134,118 @@ static int sd_setup_read_write_cmnd(struct scsi_cmnd *SCpnt)
 	return ret;
 }
 
+static int sd_setup_zone_report_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_device *sdp = cmd->device;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	struct bio *bio = rq->bio;
+	sector_t sector = blk_rq_pos(rq);
+	struct gendisk *disk = rq->rq_disk;
+	unsigned int nr_bytes = blk_rq_bytes(rq);
+	int ret = BLKPREP_KILL;
+
+	WARN_ON(nr_bytes == 0);
+
+	/*
+	 * For conventional drives generate a report that shows a
+	 * large single conventional zone the size of the block device
+	 */
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC) {
+		void *src;
+		struct bdev_zone_report *conv;
+
+		if (nr_bytes < sizeof(struct bdev_zone_report))
+			goto out;
+
+		src = kmap_atomic(bio->bi_io_vec->bv_page);
+		conv = src + bio->bi_io_vec->bv_offset;
+		conv->descriptor_count = cpu_to_be32(1);
+		conv->same_field = ZS_ALL_SAME;
+		conv->maximum_lba = cpu_to_be64(disk->part0.nr_sects);
+		kunmap_atomic(src);
+		goto out;
+	}
+
+	ret = scsi_init_io(cmd);
+	if (ret != BLKPREP_OK)
+		goto out;
+
+	cmd = rq->special;
+	if (sdp->changed) {
+		pr_err("SCSI disk has been changed or is not present.");
+		ret = BLKPREP_KILL;
+		goto out;
+	}
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	cmd->cmnd[0] = ZBC_IN;
+	cmd->cmnd[1] = ZI_REPORT_ZONES;
+	put_unaligned_be64(sector, &cmd->cmnd[2]);
+	put_unaligned_be32(nr_bytes, &cmd->cmnd[10]);
+	/* FUTURE ... when streamid is available */
+	/* cmd->cmnd[14] = bio_get_streamid(bio); */
+
+	cmd->sc_data_direction = DMA_FROM_DEVICE;
+	cmd->sdb.length = nr_bytes;
+	cmd->transfersize = sdp->sector_size;
+	cmd->underflow = 0;
+	cmd->allowed = SD_MAX_RETRIES;
+	ret = BLKPREP_OK;
+out:
+	return ret;
+}
+
+static int sd_setup_zone_action_cmnd(struct scsi_cmnd *cmd)
+{
+	struct request *rq = cmd->request;
+	struct scsi_disk *sdkp = scsi_disk(rq->rq_disk);
+	sector_t sector = blk_rq_pos(rq);
+	int ret = BLKPREP_KILL;
+	u8 allbit = 0;
+
+	if (sdkp->zoned != 1 && sdkp->device->type != TYPE_ZBC)
+		goto out;
+
+	if (sector == ~0ul) {
+		allbit = 1;
+		sector = 0;
+	}
+
+	cmd->cmd_len = 16;
+	memset(cmd->cmnd, 0, cmd->cmd_len);
+	memset(&cmd->sdb, 0, sizeof(cmd->sdb));
+	cmd->cmnd[0] = ZBC_OUT;
+	switch (req_op(rq)) {
+	case REQ_OP_ZONE_OPEN:
+		cmd->cmnd[1] = ZO_OPEN_ZONE;
+		break;
+	case REQ_OP_ZONE_CLOSE:
+		cmd->cmnd[1] = ZO_CLOSE_ZONE;
+		break;
+	case REQ_OP_ZONE_FINISH:
+		cmd->cmnd[1] = ZO_FINISH_ZONE;
+		break;
+	case REQ_OP_ZONE_RESET:
+		cmd->cmnd[1] = ZO_RESET_WRITE_POINTER;
+		break;
+	default:
+		goto out;
+	}
+	cmd->cmnd[14] = allbit;
+	put_unaligned_be64(sector, &cmd->cmnd[2]);
+
+	cmd->transfersize = 0;
+	cmd->underflow = 0;
+	cmd->allowed = SD_MAX_RETRIES;
+	cmd->sc_data_direction = DMA_NONE;
+
+	ret = BLKPREP_OK;
+out:
+	return ret;
+}
+
 static int sd_init_command(struct scsi_cmnd *cmd)
 {
 	struct request *rq = cmd->request;
@@ -1148,6 +1260,13 @@ static int sd_init_command(struct scsi_cmnd *cmd)
 	case REQ_OP_READ:
 	case REQ_OP_WRITE:
 		return sd_setup_read_write_cmnd(cmd);
+	case REQ_OP_ZONE_REPORT:
+		return sd_setup_zone_report_cmnd(cmd);
+	case REQ_OP_ZONE_OPEN:
+	case REQ_OP_ZONE_CLOSE:
+	case REQ_OP_ZONE_FINISH:
+	case REQ_OP_ZONE_RESET:
+		return sd_setup_zone_action_cmnd(cmd);
 	default:
 		BUG();
 	}
@@ -2737,6 +2856,8 @@ static void sd_read_block_characteristics(struct scsi_disk *sdkp)
 		queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, sdkp->disk->queue);
 	}
 
+	sdkp->zoned = (buffer[8] >> 4) & 3;
+
  out:
 	kfree(buffer);
 }
diff --git a/drivers/scsi/sd.h b/drivers/scsi/sd.h
index 765a6f1..f782990 100644
--- a/drivers/scsi/sd.h
+++ b/drivers/scsi/sd.h
@@ -94,6 +94,7 @@ struct scsi_disk {
 	unsigned	lbpvpd : 1;
 	unsigned	ws10 : 1;
 	unsigned	ws16 : 1;
+	unsigned	zoned: 2;
 };
 #define to_scsi_disk(obj) container_of(obj,struct scsi_disk,dev)
 
diff --git a/include/linux/bio.h b/include/linux/bio.h
index 59ffaa6..66b1b33 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -79,7 +79,13 @@ static inline bool bio_has_data(struct bio *bio)
 
 static inline bool bio_no_advance_iter(struct bio *bio)
 {
-	return bio_op(bio) == REQ_OP_DISCARD || bio_op(bio) == REQ_OP_WRITE_SAME;
+	return bio_op(bio) == REQ_OP_DISCARD ||
+	       bio_op(bio) == REQ_OP_WRITE_SAME ||
+	       bio_op(bio) == REQ_OP_ZONE_REPORT ||
+	       bio_op(bio) == REQ_OP_ZONE_OPEN ||
+	       bio_op(bio) == REQ_OP_ZONE_CLOSE ||
+	       bio_op(bio) == REQ_OP_ZONE_FINISH ||
+	       bio_op(bio) == REQ_OP_ZONE_RESET;
 }
 
 static inline bool bio_is_rw(struct bio *bio)
diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h
index 436f43f..97282c6 100644
--- a/include/linux/blk_types.h
+++ b/include/linux/blk_types.h
@@ -232,13 +232,18 @@ enum rq_flag_bits {
 enum req_op {
 	REQ_OP_READ,
 	REQ_OP_WRITE,
+	REQ_OP_ZONE_REPORT,
+	REQ_OP_ZONE_OPEN,
+	REQ_OP_ZONE_CLOSE,
+	REQ_OP_ZONE_FINISH,
+	REQ_OP_ZONE_RESET,
 	REQ_OP_DISCARD,		/* request to discard sectors */
 	REQ_OP_SECURE_ERASE,	/* request to securely erase sectors */
 	REQ_OP_WRITE_SAME,	/* write same block many times */
 	REQ_OP_FLUSH,		/* request for cache flush */
 };
 
-#define REQ_OP_BITS 3
+#define REQ_OP_BITS 4
 
 typedef unsigned int blk_qc_t;
 #define BLK_QC_T_NONE	-1U
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 2c210b6..2b2db36 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -24,6 +24,7 @@
 #include <linux/rcupdate.h>
 #include <linux/percpu-refcount.h>
 #include <linux/scatterlist.h>
+#include <linux/blkzoned_api.h>
 
 struct module;
 struct scsi_ioctl_command;
diff --git a/include/linux/blkzoned_api.h b/include/linux/blkzoned_api.h
new file mode 100644
index 0000000..47c091a
--- /dev/null
+++ b/include/linux/blkzoned_api.h
@@ -0,0 +1,25 @@
+/*
+ * Functions for zone based SMR devices.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by:
+ * Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef _BLKZONED_API_H
+#define _BLKZONED_API_H
+
+#include <uapi/linux/blkzoned_api.h>
+
+extern int blkdev_issue_zone_action(struct block_device *, unsigned int op,
+				    unsigned int op_flags, sector_t, gfp_t);
+extern int blkdev_issue_zone_report(struct block_device *, unsigned int op_flgs,
+				    sector_t, u8 opt, struct page *, size_t,
+				    gfp_t);
+
+#endif /* _BLKZONED_API_H */
diff --git a/include/uapi/linux/Kbuild b/include/uapi/linux/Kbuild
index 185f8ea..50ba85a 100644
--- a/include/uapi/linux/Kbuild
+++ b/include/uapi/linux/Kbuild
@@ -70,6 +70,7 @@ header-y += bfs_fs.h
 header-y += binfmts.h
 header-y += blkpg.h
 header-y += blktrace_api.h
+header-y += blkzoned_api.h
 header-y += bpf_common.h
 header-y += bpf.h
 header-y += bpqether.h
diff --git a/include/uapi/linux/blkzoned_api.h b/include/uapi/linux/blkzoned_api.h
new file mode 100644
index 0000000..d2bdba5
--- /dev/null
+++ b/include/uapi/linux/blkzoned_api.h
@@ -0,0 +1,182 @@
+/*
+ * Functions for zone based SMR devices.
+ *
+ * Copyright (C) 2015 Seagate Technology PLC
+ *
+ * Written by:
+ * Shaun Tancheff <shaun.tancheff@seagate.com>
+ *
+ * This file is licensed under  the terms of the GNU General Public
+ * License version 2. This program is licensed "as is" without any
+ * warranty of any kind, whether express or implied.
+ */
+
+#ifndef _UAPI_BLKZONED_API_H
+#define _UAPI_BLKZONED_API_H
+
+#include <linux/types.h>
+
+/**
+ * enum bdev_zone_report_option - Report Zones types to be included.
+ *
+ * @ZOPT_NON_SEQ_AND_RESET: Default (all zones).
+ * @ZOPT_ZC1_EMPTY: Zones which are empty.
+ * @ZOPT_ZC2_OPEN_IMPLICIT: Zones open but not explicitly opened
+ * @ZOPT_ZC3_OPEN_EXPLICIT: Zones opened explicitly
+ * @ZOPT_ZC4_CLOSED: Zones closed for writing.
+ * @ZOPT_ZC5_FULL: Zones that are full.
+ * @ZOPT_ZC6_READ_ONLY: Zones that are read-only
+ * @ZOPT_ZC7_OFFLINE: Zones that are offline
+ * @ZOPT_RESET: Zones that are empty
+ * @ZOPT_NON_SEQ: Zones that have HA media-cache writes pending
+ * @ZOPT_NON_WP_ZONES: Zones that do not have Write Pointers (conventional)
+ * @ZOPT_PARTIAL_FLAG: Modifies the definition of the Zone List Length field.
+ *
+ * Used by Report Zones in bdev_zone_get_report: report_option
+ */
+enum bdev_zone_report_option {
+	ZOPT_NON_SEQ_AND_RESET   = 0x00,
+	ZOPT_ZC1_EMPTY,
+	ZOPT_ZC2_OPEN_IMPLICIT,
+	ZOPT_ZC3_OPEN_EXPLICIT,
+	ZOPT_ZC4_CLOSED,
+	ZOPT_ZC5_FULL,
+	ZOPT_ZC6_READ_ONLY,
+	ZOPT_ZC7_OFFLINE,
+	ZOPT_RESET               = 0x10,
+	ZOPT_NON_SEQ             = 0x11,
+	ZOPT_NON_WP_ZONES        = 0x3f,
+	ZOPT_PARTIAL_FLAG        = 0x80,
+};
+
+/**
+ * enum bdev_zone_type - Type of zone in descriptor
+ *
+ * @ZTYP_RESERVED: Reserved
+ * @ZTYP_CONVENTIONAL: Conventional random write zone (No Write Pointer)
+ * @ZTYP_SEQ_WRITE_REQUIRED: Non-sequential writes are rejected.
+ * @ZTYP_SEQ_WRITE_PREFERRED: Non-sequential writes allowed but discouraged.
+ *
+ * Returned from Report Zones. See bdev_zone_descriptor* type.
+ */
+enum bdev_zone_type {
+	ZTYP_RESERVED            = 0,
+	ZTYP_CONVENTIONAL        = 1,
+	ZTYP_SEQ_WRITE_REQUIRED  = 2,
+	ZTYP_SEQ_WRITE_PREFERRED = 3,
+};
+
+/**
+ * enum bdev_zone_condition - Condition of zone in descriptor
+ *
+ * @ZCOND_CONVENTIONAL: N/A
+ * @ZCOND_ZC1_EMPTY: Empty
+ * @ZCOND_ZC2_OPEN_IMPLICIT: Opened via write to zone.
+ * @ZCOND_ZC3_OPEN_EXPLICIT: Opened via open zone command.
+ * @ZCOND_ZC4_CLOSED: Closed
+ * @ZCOND_ZC6_READ_ONLY: Read-only
+ * @ZCOND_ZC5_FULL: No remaining space in zone.
+ * @ZCOND_ZC7_OFFLINE: Offline
+ *
+ * Returned from Report Zones. See bdev_zone_descriptor* flags.
+ */
+enum bdev_zone_condition {
+	ZCOND_CONVENTIONAL       = 0,
+	ZCOND_ZC1_EMPTY          = 1,
+	ZCOND_ZC2_OPEN_IMPLICIT  = 2,
+	ZCOND_ZC3_OPEN_EXPLICIT  = 3,
+	ZCOND_ZC4_CLOSED         = 4,
+	/* 0x5 to 0xC are reserved */
+	ZCOND_ZC6_READ_ONLY      = 0xd,
+	ZCOND_ZC5_FULL           = 0xe,
+	ZCOND_ZC7_OFFLINE        = 0xf,
+};
+
+/**
+ * enum bdev_zone_same - Report Zones same code.
+ *
+ * @ZS_ALL_DIFFERENT: All zones differ in type and size.
+ * @ZS_ALL_SAME: All zones are the same size and type.
+ * @ZS_LAST_DIFFERS: All zones are the same size and type except the last zone.
+ * @ZS_SAME_LEN_DIFF_TYPES: All zones are the same length but types differ.
+ *
+ * Returned from Report Zones. See bdev_zone_report* same_field.
+ */
+enum bdev_zone_same {
+	ZS_ALL_DIFFERENT        = 0,
+	ZS_ALL_SAME             = 1,
+	ZS_LAST_DIFFERS         = 2,
+	ZS_SAME_LEN_DIFF_TYPES  = 3,
+};
+
+/**
+ * struct bdev_zone_get_report - ioctl: Report Zones request
+ *
+ * @zone_locator_lba: starting lba for first [reported] zone
+ * @return_page_count: number of *bytes* allocated for result
+ * @report_option: see: zone_report_option enum
+ *
+ * Used to issue report zones command to connected device
+ */
+struct bdev_zone_get_report {
+	__u64 zone_locator_lba;
+	__u32 return_page_count;
+	__u8  report_option;
+} __packed;
+
+/**
+ * struct bdev_zone_descriptor - A Zone descriptor entry from report zones
+ *
+ * @type: see zone_type enum
+ * @flags: Bits 0:reset, 1:non-seq, 2-3: resv, 4-7: see zone_condition enum
+ * @reserved1: padding
+ * @length: length of zone in sectors
+ * @lba_start: lba where the zone starts.
+ * @lba_wptr: lba of the current write pointer.
+ * @reserved: padding
+ *
+ */
+struct bdev_zone_descriptor {
+	__u8 type;
+	__u8 flags;
+	__u8  reserved1[6];
+	__be64 length;
+	__be64 lba_start;
+	__be64 lba_wptr;
+	__u8 reserved[32];
+} __packed;
+
+/**
+ * struct bdev_zone_report - Report Zones result
+ *
+ * @descriptor_count: Number of descriptor entries that follow
+ * @same_field: bits 0-3: enum zone_same (MASK: 0x0F)
+ * @reserved1: padding
+ * @maximum_lba: LBA of the last logical sector on the device, inclusive
+ *               of all logical sectors in all zones.
+ * @reserved2: padding
+ * @descriptors: array of descriptors follows.
+ */
+struct bdev_zone_report {
+	__be32 descriptor_count;
+	__u8 same_field;
+	__u8 reserved1[3];
+	__be64 maximum_lba;
+	__u8 reserved2[48];
+	struct bdev_zone_descriptor descriptors[0];
+} __packed;
+
+/**
+ * struct bdev_zone_report_io - Report Zones ioctl argument.
+ *
+ * @in: Report Zones inputs
+ * @out: Report Zones output
+ */
+struct bdev_zone_report_io {
+	union {
+		struct bdev_zone_get_report in;
+		struct bdev_zone_report out;
+	} data;
+} __packed;
+
+#endif /* _UAPI_BLKZONED_API_H */
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [PATCH v8 2/2 RESEND] Add ioctl to issue ZBC/ZAC commands via block layer
  2016-08-22  4:20 ` [PATCH v8 2/2] Add ioctl to issue ZBC/ZAC commands via block layer Shaun Tancheff
@ 2016-08-24 20:25   ` Shaun Tancheff
  0 siblings, 0 replies; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-24 20:25 UTC (permalink / raw)
  To: linux-block, linux-scsi, linux-kernel
  Cc: Shaun Tancheff, Jens Axboe, Christoph Hellwig,
	James E . J . Bottomley, Martin K . Petersen, Damien Le Moal,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	linux-f2fs-devel, linux-fsdevel, dm-devel, Jaegeuk Kim,
	Shaun Tancheff

(RESENDING to include f2fs, fs-devel and dm-devel)

Add support for ZBC ioctl's
    BLKREPORT     - Issue Report Zones to device.
    BLKZONEACTION - Issue a Zone Action (Close, Finish, Open, or Reset)
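
From user space, a reset through BLKZONEACTION might be issued roughly as
sketched here. The struct, ioctl number and action value are local copies
of the definitions proposed in this patch (prefixed sk_/SK_ to mark them
as such), since no installed header provides them; zone_reset() is a
hypothetical helper. The call is only meaningful on a kernel carrying this
series, with fd opened for writing on the whole device rather than a
partition.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>

/* Local copy of struct bdev_zone_action as proposed in this patch. */
struct sk_bdev_zone_action {
	uint64_t zone_locator_lba;
	uint32_t action;
	uint8_t  all_zones;
	uint8_t  force_unit_access;
} __attribute__((packed));

#define SK_BLKZONEACTION	_IOW(0x12, 131, struct sk_bdev_zone_action)
#define SK_ZONE_ACTION_RESET	0x04

/* Reset one zone starting at 'lba', or every zone if all_zones is set. */
static int zone_reset(int fd, uint64_t lba, int all_zones)
{
	struct sk_bdev_zone_action za;

	memset(&za, 0, sizeof(za));
	/* The ioctl rejects a non-zero LBA together with all_zones. */
	za.zone_locator_lba = all_zones ? 0 : lba;
	za.action = SK_ZONE_ACTION_RESET;
	za.all_zones = all_zones ? 1 : 0;
	return ioctl(fd, SK_BLKZONEACTION, &za);
}
```

The return value follows the usual ioctl convention: 0 on success, -1 with
errno set otherwise.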

Signed-off-by: Shaun Tancheff <shaun.tancheff@seagate.com>
---
v8:
 - Changed ioctl for zone actions to a single ioctl that takes 
   a structure including the zone, zone action, all flag, and force option
 - Mapped REQ_META flag to 'force unit access' for zone operations
v6:
 - Added GFP_DMA to gfp mask.
v4:
 - Rebase on linux-next tag next-20160617.
 - Change bio flags to bio op's

 block/ioctl.c                     | 149 ++++++++++++++++++++++++++++++++++++++
 include/uapi/linux/blkzoned_api.h |  30 +++++++-
 include/uapi/linux/fs.h           |   1 +
 3 files changed, 179 insertions(+), 1 deletion(-)

diff --git a/block/ioctl.c b/block/ioctl.c
index ed2397f..d760523 100644
--- a/block/ioctl.c
+++ b/block/ioctl.c
@@ -194,6 +194,151 @@ int blkdev_reread_part(struct block_device *bdev)
 }
 EXPORT_SYMBOL(blkdev_reread_part);
 
+static int blk_zoned_report_ioctl(struct block_device *bdev, fmode_t mode,
+				  void __user *parg)
+{
+	int error = -EFAULT;
+	gfp_t gfp = GFP_KERNEL | GFP_DMA;
+	void *iopg = NULL;
+	struct bdev_zone_report_io *bzrpt = NULL;
+	int order = 0;
+	struct page *pgs = NULL;
+	u32 alloc_size = PAGE_SIZE;
+	unsigned int op_flags = 0;
+	u8 opt = 0;
+
+	if (!(mode & FMODE_READ))
+		return -EBADF;
+
+	iopg = (void *)get_zeroed_page(gfp);
+	if (!iopg) {
+		error = -ENOMEM;
+		goto report_zones_out;
+	}
+	bzrpt = iopg;
+	if (copy_from_user(bzrpt, parg, sizeof(*bzrpt))) {
+		error = -EFAULT;
+		goto report_zones_out;
+	}
+	if (bzrpt->data.in.return_page_count > alloc_size) {
+		int npages;
+
+		alloc_size = bzrpt->data.in.return_page_count;
+		npages = (alloc_size + PAGE_SIZE - 1) >> PAGE_SHIFT;
+		pgs = alloc_pages(gfp, ilog2(npages));
+		if (pgs) {
+			void *mem = page_address(pgs);
+
+			if (!mem) {
+				error = -ENOMEM;
+				goto report_zones_out;
+			}
+			order = ilog2(npages);
+			memset(mem, 0, alloc_size);
+			memcpy(mem, bzrpt, sizeof(*bzrpt));
+			bzrpt = mem;
+		} else {
+			/* Result requires DMA capable memory */
+			pr_err("Not enough memory available for request.\n");
+			error = -ENOMEM;
+			goto report_zones_out;
+		}
+	} else {
+		alloc_size = bzrpt->data.in.return_page_count;
+	}
+	if (bzrpt->data.in.force_unit_access)
+		op_flags |= REQ_META;
+	opt = bzrpt->data.in.report_option;
+	error = blkdev_issue_zone_report(bdev, op_flags,
+			bzrpt->data.in.zone_locator_lba, opt,
+			pgs ? pgs : virt_to_page(iopg),
+			alloc_size, GFP_KERNEL);
+	if (error)
+		goto report_zones_out;
+
+	if (pgs) {
+		void *src = bzrpt;
+		u32 off = 0;
+
+		/*
+		 * When moving a multi-order page with GFP_DMA
+		 * the copy to user can trap "<spans multiple pages>"
+		 * so instead we copy out 1 page at a time.
+		 */
+		while (off < alloc_size && !error) {
+			u32 len = min_t(u32, PAGE_SIZE, alloc_size - off);
+
+			memcpy(iopg, src + off, len);
+			if (copy_to_user(parg + off, iopg, len))
+				error = -EFAULT;
+			off += len;
+		}
+	} else {
+		if (copy_to_user(parg, iopg, alloc_size))
+			error = -EFAULT;
+	}
+
+report_zones_out:
+	if (pgs)
+		__free_pages(pgs, order);
+	if (iopg)
+		free_page((unsigned long)iopg);
+	return error;
+}
+
+static int blk_zoned_action_ioctl(struct block_device *bdev, fmode_t mode,
+				  void __user *parg)
+{
+	unsigned int op = 0;
+	unsigned int op_flags = 0;
+	sector_t lba;
+	struct bdev_zone_action za;
+
+	if (!(mode & FMODE_WRITE))
+		return -EBADF;
+
+	/* When acting on zones we explicitly disallow using a partition. */
+	if (bdev != bdev->bd_contains) {
+		pr_err("%s: All zone operations disallowed on this device\n",
+			__func__);
+		return -EFAULT;
+	}
+
+	if (copy_from_user(&za, parg, sizeof(za)))
+		return -EFAULT;
+
+	switch (za.action) {
+	case ZONE_ACTION_CLOSE:
+		op = REQ_OP_ZONE_CLOSE;
+		break;
+	case ZONE_ACTION_FINISH:
+		op = REQ_OP_ZONE_FINISH;
+		break;
+	case ZONE_ACTION_OPEN:
+		op = REQ_OP_ZONE_OPEN;
+		break;
+	case ZONE_ACTION_RESET:
+		op = REQ_OP_ZONE_RESET;
+		break;
+	default:
+		pr_err("%s: Unknown action: %u\n", __func__, za.action);
+		return -EINVAL;
+	}
+
+	lba = za.zone_locator_lba;
+	if (za.all_zones) {
+		if (lba) {
+			pr_err("%s: if all_zones, LBA must be 0.\n", __func__);
+			return -EINVAL;
+		}
+		lba = ~0ul;
+	}
+	if (za.force_unit_access || lba == ~0ul)
+		op_flags |= REQ_META;
+
+	return blkdev_issue_zone_action(bdev, op, op_flags, lba, GFP_KERNEL);
+}
+
 static int blk_ioctl_discard(struct block_device *bdev, fmode_t mode,
 		unsigned long arg, unsigned long flags)
 {
@@ -568,6 +713,10 @@ int blkdev_ioctl(struct block_device *bdev, fmode_t mode, unsigned cmd,
 	case BLKTRACESETUP:
 	case BLKTRACETEARDOWN:
 		return blk_trace_ioctl(bdev, cmd, argp);
+	case BLKREPORT:
+		return blk_zoned_report_ioctl(bdev, mode, argp);
+	case BLKZONEACTION:
+		return blk_zoned_action_ioctl(bdev, mode, argp);
 	case IOC_PR_REGISTER:
 		return blkdev_pr_register(bdev, argp);
 	case IOC_PR_RESERVE:
diff --git a/include/uapi/linux/blkzoned_api.h b/include/uapi/linux/blkzoned_api.h
index d2bdba5..cd81a9f 100644
--- a/include/uapi/linux/blkzoned_api.h
+++ b/include/uapi/linux/blkzoned_api.h
@@ -115,6 +115,7 @@ enum bdev_zone_same {
  * @zone_locator_lba: starting lba for first [reported] zone
  * @return_page_count: number of *bytes* allocated for result
  * @report_option: see: zone_report_option enum
+ * @force_unit_access: Force report from media
  *
  * Used to issue report zones command to connected device
  */
@@ -122,6 +123,25 @@ struct bdev_zone_get_report {
 	__u64 zone_locator_lba;
 	__u32 return_page_count;
 	__u8  report_option;
+	__u8  force_unit_access;
+} __packed;
+
+/**
+ * struct bdev_zone_action - ioctl: Perform Zone Action
+ *
+ * @zone_locator_lba: starting lba of the zone to act on (0 if @all_zones)
+ * @action: One of the ZONE_ACTION_*'s:
+ *          Close, Finish, Open, or Reset
+ * @all_zones: Flag to indicate if command should apply to all zones.
+ * @force_unit_access: Force command to media and update zone cache on success
+ *
+ * Used to issue a zone action command to connected device
+ */
+struct bdev_zone_action {
+	__u64 zone_locator_lba;
+	__u32 action;
+	__u8  all_zones;
+	__u8  force_unit_access;
 } __packed;
 
 /**
@@ -134,7 +154,6 @@ struct bdev_zone_get_report {
  * @lba_start: lba where the zone starts.
  * @lba_wptr: lba of the current write pointer.
  * @reserved: padding
- *
  */
 struct bdev_zone_descriptor {
 	__u8 type;
@@ -179,4 +198,13 @@ struct bdev_zone_report_io {
 	} data;
 } __packed;
 
+/* continuing from uapi/linux/fs.h: */
+#define BLKREPORT	_IOWR(0x12, 130, struct bdev_zone_report_io)
+#define BLKZONEACTION	_IOW(0x12, 131, struct bdev_zone_action)
+
+#define ZONE_ACTION_CLOSE	0x01
+#define ZONE_ACTION_FINISH	0x02
+#define ZONE_ACTION_OPEN	0x03
+#define ZONE_ACTION_RESET	0x04
+
 #endif /* _UAPI_BLKZONED_API_H */
diff --git a/include/uapi/linux/fs.h b/include/uapi/linux/fs.h
index 3b00f7c..350fb3f2 100644
--- a/include/uapi/linux/fs.h
+++ b/include/uapi/linux/fs.h
@@ -222,6 +222,7 @@ struct fsxattr {
 #define BLKSECDISCARD _IO(0x12,125)
 #define BLKROTATIONAL _IO(0x12,126)
 #define BLKZEROOUT _IO(0x12,127)
+/* A jump here: See blkzoned_api.h, Reserving 130 and 131. */
 
 #define BMAP_IOCTL 1		/* obsolete - kept for compatibility */
 #define FIBMAP	   _IO(0x00,1)	/* bmap access */
-- 
2.9.3


* Re: [PATCH v8 1/2 RESEND] Add bio/request flags to issue ZBC/ZAC commands
  2016-08-24 20:24   ` [PATCH v8 1/2 RESEND] " Shaun Tancheff
@ 2016-08-26  2:31     ` Damien Le Moal
  2016-08-26  5:17       ` Shaun Tancheff
  0 siblings, 1 reply; 8+ messages in thread
From: Damien Le Moal @ 2016-08-26  2:31 UTC (permalink / raw)
  To: Shaun Tancheff, linux-block, linux-scsi, linux-kernel
  Cc: Jens Axboe, Christoph Hellwig, James E . J . Bottomley,
	Martin K . Petersen, Hannes Reinecke, Josh Bingaman,
	David S . Miller, Geert Uytterhoeven, Andrew Morton,
	Greg Kroah-Hartman, Mauro Carvalho Chehab, Guenter Roeck,
	Ming Lei, Mike Christie, Keith Busch, Mike Snitzer,
	Johannes Thumshirn, Shaohua Li, Dan Williams, Sagi Grimberg,
	stephen hemminger, Jarkko Sakkinen, Alexandre Bounine, Asias He,
	Sabrina Dubroca, Mike Frysinger, Andrea Arcangeli, Linus Walleij,
	Jeff Layton, J . Bruce Fields, linux-f2fs-devel, linux-fsdevel,
	dm-devel, Jaegeuk Kim, Shaun Tancheff


Shaun,

On 8/25/16 05:24, Shaun Tancheff wrote:
> (RESENDING to include f2fs, fs-devel and dm-devel)
>
> Add op flags to access to zone information as well as open, close
> and reset zones:
>   - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
>   - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
>   - REQ_OP_ZONE_CLOSE - Explicitly close a zone
>   - REQ_OP_ZONE_FINISH - Explicitly finish a zone
>   - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone
>
> These op flags can be used to create bio's to control zoned devices
> through the block layer.

I still have a hard time seeing the need for the REQ_OP_ZONE_REPORT
operation assuming that the device queue will hold a zone information
cache, Hannes' RB-tree or your array type, whichever.

Let's try to think simply here: if the disk user (an FS, a device
mapper or an application doing raw disk accesses) wants to access the
disk zone information, why would it need to issue a BIO when calling
blkdev_lookup_zone would give exactly that information straight out of
memory (so much faster)? I thought hard about this, but cannot think of
any value for the BIO-to-disk option. It seems to me to be equivalent to
systematically doing a page cache read even if the page cache tells us
that the page is up-to-date...

Moreover, issuing a report zone to the disk may return information that 
is in fact incorrect, as that would not take into account the eventual 
set of write requests that was dispatched but not yet processed by the 
disk (some zone write pointer may be reported with a value lower than 
what the zone cache maintains).

Dealing (and fixing) these inconsistencies would force an update of the 
report zone result using the information of the zone cache, which in 
itself sounds like a good justification of not doing a report zones in 
the first place.
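
The kind of fix-up described above can be sketched in a few lines. This
is only an illustration: struct zone_view and reconcile_wp() are
hypothetical names, not part of this series or of Hannes' zone cache,
and the sketch assumes the cache advances its write pointer at dispatch
time, so the device may lag behind the cache but never run ahead of it
by more than the zone length.

```c
#include <stdint.h>

/* Hypothetical, simplified view of one zone; not the kernel's structure. */
struct zone_view {
	uint64_t start;	/* first LBA of the zone */
	uint64_t len;	/* zone length in LBAs */
	uint64_t wp;	/* write pointer LBA held by the cache */
};

/*
 * Reconcile a write pointer reported by the device with the cached one.
 * With writes dispatched but not yet processed, the reported value can
 * be lower than the cached value, so the larger of the two is kept and
 * then clamped to the zone bounds.
 */
static uint64_t reconcile_wp(const struct zone_view *cached,
			     uint64_t reported_wp)
{
	uint64_t wp = reported_wp > cached->wp ? reported_wp : cached->wp;
	uint64_t end = cached->start + cached->len;

	if (wp < cached->start)
		wp = cached->start;
	if (wp > end)
		wp = end;
	return wp;
}
```

Every entry of a forced report would need this treatment before being
handed to the caller, which is the extra work being argued against.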

I am fine with the other operations, and in fact having a BIO interface
for them to send down to the SCSI layer is better than any other method.
It will cause them to be seen in sd_init_command, which is the path
taken for read and write commands too. So all zone cache information
checking and updating can be done in that single place and serialized
with a spinlock. Maintenance of the zone cache information becomes very
easy.

Any divergence of the zone cache information with the actual state of
the disk will likely cause an error (unaligned write or other). Having a
specific report zone BIO option will not help the disk user recover from
those. Hannes' current implementation makes sure that the information of
the zone for the failed request is automatically updated. That is enough
to keep the zone cache information up to date, and a zone's information
can be marked as "in update" for the user to notice and wait for the
refreshed information.

The ioctl code for reporting zones does not need the specific request op 
code either. Again, blkdev_lookup_zone can provide zone information, and 
an emulation of the reporting options filtering is also trivial to 
implement on top of that, if really required (I do not think that is 
strongly needed though).

Without the report zone operation, your patch set size would 
significantly shrink and merging with Hannes work becomes very easy too.

Please let me know what you think. If we drop this, we can get a clean 
and full ZBC support patch set ready in no time at all.

Best regards.

-- 
Damien Le Moal, Ph.D.
Sr. Manager, System Software Group, HGST Research,
HGST, a Western Digital brand
Damien.LeMoal@hgst.com
(+81) 0466-98-3593 (ext. 513593)
1 kirihara-cho, Fujisawa,
Kanagawa, 252-0888 Japan
www.hgst.com


* Re: [PATCH v8 1/2 RESEND] Add bio/request flags to issue ZBC/ZAC commands
  2016-08-26  2:31     ` Damien Le Moal
@ 2016-08-26  5:17       ` Shaun Tancheff
  0 siblings, 0 replies; 8+ messages in thread
From: Shaun Tancheff @ 2016-08-26  5:17 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Shaun Tancheff, linux-block, linux-scsi, LKML, Jens Axboe,
	Christoph Hellwig, James E . J . Bottomley, Martin K . Petersen,
	Hannes Reinecke, Josh Bingaman, David S . Miller,
	Geert Uytterhoeven, Andrew Morton, Greg Kroah-Hartman,
	Mauro Carvalho Chehab, Guenter Roeck, Ming Lei, Mike Christie,
	Keith Busch, Mike Snitzer, Johannes Thumshirn, Shaohua Li,
	Dan Williams, Sagi Grimberg, stephen hemminger, Jarkko Sakkinen,
	Alexandre Bounine, Asias He, Sabrina Dubroca, Mike Frysinger,
	Andrea Arcangeli, Linus Walleij, Jeff Layton, J . Bruce Fields,
	linux-f2fs-devel, linux-fsdevel, dm-devel, Jaegeuk Kim

On Thu, Aug 25, 2016 at 9:31 PM, Damien Le Moal <damien.lemoal@hgst.com> wrote:
>
> Shaun,
>
> On 8/25/16 05:24, Shaun Tancheff wrote:
>>
>> (RESENDING to include f2fs, fs-devel and dm-devel)
>>
>> Add op flags to access to zone information as well as open, close
>> and reset zones:
>>   - REQ_OP_ZONE_REPORT - Query zone information (Report zones)
>>   - REQ_OP_ZONE_OPEN - Explicitly open a zone for writing
>>   - REQ_OP_ZONE_CLOSE - Explicitly close a zone
>>   - REQ_OP_ZONE_FINISH - Explicitly finish a zone
>>   - REQ_OP_ZONE_RESET - Reset Write Pointer to start of zone
>>
>> These op flags can be used to create bio's to control zoned devices
>> through the block layer.
>
>
> I still have a hard time seeing the need for the REQ_OP_ZONE_REPORT
> operation assuming that the device queue will hold a zone information cache,
> Hannes' RB-tree or your array type, whichever.
>
> Let's try to think simply here: if the disk user (an FS, a device mapper or
> an application doing raw disk accesses) wants to access the disk zone
> information, why would it need to issue a BIO when calling
> blkdev_lookup_zone would give exactly that information straight out of
> memory (so much faster)? I thought hard about this, but cannot think of any
> value for the BIO-to-disk option. It seems to me to be equivalent to
> systematically doing a page cache read even if the page cache tells us that
> the page is up-to-date...

Firstly the BIO abstraction here gives a common interface to
getting the zone information and works even for embedded
systems that are not willing / convinced to enable
SCSI_ZBC + BLK_ZONED.

Secondly when SCSI_ZBC + BLK_ZONED are enabled it just
returns from the zone cache [as you can hopefully find
in the second half of this series]. I did add a 'force' option
but it's not intended to be used lightly.

Thirdly it is my belief that the BIO abstraction is more easily
adapted to working with [and through] the device mapper
layer(s).

Today we both have the issue where if a file system
supports working with a ZBC device there can be no
device mapper stacked between the file system and
the actual zoned device. This is also true of our respective
device mapper targets.

It is my current belief that teaching the device mapper
layer to include REQ_OP_ZONE* operations is relatively
straightforward and can be done w/o affecting existing
targets that don't specifically need to operate on zones.
Something similar to the way flush is handled currently.
If the target doesn't ask to see zone operations the default
mapping rules apply.
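
A minimal model of that opt-in, written as plain C rather than against
the real device mapper API. Everything here (enum op_kind, struct io,
struct linear_target, wants_zone_ops, map_io) is hypothetical and only
illustrates the rule that a target which has not asked to see zone
operations gets the default remapping:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical operation kinds standing in for REQ_OP_* values. */
enum op_kind { OP_READ, OP_WRITE, OP_ZONE_REPORT, OP_ZONE_RESET };

/* Hypothetical stand-in for a bio: just an op and a start sector. */
struct io {
	enum op_kind op;
	uint64_t sector;
};

/* Hypothetical linear target; not struct dm_target. */
struct linear_target {
	uint64_t begin;		/* first sector of the target in the mapped device */
	uint64_t dev_start;	/* where that range starts on the underlying device */
	bool wants_zone_ops;	/* target opted in to seeing zone operations */
	int (*zone_op)(struct linear_target *t, struct io *io);
};

static bool is_zone_op(enum op_kind op)
{
	return op == OP_ZONE_REPORT || op == OP_ZONE_RESET;
}

/*
 * Zone operations go to the target's own handler only if it opted in.
 * Otherwise they are remapped exactly like reads and writes.
 */
static int map_io(struct linear_target *t, struct io *io)
{
	if (is_zone_op(io->op) && t->wants_zone_ops && t->zone_op)
		return t->zone_op(t, io);

	io->sector = t->dev_start + (io->sector - t->begin);
	return 0;
}
```

With this shape an existing target needs no change: it never sets the
opt-in flag, so a zone reset passes through with only its sector
adjusted.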

Examples of why I would like to add REQ_OP_ZONE*
support to the device mapper:

I think it would be really neat if I could just do a quick
dm-linear and put a big chunk of SSD in front of dm-zoned
or dm-zdm as it would be a nice way to boost performance.

Similarly it would enable using dm-linear to stitch together enough
conventional space with a ZBC drive to see if Dave Chinner's
XFS proposal from a couple of years ago could work.

> Moreover, issuing a report zone to the disk may return information that is
> in fact incorrect, as that would not take into account the eventual set of
> write requests that was dispatched but not yet processed by the disk (some
> zone write pointer may be reported with a value lower than what the zone
> cache maintains).

Yes but issuing a zone report to media is not the expected path
when the zone cache is available. It is there to 'force' a re-sync
and it is intended that the user of the call knows that the force
is being applied and wants it to happen. Perhaps I should make
it two flags? One to force a reply from the device and a second
flag to re-sync the zone cache with the result? There is one
piece of information that can only be retrieved by going to the
device and that is the 'non-seq resources' flag and it is only
used by Host Aware devices ... as far as I understand.

> Dealing (and fixing) these inconsistencies would force an update of the
> report zone result using the information of the zone cache, which in itself
> sounds like a good justification of not doing a report zones in the first
> place.

When report zones is just pulling from the zone cache it should
not be a problem. So the normal activity [when SCSI_ZBC +
BLK_ZONED are enabled] should not be introducing any
inconsistencies.

> I am fine with the other operations, and in fact having a BIO interface for
> them to send down to the SCSI layer is better than any other method. It will
> cause them to be seen in sd_init_command, which is the path taken for read
> and write commands too. So all zone cache information checking and updating
> can be done in that single place and serialized with a spinlock. Maintenance
> of the zone cache information becomes very easy.
>
> Any divergence of the zone cache information with the actual state of the
> disk will likely cause an error (unaligned write or other). Having a
> specific report zone BIO option will not help the disk user recover from
> those. Hannes' current implementation makes sure that the information of the
> zone for the failed request is automatically updated. That is enough to
> keep the zone cache information up to date, and a zone's information can be
> marked as "in update" for the user to notice and wait for the refreshed
> information.

Which is why the zone cache is considered the default authority ... when it
is available.

Hopefully you will find this use case more compelling:

User issues:
   # sg_reset_wp --all /dev/sdz
User then proceeds to run:
   # mkfs.f2fs -m /dev/sdz

Now quite possibly the zone information is 'out of sync' because the user
used a ZBC command that we failed to catch in the zone cache.

Now if mkfs.f2fs does a report zones with force it will fix the
zone cache and everything goes nice and quick.

Since the output of report zones is the same format as the
ZBC spec [and therefore SG_IO] it should be a straightforward
change in mkfs.f2fs to call BLKREPORT instead of via SG_IO.
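
For comparison, a reset of all zones issued through the proposed
BLKZONEACTION ioctl instead of SG_IO might look like the userspace
sketch below. The structure layout, ioctl number and ZONE_ACTION_RESET
value are copied from patch 2/2 of this series and defined locally,
since no installed header is assumed to provide them. fill_zone_reset()
and zone_reset() are hypothetical helper names.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/ioctl.h>

/* Local copy of the uapi structure proposed in patch 2/2. */
struct bdev_zone_action {
	uint64_t zone_locator_lba;
	uint32_t action;
	uint8_t  all_zones;
	uint8_t  force_unit_access;
} __attribute__((packed));

/* Values as proposed in the patch; not present in a released header. */
#define BLKZONEACTION		_IOW(0x12, 131, struct bdev_zone_action)
#define ZONE_ACTION_RESET	0x04

_Static_assert(sizeof(struct bdev_zone_action) == 14,
	       "packed layout: 8 + 4 + 1 + 1 bytes");

/* Build a reset request; the patch rejects all_zones with a non-zero LBA. */
static int fill_zone_reset(struct bdev_zone_action *za, uint64_t lba,
			   int all_zones)
{
	if (all_zones && lba != 0)
		return -EINVAL;

	memset(za, 0, sizeof(*za));
	za->zone_locator_lba = lba;
	za->action = ZONE_ACTION_RESET;
	za->all_zones = all_zones ? 1 : 0;
	return 0;
}

/* Issue the reset on an open whole-disk descriptor (opened for writing). */
static int zone_reset(int fd, uint64_t lba, int all_zones)
{
	struct bdev_zone_action za;
	int err = fill_zone_reset(&za, lba, all_zones);

	if (err)
		return err;
	return ioctl(fd, BLKZONEACTION, &za) < 0 ? -errno : 0;
}
```

Because such a request travels through the block layer, the zone cache
sees it, unlike the sg_reset_wp case above.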

> The ioctl code for reporting zones does not need the specific request op
> code either. Again, blkdev_lookup_zone can provide zone information, and an
> emulation of the reporting options filtering is also trivial to implement on
> top of that, if really required (I do not think that is strongly needed
> though).
>
> Without the report zone operation, your patch set size would significantly
> shrink and merging with Hannes work becomes very easy too.
>
> Please let me know what you think. If we drop this, we can get a clean and
> full ZBC support patch set ready in no time at all.

-- 
Regards,
Shaun Tancheff
