linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/3] Fix dm-crypt zoned block device support
@ 2021-04-17  2:33 Damien Le Moal
  2021-04-17  2:33 ` [PATCH v2 1/3] dm: Introduce zone append support control Damien Le Moal
                   ` (3 more replies)
  0 siblings, 4 replies; 15+ messages in thread
From: Damien Le Moal @ 2021-04-17  2:33 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Johannes Thumshirn, Shinichiro Kawasaki

Mike,

Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
of the zone to be written instead of the actual location sector to
write. The write location is determined by the device and returned to
the host upon completion of the operation.

This interface, while simple and efficient for writing into sequential
zones of a zoned block device, is incompatible with the use of sector
values to calculate a cypher block IV. All data written in a zone is
encrypted using an IV calculated from the first sectors of the zone,
but read operation will specify any sector within the zone, resulting
in an IV mismatch between encryption and decryption. Reads fail in that
case.

Using a single sector value (e.g. the zone start sector) for all read
and writes into a zone can solve this problem, but at the cost of
weakening the cypher chosen by the user. Emulating zone append using
regular writes would be another potential solution, but it is complex
and would add a lot of overhead.

Instead, to solve this problem, explicitly disable support for zone
append operations in dm-crypt if the target was setup using a cypher IV
mode using sector values. The null and random IV modes can still be used
with zone append operations. This lack of support for zone append is
exposed to the user by setting the dm-crypt target queue limit
max_zone_append_sectors to 0. This change is done in patches 1 and 2.

Patch 3 fixes zonefs to fall back to using regular write when
max_zone_append_sectors is 0 (Note: I can take this patch through the
zonefs tree. But since I have nothing else for an eventual rc8 and next
cycle, you can take it too. No chance of conflict).

Overall, these changes do not break user space:
1) There is no interface allowing a user to use zone append write
without a file system. So applications using directly a raw dm-crypt
device will continue working using regular write operations.
2) btrfs zoned support was added in 5.12. Anybody trying btrfs-zoned on
top of dm-crypt would have faced the read failures already. So there
are no existing deployments to preserve. Same for zonefs.

For file systems, using zone append with encryption will need to be
supported within the file system (e.g. fscrypt). In this case, cypher IV
calculation can rely for instance on file block offsets as these are
known before a zone append operation write these blocks to disk at
unknown locations.

Reviews and comments are very much welcome.

Changes from v1:
* Addressed Johannes comments by renaming the CRYPT_IV_NO_SECTORS flag
  to CRYPT_IV_ZONE_APPEND to avoid a double negation with !test_bit().
  This also clarifies that the flag is used solely to check zone append
  support.
* Removed btrfs patch (former patch 3) as David is taking that patch
  through the btrfs tree misc-next branch.
* Added reviewed-by, Fixes and Cc tags.

Damien Le Moal (3):
  dm: Introduce zone append support control
  dm crypt: Fix zoned block device support
  zonefs: fix synchronous write to sequential zone files

 drivers/md/dm-crypt.c         | 49 ++++++++++++++++++++++++++++-------
 drivers/md/dm-table.c         | 41 +++++++++++++++++++++++++++++
 fs/zonefs/super.c             | 16 +++++++++---
 fs/zonefs/zonefs.h            |  2 ++
 include/linux/device-mapper.h |  6 +++++
 5 files changed, 101 insertions(+), 13 deletions(-)

-- 
2.30.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH v2 1/3] dm: Introduce zone append support control
  2021-04-17  2:33 [PATCH v2 0/3] Fix dm-crypt zoned block device support Damien Le Moal
@ 2021-04-17  2:33 ` Damien Le Moal
  2021-04-17  2:33 ` [PATCH v2 2/3] dm crypt: Fix zoned block device support Damien Le Moal
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2021-04-17  2:33 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Johannes Thumshirn, Shinichiro Kawasaki

Add the boolean field zone_append_not_supported to the dm_target
structure to allow a target implementing a zoned block device to
explicitly opt out from zone append (REQ_OP_ZONE_APPEND) operations
support. When set to true by the target constructor, the target device
queue limit max_zone_append_sectors is set to 0 in
dm_table_set_restrictions() so that users of the target (e.g. file
systems) can detect that the device cannot process zone append
operations.

Detection for the target support of zone append is done similarly to
the detection for other device features such as secure erase, using a
helper function. For zone append, the function
dm_table_supports_zone_append() is defined if CONFIG_BLK_DEV_ZONED is
enabled.

Fixes: 8e225f04d2dd ("dm crypt: Enable zoned block device support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 drivers/md/dm-table.c         | 41 +++++++++++++++++++++++++++++++++++
 include/linux/device-mapper.h |  6 +++++
 2 files changed, 47 insertions(+)

diff --git a/drivers/md/dm-table.c b/drivers/md/dm-table.c
index e5f0f1703c5d..9efd7a0ee27e 100644
--- a/drivers/md/dm-table.c
+++ b/drivers/md/dm-table.c
@@ -1999,6 +1999,37 @@ static int device_requires_stable_pages(struct dm_target *ti,
 	return blk_queue_stable_writes(q);
 }
 
+#ifdef CONFIG_BLK_DEV_ZONED
+static int device_not_zone_append_capable(struct dm_target *ti,
+					  struct dm_dev *dev, sector_t start,
+					  sector_t len, void *data)
+{
+	struct request_queue *q = bdev_get_queue(dev->bdev);
+
+	return !blk_queue_is_zoned(q) ||
+		!q->limits.max_zone_append_sectors;
+}
+
+static bool dm_table_supports_zone_append(struct dm_table *t)
+{
+	struct dm_target *ti;
+	unsigned int i;
+
+	for (i = 0; i < dm_table_get_num_targets(t); i++) {
+		ti = dm_table_get_target(t, i);
+
+		if (ti->zone_append_not_supported)
+			return false;
+
+		if (!ti->type->iterate_devices ||
+		    ti->type->iterate_devices(ti, device_not_zone_append_capable, NULL))
+			return false;
+	}
+
+	return true;
+}
+#endif
+
 void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 			       struct queue_limits *limits)
 {
@@ -2091,6 +2122,16 @@ void dm_table_set_restrictions(struct dm_table *t, struct request_queue *q,
 	if (blk_queue_is_zoned(q)) {
 		WARN_ON_ONCE(queue_is_mq(q));
 		q->nr_zones = blkdev_nr_zones(t->md->disk);
+
+		/*
+		 * All zoned devices support zone append by default. However,
+		 * some zoned targets (e.g. dm-crypt) cannot support this
+		 * operation. Check here if the target indicated the lack of
+		 * support for zone append and set max_zone_append_sectors to 0
+		 * in that case so that users (e.g. an FS) can detect this fact.
+		 */
+		if (!dm_table_supports_zone_append(t))
+			q->limits.max_zone_append_sectors = 0;
 	}
 #endif
 
diff --git a/include/linux/device-mapper.h b/include/linux/device-mapper.h
index 5c641f930caf..4da699add262 100644
--- a/include/linux/device-mapper.h
+++ b/include/linux/device-mapper.h
@@ -361,6 +361,12 @@ struct dm_target {
 	 * Set if we need to limit the number of in-flight bios when swapping.
 	 */
 	bool limit_swap_bios:1;
+
+	/*
+	 * Set if this target is a zoned device that cannot accept
+	 * zone append operations.
+	 */
+	bool zone_append_not_supported:1;
 };
 
 void *dm_per_bio_data(struct bio *bio, size_t data_size);
-- 
2.30.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 2/3] dm crypt: Fix zoned block device support
  2021-04-17  2:33 [PATCH v2 0/3] Fix dm-crypt zoned block device support Damien Le Moal
  2021-04-17  2:33 ` [PATCH v2 1/3] dm: Introduce zone append support control Damien Le Moal
@ 2021-04-17  2:33 ` Damien Le Moal
  2021-04-17 10:39   ` Johannes Thumshirn
  2021-04-18 11:00   ` [dm-devel] " Milan Broz
  2021-04-17  2:33 ` [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files Damien Le Moal
  2021-04-19 12:52 ` [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support Mikulas Patocka
  3 siblings, 2 replies; 15+ messages in thread
From: Damien Le Moal @ 2021-04-17  2:33 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Johannes Thumshirn, Shinichiro Kawasaki

Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector of
the zone to be written instead of the actual location sector to write.
The write location is determined by the device and returned to the host
upon completion of the operation. This interface, while simple and
efficient for writing into sequential zones of a zoned block device, is
incompatible with the use of sector values to calculate a cypher block
IV. All data written in a zone end up using the same IV values
corresponding to the first sectors of the zone, but read operation will
specify any sector within the zone, resulting in an IV mismatch between
encryption and decryption.

Using a single sector value (e.g. the zone start sector) for all read
and writes into a zone can solve this problem, but at the cost of
weakening the cypher chosen by the user. Instead, to solve this
problem, explicitly disable support for zone append operations using
the zone_append_not_supported field of struct dm_target if the IV mode
used is sector-based, that is for all IVs modes except null and random.

The cypher flag CRYPT_IV_NO_SECTORS iis introduced to indicate that the
cypher does not use sector values. This flag is set in
crypt_ctr_ivmode() for the null and random IV modes and checked in
crypt_ctr() to set to true zone_append_not_supported if
CRYPT_IV_NO_SECTORS is not set for the chosen cypher.

Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: 8e225f04d2dd ("dm crypt: Enable zoned block device support")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
---
 drivers/md/dm-crypt.c | 49 +++++++++++++++++++++++++++++++++++--------
 1 file changed, 40 insertions(+), 9 deletions(-)

diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index b0ab080f2567..6ef35bb29ce5 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -137,6 +137,7 @@ enum cipher_flags {
 	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cipher */
 	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
 	CRYPT_ENCRYPT_PREPROCESS,	/* Must preprocess data for encryption (elephant) */
+	CRYPT_IV_ZONE_APPEND,		/* IV mode supports zone append operations */
 };
 
 /*
@@ -2750,9 +2751,10 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 	}
 
 	/* Choose ivmode, see comments at iv code. */
-	if (ivmode == NULL)
+	if (ivmode == NULL) {
 		cc->iv_gen_ops = NULL;
-	else if (strcmp(ivmode, "plain") == 0)
+		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
+	} else if (strcmp(ivmode, "plain") == 0)
 		cc->iv_gen_ops = &crypt_iv_plain_ops;
 	else if (strcmp(ivmode, "plain64") == 0)
 		cc->iv_gen_ops = &crypt_iv_plain64_ops;
@@ -2762,9 +2764,10 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 		cc->iv_gen_ops = &crypt_iv_essiv_ops;
 	else if (strcmp(ivmode, "benbi") == 0)
 		cc->iv_gen_ops = &crypt_iv_benbi_ops;
-	else if (strcmp(ivmode, "null") == 0)
+	else if (strcmp(ivmode, "null") == 0) {
 		cc->iv_gen_ops = &crypt_iv_null_ops;
-	else if (strcmp(ivmode, "eboiv") == 0)
+		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
+	} else if (strcmp(ivmode, "eboiv") == 0)
 		cc->iv_gen_ops = &crypt_iv_eboiv_ops;
 	else if (strcmp(ivmode, "elephant") == 0) {
 		cc->iv_gen_ops = &crypt_iv_elephant_ops;
@@ -2791,6 +2794,7 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
 		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
 	} else if (strcmp(ivmode, "random") == 0) {
 		cc->iv_gen_ops = &crypt_iv_random_ops;
+		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
 		/* Need storage space in integrity fields. */
 		cc->integrity_iv_size = cc->iv_size;
 	} else {
@@ -3281,14 +3285,32 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
 	}
 	cc->start = tmpll;
 
-	/*
-	 * For zoned block devices, we need to preserve the issuer write
-	 * ordering. To do so, disable write workqueues and force inline
-	 * encryption completion.
-	 */
 	if (bdev_is_zoned(cc->dev->bdev)) {
+		/*
+		 * For zoned block devices, we need to preserve the issuer write
+		 * ordering. To do so, disable write workqueues and force inline
+		 * encryption completion.
+		 */
 		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
 		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
+
+		/*
+		 * All zone append writes to a zone of a zoned block device will
+		 * have the same BIO sector (the start of the zone). When the
+		 * cypher IV mode uses sector values, all data targeting a
+		 * zone will be encrypted using the first sector numbers of the
+		 * zone. This will not result in write errors but will
+		 * cause most reads to fail as reads will use the sector values
+		 * for the actual data locations, resulting in IV mismatch.
+		 * To avoid this problem, allow zone append operations only when
+		 * the selected IV mode indicated that zone append operations
+		 * are supported, that is, IV modes that do not use sector
+		 * values (null and random IVs).
+		 */
+		if (!test_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags)) {
+			DMWARN("Zone append is not supported with the selected IV mode");
+			ti->zone_append_not_supported = true;
+		}
 	}
 
 	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
@@ -3356,6 +3378,15 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
 	struct dm_crypt_io *io;
 	struct crypt_config *cc = ti->private;
 
+	/*
+	 * For zoned targets, we should not see any zone append operation if
+	 * the cypher IV mode selected does not support them. In the unlikely
+	 * case we do see one such operation, warn and fail the request.
+	 */
+	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND &&
+			 !test_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags)))
+		return DM_MAPIO_KILL;
+
 	/*
 	 * If bio is REQ_PREFLUSH or REQ_OP_DISCARD, just bypass crypt queues.
 	 * - for REQ_PREFLUSH device-mapper core ensures that no IO is in-flight
-- 
2.30.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-17  2:33 [PATCH v2 0/3] Fix dm-crypt zoned block device support Damien Le Moal
  2021-04-17  2:33 ` [PATCH v2 1/3] dm: Introduce zone append support control Damien Le Moal
  2021-04-17  2:33 ` [PATCH v2 2/3] dm crypt: Fix zoned block device support Damien Le Moal
@ 2021-04-17  2:33 ` Damien Le Moal
  2021-04-19  6:45   ` Christoph Hellwig
  2021-04-19 12:52 ` [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support Mikulas Patocka
  3 siblings, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2021-04-17  2:33 UTC (permalink / raw)
  To: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Johannes Thumshirn, Shinichiro Kawasaki

Synchronous writes to sequential zone files cannot use zone append
operations if the underlying zoned device queue limit
max_zone_append_sectors is 0, indicating that the device does not
support this operation. In this case, fall back to using regular write
operations.

Fixes: 02ef12a663c7 ("zonefs: use REQ_OP_ZONE_APPEND for sync DIO")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
---
 fs/zonefs/super.c  | 16 ++++++++++++----
 fs/zonefs/zonefs.h |  2 ++
 2 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/fs/zonefs/super.c b/fs/zonefs/super.c
index 049e36c69ed7..b97566b9dff7 100644
--- a/fs/zonefs/super.c
+++ b/fs/zonefs/super.c
@@ -689,14 +689,15 @@ static ssize_t zonefs_file_dio_append(struct kiocb *iocb, struct iov_iter *from)
 {
 	struct inode *inode = file_inode(iocb->ki_filp);
 	struct zonefs_inode_info *zi = ZONEFS_I(inode);
-	struct block_device *bdev = inode->i_sb->s_bdev;
-	unsigned int max;
+	struct super_block *sb = inode->i_sb;
+	struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+	struct block_device *bdev = sb->s_bdev;
+	sector_t max = sbi->s_max_zone_append_sectors;
 	struct bio *bio;
 	ssize_t size;
 	int nr_pages;
 	ssize_t ret;
 
-	max = queue_max_zone_append_sectors(bdev_get_queue(bdev));
 	max = ALIGN_DOWN(max << SECTOR_SHIFT, inode->i_sb->s_blocksize);
 	iov_iter_truncate(from, max);
 
@@ -853,6 +854,8 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 
 	/* Enforce sequential writes (append only) in sequential zones */
 	if (zi->i_ztype == ZONEFS_ZTYPE_SEQ) {
+		struct zonefs_sb_info *sbi = ZONEFS_SB(sb);
+
 		mutex_lock(&zi->i_truncate_mutex);
 		if (iocb->ki_pos != zi->i_wpoffset) {
 			mutex_unlock(&zi->i_truncate_mutex);
@@ -860,7 +863,7 @@ static ssize_t zonefs_file_dio_write(struct kiocb *iocb, struct iov_iter *from)
 			goto inode_unlock;
 		}
 		mutex_unlock(&zi->i_truncate_mutex);
-		append = sync;
+		append = sync && sbi->s_max_zone_append_sectors;
 	}
 
 	if (append)
@@ -1683,6 +1686,11 @@ static int zonefs_fill_super(struct super_block *sb, void *data, int silent)
 		sbi->s_mount_opts &= ~ZONEFS_MNTOPT_EXPLICIT_OPEN;
 	}
 
+	sbi->s_max_zone_append_sectors =
+		queue_max_zone_append_sectors(bdev_get_queue(sb->s_bdev));
+	if (!sbi->s_max_zone_append_sectors)
+		zonefs_info(sb, "Zone append is not supported: falling back to using regular writes\n");
+
 	ret = zonefs_read_super(sb);
 	if (ret)
 		return ret;
diff --git a/fs/zonefs/zonefs.h b/fs/zonefs/zonefs.h
index 51141907097c..2b8c3b1a32ea 100644
--- a/fs/zonefs/zonefs.h
+++ b/fs/zonefs/zonefs.h
@@ -185,6 +185,8 @@ struct zonefs_sb_info {
 
 	unsigned int		s_max_open_zones;
 	atomic_t		s_open_zones;
+
+	sector_t		s_max_zone_append_sectors;
 };
 
 static inline struct zonefs_sb_info *ZONEFS_SB(struct super_block *sb)
-- 
2.30.2


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/3] dm crypt: Fix zoned block device support
  2021-04-17  2:33 ` [PATCH v2 2/3] dm crypt: Fix zoned block device support Damien Le Moal
@ 2021-04-17 10:39   ` Johannes Thumshirn
  2021-04-18 11:00   ` [dm-devel] " Milan Broz
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2021-04-17 10:39 UTC (permalink / raw)
  To: Damien Le Moal, dm-devel, Mike Snitzer, linux-block, Jens Axboe,
	linux-nvme, Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Shinichiro Kawasaki

Looks good,
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [dm-devel] [PATCH v2 2/3] dm crypt: Fix zoned block device support
  2021-04-17  2:33 ` [PATCH v2 2/3] dm crypt: Fix zoned block device support Damien Le Moal
  2021-04-17 10:39   ` Johannes Thumshirn
@ 2021-04-18 11:00   ` Milan Broz
  1 sibling, 0 replies; 15+ messages in thread
From: Milan Broz @ 2021-04-18 11:00 UTC (permalink / raw)
  To: Damien Le Moal, dm-devel, Mike Snitzer, linux-block, Jens Axboe,
	linux-nvme, Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel
  Cc: Johannes Thumshirn, Shinichiro Kawasaki, Ondrej Kozina

On 17/04/2021 04:33, Damien Le Moal wrote:
> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector of
> the zone to be written instead of the actual location sector to write.
> The write location is determined by the device and returned to the host
> upon completion of the operation. This interface, while simple and
> efficient for writing into sequential zones of a zoned block device, is
> incompatible with the use of sector values to calculate a cypher block
> IV. All data written in a zone end up using the same IV values
> corresponding to the first sectors of the zone, but read operation will
> specify any sector within the zone, resulting in an IV mismatch between
> encryption and decryption.
> 
> Using a single sector value (e.g. the zone start sector) for all read
> and writes into a zone can solve this problem, but at the cost of
> weakening the cypher chosen by the user.

Reusing IV breaks the basic principle in disk encryption that block (sector)
must not be decrypted to correct plaintext if it is relocated.
(IOW, you must not use the same IV with the same key for another sector location.)

Please note that dm-crypt allows being configured to use such insecure
ciphers/IV to provide compatibility with older/foreign encryption systems only.

That said, IV is not the only place in dm-crypt that depends
on position (sector offset).

Dm-crypt allows using AEAD (authenticated encryption).
For now, it requires stacking over dm-integrity, but the idea was to use devices
that can store per-sector metadata natively.

In AEAD, offset (sector) is part of authenticated data (so decryption of relocated
sector fails). For AEAD, we use random IV. So your patch is not complete.

I think the proper solution would be to disable zone append for the dm-crypt
target completely, just set a flag for the whole target.
(Or somehow emulate that global sector offset).

Please do not introduce such complexity as you tried in this patchset.
I am sure this will backfire on us later.

If you want to use encryption with zoned devices properly,
it must be designed for it. FDE is not.

In this case, probably btrfs fs encryption layer is the proper place that allows
configuring different keys or tweak zones IV to allow block-based encryption.

Allowing to stack over dm-crypt in the meantime is OK, but disable these
incompatible zoning features completely, please.

We often use online reencryption - which can switch arbitrary encryption
parameters for active devices expecting sector offset is invariant.
I am not sure if this can even work with zone append and provide expected security.

Also, this brings the question if dm-integrity has the same issue...

Thanks,
Milan


 Instead, to solve this
> problem, explicitly disable support for zone append operations using
> the zone_append_not_supported field of struct dm_target if the IV mode
> used is sector-based, that is for all IVs modes except null and random.
> 
> The cypher flag CRYPT_IV_NO_SECTORS iis introduced to indicate that the
> cypher does not use sector values. This flag is set in
> crypt_ctr_ivmode() for the null and random IV modes and checked in
> crypt_ctr() to set to true zone_append_not_supported if
> CRYPT_IV_NO_SECTORS is not set for the chosen cypher.
> 
> Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
> Fixes: 8e225f04d2dd ("dm crypt: Enable zoned block device support")
> Cc: stable@vger.kernel.org
> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
> ---
>  drivers/md/dm-crypt.c | 49 +++++++++++++++++++++++++++++++++++--------
>  1 file changed, 40 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
> index b0ab080f2567..6ef35bb29ce5 100644
> --- a/drivers/md/dm-crypt.c
> +++ b/drivers/md/dm-crypt.c
> @@ -137,6 +137,7 @@ enum cipher_flags {
>  	CRYPT_MODE_INTEGRITY_AEAD,	/* Use authenticated mode for cipher */
>  	CRYPT_IV_LARGE_SECTORS,		/* Calculate IV from sector_size, not 512B sectors */
>  	CRYPT_ENCRYPT_PREPROCESS,	/* Must preprocess data for encryption (elephant) */
> +	CRYPT_IV_ZONE_APPEND,		/* IV mode supports zone append operations */
>  };
>  
>  /*
> @@ -2750,9 +2751,10 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
>  	}
>  
>  	/* Choose ivmode, see comments at iv code. */
> -	if (ivmode == NULL)
> +	if (ivmode == NULL) {
>  		cc->iv_gen_ops = NULL;
> -	else if (strcmp(ivmode, "plain") == 0)
> +		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
> +	} else if (strcmp(ivmode, "plain") == 0)
>  		cc->iv_gen_ops = &crypt_iv_plain_ops;
>  	else if (strcmp(ivmode, "plain64") == 0)
>  		cc->iv_gen_ops = &crypt_iv_plain64_ops;
> @@ -2762,9 +2764,10 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
>  		cc->iv_gen_ops = &crypt_iv_essiv_ops;
>  	else if (strcmp(ivmode, "benbi") == 0)
>  		cc->iv_gen_ops = &crypt_iv_benbi_ops;
> -	else if (strcmp(ivmode, "null") == 0)
> +	else if (strcmp(ivmode, "null") == 0) {
>  		cc->iv_gen_ops = &crypt_iv_null_ops;
> -	else if (strcmp(ivmode, "eboiv") == 0)
> +		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
> +	} else if (strcmp(ivmode, "eboiv") == 0)
>  		cc->iv_gen_ops = &crypt_iv_eboiv_ops;
>  	else if (strcmp(ivmode, "elephant") == 0) {
>  		cc->iv_gen_ops = &crypt_iv_elephant_ops;
> @@ -2791,6 +2794,7 @@ static int crypt_ctr_ivmode(struct dm_target *ti, const char *ivmode)
>  		cc->key_extra_size = cc->iv_size + TCW_WHITENING_SIZE;
>  	} else if (strcmp(ivmode, "random") == 0) {
>  		cc->iv_gen_ops = &crypt_iv_random_ops;
> +		set_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags);
>  		/* Need storage space in integrity fields. */
>  		cc->integrity_iv_size = cc->iv_size;
>  	} else {
> @@ -3281,14 +3285,32 @@ static int crypt_ctr(struct dm_target *ti, unsigned int argc, char **argv)
>  	}
>  	cc->start = tmpll;
>  
> -	/*
> -	 * For zoned block devices, we need to preserve the issuer write
> -	 * ordering. To do so, disable write workqueues and force inline
> -	 * encryption completion.
> -	 */
>  	if (bdev_is_zoned(cc->dev->bdev)) {
> +		/*
> +		 * For zoned block devices, we need to preserve the issuer write
> +		 * ordering. To do so, disable write workqueues and force inline
> +		 * encryption completion.
> +		 */
>  		set_bit(DM_CRYPT_NO_WRITE_WORKQUEUE, &cc->flags);
>  		set_bit(DM_CRYPT_WRITE_INLINE, &cc->flags);
> +
> +		/*
> +		 * All zone append writes to a zone of a zoned block device will
> +		 * have the same BIO sector (the start of the zone). When the
> +		 * cypher IV mode uses sector values, all data targeting a
> +		 * zone will be encrypted using the first sector numbers of the
> +		 * zone. This will not result in write errors but will
> +		 * cause most reads to fail as reads will use the sector values
> +		 * for the actual data locations, resulting in IV mismatch.
> +		 * To avoid this problem, allow zone append operations only when
> +		 * the selected IV mode indicated that zone append operations
> +		 * are supported, that is, IV modes that do not use sector
> +		 * values (null and random IVs).
> +		 */
> +		if (!test_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags)) {
> +			DMWARN("Zone append is not supported with the selected IV mode");
> +			ti->zone_append_not_supported = true;
> +		}
>  	}
>  
>  	if (crypt_integrity_aead(cc) || cc->integrity_iv_size) {
> @@ -3356,6 +3378,15 @@ static int crypt_map(struct dm_target *ti, struct bio *bio)
>  	struct dm_crypt_io *io;
>  	struct crypt_config *cc = ti->private;
>  
> +	/*
> +	 * For zoned targets, we should not see any zone append operation if
> +	 * the cypher IV mode selected does not support them. In the unlikely
> +	 * case we do see one such operation, warn and fail the request.
> +	 */
> +	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND &&
> +			 !test_bit(CRYPT_IV_ZONE_APPEND, &cc->cipher_flags)))
> +		return DM_MAPIO_KILL;
> +
>  	/*
>  	 * If bio is REQ_PREFLUSH or REQ_OP_DISCARD, just bypass crypt queues.
>  	 * - for REQ_PREFLUSH device-mapper core ensures that no IO is in-flight
> 


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-17  2:33 ` [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files Damien Le Moal
@ 2021-04-19  6:45   ` Christoph Hellwig
  2021-04-19  7:08     ` Damien Le Moal
  2021-04-20  1:20     ` Douglas Gilbert
  0 siblings, 2 replies; 15+ messages in thread
From: Christoph Hellwig @ 2021-04-19  6:45 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel, Johannes Thumshirn, Shinichiro Kawasaki

On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
> Synchronous writes to sequential zone files cannot use zone append
> operations if the underlying zoned device queue limit
> max_zone_append_sectors is 0, indicating that the device does not
> support this operation. In this case, fall back to using regular write
> operations.

Zone append is a mandatory feature of the zoned device API.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-19  6:45   ` Christoph Hellwig
@ 2021-04-19  7:08     ` Damien Le Moal
  2021-04-19  7:10       ` Christoph Hellwig
  2021-04-20  1:20     ` Douglas Gilbert
  1 sibling, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2021-04-19  7:08 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	linux-scsi, Martin K . Petersen, linux-fsdevel,
	Johannes Thumshirn, Shinichiro Kawasaki

On 2021/04/19 15:45, Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
> 
> Zone append is a mandatory feature of the zoned device API.

Yes, I am well aware of that. All physical zoned devices and null blk do support
zone append, but the logical device created by dm-crypt is out. And we cannot
simply disable zone support in dm-crypt as there are use cases out there in the
field that I am aware of, in SMR space.

So this series is a compromise: preserve dm-crypt zone support for SMR (no one
uses the zone append emulation yet, as far as I know) by disabling zone append.

For zonefs, we can:
1) refuse to mount if ZA is disabled, same as btrfs
2) Do as I did in the patch, fallback to regular writes since that is easy to do
(zonefs file size tracks the WP position already).

I chose option (2) to allow for SMR+dm-crypt to still work with zonefs.


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-19  7:08     ` Damien Le Moal
@ 2021-04-19  7:10       ` Christoph Hellwig
  0 siblings, 0 replies; 15+ messages in thread
From: Christoph Hellwig @ 2021-04-19  7:10 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: Christoph Hellwig, dm-devel, Mike Snitzer, linux-block,
	Jens Axboe, linux-nvme, linux-scsi, Martin K . Petersen,
	linux-fsdevel, Johannes Thumshirn, Shinichiro Kawasaki

On Mon, Apr 19, 2021 at 07:08:46AM +0000, Damien Le Moal wrote:
> 1) refuse to mount if ZA is disabled, same as btrfs

Yes, please do that.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support
  2021-04-17  2:33 [PATCH v2 0/3] Fix dm-crypt zoned block device support Damien Le Moal
                   ` (2 preceding siblings ...)
  2021-04-17  2:33 ` [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files Damien Le Moal
@ 2021-04-19 12:52 ` Mikulas Patocka
  2021-04-19 13:02   ` Damien Le Moal
  3 siblings, 1 reply; 15+ messages in thread
From: Mikulas Patocka @ 2021-04-19 12:52 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel, Shinichiro Kawasaki, Johannes Thumshirn



On Sat, 17 Apr 2021, Damien Le Moal wrote:

> Mike,
> 
> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
> of the zone to be written instead of the actual location sector to
> write. The write location is determined by the device and returned to
> the host upon completion of the operation.

I'd like to ask what's the reason for this semantics? Why can't users of 
the zoned device supply real sector numbers?

> This interface, while simple and efficient for writing into sequential
> zones of a zoned block device, is incompatible with the use of sector
> values to calculate a cypher block IV. All data written in a zone is
> encrypted using an IV calculated from the first sectors of the zone,
> but read operation will specify any sector within the zone, resulting
> in an IV mismatch between encryption and decryption. Reads fail in that
> case.

I would say that it is incompatible with all dm targets - even the linear 
target is changing the sector number and so it may redirect the write 
outside of the range specified in dm-table and cause corruption.

Instead of complicating device mapper with imperfect support, I would just 
disable REQ_OP_ZONE_APPEND on device mapper at all.

Mikulas


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support
  2021-04-19 12:52 ` [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support Mikulas Patocka
@ 2021-04-19 13:02   ` Damien Le Moal
  2021-04-19 13:55     ` Mikulas Patocka
  0 siblings, 1 reply; 15+ messages in thread
From: Damien Le Moal @ 2021-04-19 13:02 UTC (permalink / raw)
  To: Mikulas Patocka
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel, Shinichiro Kawasaki, Johannes Thumshirn

On 2021/04/19 21:52, Mikulas Patocka wrote:
> 
> 
> On Sat, 17 Apr 2021, Damien Le Moal wrote:
> 
>> Mike,
>>
>> Zone append BIOs (REQ_OP_ZONE_APPEND) always specify the start sector
>> of the zone to be written instead of the actual location sector to
>> write. The write location is determined by the device and returned to
>> the host upon completion of the operation.
> 
> I'd like to ask what's the reason for this semantics? Why can't users of 
> the zoned device supply real sector numbers?

They can, if they use regular write commands :)

Zone append was designed to address sequential write ordering difficulties on
the host. Unlike regular writes which must be delivered to the device in
sequential order, zone append commands can be sent to the device in any order.
The device will process the write at the WP position when the command is
executed and return the first sector written. This command makes it easy on the
host in multi-queue environment and avoids the need for serializing commands &
locking zones for writing. So very efficient performance.

>> This interface, while simple and efficient for writing into sequential
>> zones of a zoned block device, is incompatible with the use of sector
>> values to calculate a cypher block IV. All data written in a zone is
>> encrypted using an IV calculated from the first sectors of the zone,
>> but read operation will specify any sector within the zone, resulting
>> in an IV mismatch between encryption and decryption. Reads fail in that
>> case.
> 
> I would say that it is incompatible with all dm targets - even the linear 
> target is changing the sector number and so it may redirect the write 
> outside of the range specified in dm-table and cause corruption.

DM remapping of BIO sectors is zone compatible because target entries must be
zone aligned. In the case of zone append, the BIO sector always point to the
start sector of the target zone. DM sector remapping will remap that to another
zone start as all zones are the same size. No issue here. We extensively use
dm-linear for various test environment to reduce the size of the device tested
(to speed up tests). I am confident there are no problems there.

> Instead of complicating device mapper with imperfect support, I would just 
> disable REQ_OP_ZONE_APPEND on device mapper at all.

That was my initial approach, but for dm-crypt only since other targets that
support zoned devices are fine. However, this breaks zoned block device
requirement that zone append be supported so that users are presented with a
uniform interface for different devices. So while simple to do, disabling zone
append is far from ideal.

> 
> Mikulas
> 
> 


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support
  2021-04-19 13:02   ` Damien Le Moal
@ 2021-04-19 13:55     ` Mikulas Patocka
  2021-04-19 14:31       ` Milan Broz
  0 siblings, 1 reply; 15+ messages in thread
From: Mikulas Patocka @ 2021-04-19 13:55 UTC (permalink / raw)
  To: Damien Le Moal
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	Christoph Hellwig, linux-scsi, Martin K . Petersen,
	linux-fsdevel, Shinichiro Kawasaki, Johannes Thumshirn,
	Milan Broz



On Mon, 19 Apr 2021, Damien Le Moal wrote:

> > I would say that it is incompatible with all dm targets - even the linear 
> > target is changing the sector number and so it may redirect the write 
> > outside of the range specified in dm-table and cause corruption.
> 
> DM remapping of BIO sectors is zone compatible because target entries must be
> zone aligned. In the case of zone append, the BIO sector always point to the
> start sector of the target zone. DM sector remapping will remap that to another
> zone start as all zones are the same size. No issue here. We extensively use
> dm-linear for various test environment to reduce the size of the device tested
> (to speed up tests). I am confident there are no problems there.
> 
> > Instead of complicating device mapper with imperfect support, I would just 
> > disable REQ_OP_ZONE_APPEND on device mapper at all.
> 
> That was my initial approach, but for dm-crypt only since other targets that
> support zoned devices are fine. However, this breaks zoned block device
> requirement that zone append be supported so that users are presented with a
> uniform interface for different devices. So while simple to do, disabling zone
> append is far from ideal.

So, we could enable it for the linear target and disable for all other 
targets?

I talked with Milan about it and he doesn't want to add more bloat to the 
crypt target. I agree with him.

Mikulas


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support
  2021-04-19 13:55     ` Mikulas Patocka
@ 2021-04-19 14:31       ` Milan Broz
  0 siblings, 0 replies; 15+ messages in thread
From: Milan Broz @ 2021-04-19 14:31 UTC (permalink / raw)
  To: Mikulas Patocka, Damien Le Moal
  Cc: Jens Axboe, linux-scsi, Mike Snitzer, Johannes Thumshirn,
	linux-nvme, linux-block, Shinichiro Kawasaki, dm-devel,
	Martin K . Petersen, linux-fsdevel, Christoph Hellwig

On 19/04/2021 15:55, Mikulas Patocka wrote:
> 
> 
> On Mon, 19 Apr 2021, Damien Le Moal wrote:
> 
>>> I would say that it is incompatible with all dm targets - even the linear 
>>> target is changing the sector number and so it may redirect the write 
>>> outside of the range specified in dm-table and cause corruption.
>>
>> DM remapping of BIO sectors is zone compatible because target entries must be
>> zone aligned. In the case of zone append, the BIO sector always point to the
>> start sector of the target zone. DM sector remapping will remap that to another
>> zone start as all zones are the same size. No issue here. We extensively use
>> dm-linear for various test environment to reduce the size of the device tested
>> (to speed up tests). I am confident there are no problems there.
>>
>>> Instead of complicating device mapper with imperfect support, I would just 
>>> disable REQ_OP_ZONE_APPEND on device mapper at all.
>>
>> That was my initial approach, but for dm-crypt only since other targets that
>> support zoned devices are fine. However, this breaks zoned block device
>> requirement that zone append be supported so that users are presented with a
>> uniform interface for different devices. So while simple to do, disabling zone
>> append is far from ideal.
> 
> So, we could enable it for the linear target and disable for all other 
> targets?
> 
> I talked with Milan about it and he doesn't want to add more bloat to the 
> crypt target. I agree with him.

This is all fine even for dm-crypt IF the tweaking is unique for the sector position
(it can be something just derived from the sector offset in principle).

For FDE, we must never allow writing sectors to different positions with the same
tweak (IV) and key - there are real attacks based on this issue.

So zones can do any recalculation and reshuffling it wants if sector tweak
in dm-crypt is unique.
(Another solution would be to use different keys for different areas, but that
is not possible with dm-crypt or FDE in general, but fs encryption can do that.)

If you want dm-crypt to support zones properly, there is a need for emulation
of the real sector offset - because that is what IV expects now.

And I think such emulation should be in DM core, not in dm-crypt itself, because other
targets can need the same functionality (I guess that dm-integrity journal
has a problem with that already, Mikulas will know more).

For online reencryption we also use multiple targets in the table that dynamically moves
(linear combined with dm-crypt), so dm-crypt must support all commands as
dm-linear to make this work.

I hope I understand the problem correctly; all I want is to so avoid patching
the wrong place (dmcrypt crypto) because that problem will appear elsewhere later.
Also for security it would be nice to not add exceptions to encryption
code - it is always recipe for disaster.

Milan

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-19  6:45   ` Christoph Hellwig
  2021-04-19  7:08     ` Damien Le Moal
@ 2021-04-20  1:20     ` Douglas Gilbert
  2021-04-20  1:35       ` Damien Le Moal
  1 sibling, 1 reply; 15+ messages in thread
From: Douglas Gilbert @ 2021-04-20  1:20 UTC (permalink / raw)
  To: Christoph Hellwig, Damien Le Moal
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	linux-scsi, Martin K . Petersen, linux-fsdevel,
	Johannes Thumshirn, Shinichiro Kawasaki

On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>> Synchronous writes to sequential zone files cannot use zone append
>> operations if the underlying zoned device queue limit
>> max_zone_append_sectors is 0, indicating that the device does not
>> support this operation. In this case, fall back to using regular write
>> operations.
> 
> Zone append is a mandatory feature of the zoned device API.

So a hack required for ZNS and not needed by ZBC and ZAC becomes
a "mandatory feature" in a Linux API. Like many hacks, that one might
come back to bite you :-)

Doug Gilbert


_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files
  2021-04-20  1:20     ` Douglas Gilbert
@ 2021-04-20  1:35       ` Damien Le Moal
  0 siblings, 0 replies; 15+ messages in thread
From: Damien Le Moal @ 2021-04-20  1:35 UTC (permalink / raw)
  To: dgilbert, Christoph Hellwig
  Cc: dm-devel, Mike Snitzer, linux-block, Jens Axboe, linux-nvme,
	linux-scsi, Martin K . Petersen, linux-fsdevel,
	Johannes Thumshirn, Shinichiro Kawasaki

On 2021/04/20 10:20, Douglas Gilbert wrote:
> On 2021-04-19 2:45 a.m., Christoph Hellwig wrote:
>> On Sat, Apr 17, 2021 at 11:33:23AM +0900, Damien Le Moal wrote:
>>> Synchronous writes to sequential zone files cannot use zone append
>>> operations if the underlying zoned device queue limit
>>> max_zone_append_sectors is 0, indicating that the device does not
>>> support this operation. In this case, fall back to using regular write
>>> operations.
>>
>> Zone append is a mandatory feature of the zoned device API.
> 
> So a hack required for ZNS and not needed by ZBC and ZAC becomes
> a "mandatory feature" in a Linux API. Like many hacks, that one might
> come back to bite you :-)

Zone append is not a hack in ZNS. It is a write interface that fits very well
with the multi-queue nature of NVMe. The "hack" is the emulation in scsi.

We decided on having this mandatory for zoned devices (all types) to make sure
that file systems do not have to implement different IO paths for sequential
writing to zones. Zone append does simplify a lot of things and allows to get
the best performance from ZNS drives. Zone write locking/serialization of writes
per zones using regular writes is much harder to implement, make a mess of the
file system code, and would kill write performance on ZNS.


-- 
Damien Le Moal
Western Digital Research

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2021-04-20  1:35 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-17  2:33 [PATCH v2 0/3] Fix dm-crypt zoned block device support Damien Le Moal
2021-04-17  2:33 ` [PATCH v2 1/3] dm: Introduce zone append support control Damien Le Moal
2021-04-17  2:33 ` [PATCH v2 2/3] dm crypt: Fix zoned block device support Damien Le Moal
2021-04-17 10:39   ` Johannes Thumshirn
2021-04-18 11:00   ` [dm-devel] " Milan Broz
2021-04-17  2:33 ` [PATCH v2 3/3] zonefs: fix synchronous write to sequential zone files Damien Le Moal
2021-04-19  6:45   ` Christoph Hellwig
2021-04-19  7:08     ` Damien Le Moal
2021-04-19  7:10       ` Christoph Hellwig
2021-04-20  1:20     ` Douglas Gilbert
2021-04-20  1:35       ` Damien Le Moal
2021-04-19 12:52 ` [dm-devel] [PATCH v2 0/3] Fix dm-crypt zoned block device support Mikulas Patocka
2021-04-19 13:02   ` Damien Le Moal
2021-04-19 13:55     ` Mikulas Patocka
2021-04-19 14:31       ` Milan Broz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).