linux-kernel.vger.kernel.org archive mirror
* [PATCHv2 0/7] 64-bit data integrity field support
@ 2022-02-01 19:01 Keith Busch
  2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The NVM Express protocol added enhancements to the data integrity field
formats beyond the T10 defined protection information. A detailed
description of the new formats can be found in the NVMe NVM Command
Set Specification, section 5.2, available at:

  https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf

This series implements one possible new format: the CRC64 guard with
48-bit reference tags. This does not add support for the variable
"storage tag" field, or any potential hardware acceleration.

Changes since RFC v1:

  - Generated the reflected CRC table and calculated the CRC accordingly
    instead of reflecting the input/output (Eric Biggers, patch 3)

  - Fixed Kconfig.debug dependency for CRC tests (patch 4)

  - Fixed endian conversion sparse error (patch 7)

  - Added support for PRACT (Klaus Jensen, patch 7)

Keith Busch (7):
  block: support pi with extended metadata
  nvme: allow integrity on extended metadata formats
  lib: add rocksoft model crc64
  lib: add crc64 tests
  asm-generic: introduce be48 unaligned accessors
  block: add pi for nvme enhanced integrity
  nvme: add support for enhanced metadata

 block/Kconfig                   |   1 +
 block/bio-integrity.c           |   1 +
 block/t10-pi.c                  | 198 +++++++++++++++++++++++++++++++-
 drivers/nvme/host/core.c        | 167 ++++++++++++++++++++++-----
 drivers/nvme/host/nvme.h        |   4 +-
 include/asm-generic/unaligned.h |  26 +++++
 include/linux/blk-integrity.h   |   1 +
 include/linux/crc64.h           |   2 +
 include/linux/nvme.h            |  53 ++++++++-
 include/linux/t10-pi.h          |  20 ++++
 lib/Kconfig.debug               |   4 +
 lib/Makefile                    |   1 +
 lib/crc64.c                     |  26 +++++
 lib/gen_crc64table.c            |  51 ++++++--
 lib/test_crc64.c                |  68 +++++++++++
 15 files changed, 576 insertions(+), 47 deletions(-)
 create mode 100644 lib/test_crc64.c

-- 
2.25.4



* [PATCHv2 1/7] block: support pi with extended metadata
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:38   ` Martin K. Petersen
  2022-02-02 13:11   ` Hannes Reinecke
  2022-02-01 19:01 ` [PATCHv2 2/7] nvme: allow integrity on extended metadata formats Keith Busch
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The nvme spec allows protection information formats with metadata
extending beyond the pi field. Use the actual size of the metadata field
for incrementing the protection buffer.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio-integrity.c         | 1 +
 block/t10-pi.c                | 4 ++--
 include/linux/blk-integrity.h | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index d25114715459..e40b1f965960 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -165,6 +165,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
 
 	iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
 	iter.interval = 1 << bi->interval_exp;
+	iter.tuple_size = bi->tuple_size;
 	iter.seed = proc_iter->bi_sector;
 	iter.prot_buf = bvec_virt(bip->bip_vec);
 
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 25a52a2a09a8..758a76518854 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -44,7 +44,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
 			pi->ref_tag = 0;
 
 		iter->data_buf += iter->interval;
-		iter->prot_buf += sizeof(struct t10_pi_tuple);
+		iter->prot_buf += iter->tuple_size;
 		iter->seed++;
 	}
 
@@ -93,7 +93,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
 
 next:
 		iter->data_buf += iter->interval;
-		iter->prot_buf += sizeof(struct t10_pi_tuple);
+		iter->prot_buf += iter->tuple_size;
 		iter->seed++;
 	}
 
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index 8a038ea0717e..378b2459efe2 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -19,6 +19,7 @@ struct blk_integrity_iter {
 	sector_t		seed;
 	unsigned int		data_size;
 	unsigned short		interval;
+	unsigned char		tuple_size;
 	const char		*disk_name;
 };
 
-- 
2.25.4
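
The stride change above can be illustrated outside the kernel. The following is a minimal user-space sketch, not the kernel code: the struct layout and the helper name fill_ref_tags are assumptions made for illustration, and the per-interval metadata size is passed in explicitly the way iter->tuple_size is in the patch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 8-byte T10-style tuple, used only to show the stride. */
struct pi_tuple {
	uint8_t guard[2];
	uint8_t app[2];
	uint8_t ref[4];
};

/*
 * Write a big-endian 32-bit reference tag into the tuple of each interval.
 * tuple_size is the full per-interval metadata size; it may be larger than
 * sizeof(struct pi_tuple) when extra metadata follows the PI field.
 */
static void fill_ref_tags(uint8_t *prot_buf, size_t intervals,
			  size_t tuple_size, uint32_t seed)
{
	for (size_t i = 0; i < intervals; i++) {
		struct pi_tuple *pi = (struct pi_tuple *)prot_buf;

		pi->ref[0] = seed >> 24;
		pi->ref[1] = seed >> 16;
		pi->ref[2] = seed >> 8;
		pi->ref[3] = seed;

		/* Advance by the metadata size, not by sizeof(*pi). */
		prot_buf += tuple_size;
		seed++;
	}
}
```

With 16 bytes of metadata per interval, the second tuple starts at offset 16 and the 8 bytes after each PI field are left untouched.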



* [PATCHv2 2/7] nvme: allow integrity on extended metadata formats
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
  2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:39   ` Martin K. Petersen
  2022-02-02 13:12   ` Hannes Reinecke
  2022-02-01 19:01 ` [PATCHv2 3/7] lib: add rocksoft model crc64 Keith Busch
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The block integrity subsystem knows how to construct protection
information buffers with metadata beyond the protection information
fields. Remove the driver restriction.

Note, this can only work if the PI field appears first in the metadata,
as the integrity subsystem doesn't calculate guard tags on preceding
metadata.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 5e0bfda04bd7..b3eabf6a08b9 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1726,12 +1726,9 @@ static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 
-	/*
-	 * The PI implementation requires the metadata size to be equal to the
-	 * t10 pi tuple size.
-	 */
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	if (ns->ms == sizeof(struct t10_pi_tuple))
+	if (id->dps & NVME_NS_DPS_PI_FIRST ||
+	    ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
 	else
 		ns->pi_type = 0;
-- 
2.25.4
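
The relaxed check can be read as a small pure function. This is a hedged sketch rather than driver code: the macro values are assumed to mirror NVME_NS_DPS_PI_MASK and NVME_NS_DPS_PI_FIRST, the tuple size of 8 is assumed to match sizeof(struct t10_pi_tuple), and select_pi_type is a hypothetical name.

```c
#include <stdint.h>

#define DPS_PI_MASK	0x7		/* assumed: NVME_NS_DPS_PI_MASK */
#define DPS_PI_FIRST	(1 << 3)	/* assumed: NVME_NS_DPS_PI_FIRST */
#define T10_TUPLE_SIZE	8		/* assumed: sizeof(struct t10_pi_tuple) */

/*
 * Return the PI type to use, or 0 to disable PI. PI is usable when the
 * PI field leads the metadata, or when the metadata is exactly one tuple.
 */
static uint8_t select_pi_type(uint8_t dps, uint16_t ms)
{
	if ((dps & DPS_PI_FIRST) || ms == T10_TUPLE_SIZE)
		return dps & DPS_PI_MASK;
	return 0;
}
```

For example, a Type 1 namespace with 16 bytes of metadata is accepted only when the "PI first" bit is set.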



* [PATCHv2 3/7] lib: add rocksoft model crc64
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
  2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
  2022-02-01 19:01 ` [PATCHv2 2/7] nvme: allow integrity on extended metadata formats Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:40   ` Martin K. Petersen
  2022-02-02 13:13   ` Hannes Reinecke
  2022-02-01 19:01 ` [PATCHv2 4/7] lib: add crc64 tests Keith Busch
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Eric Biggers

The NVM Express specification extended data integrity fields to 64 bits
using the Rocksoft^TM parameters. Add the poly to the crc64 table
generation, and provide a library routine implementing the algorithm.

The Rocksoft 64-bit CRC model parameters are as follows:
    Poly: 0xAD93D23594C93659
    Initial value: 0xFFFFFFFFFFFFFFFF
    Reflected Input: True
    Reflected Output: True
    Xor Final: 0xFFFFFFFFFFFFFFFF

Since this model uses reflected bits, the implementation generates the
reflected table so the result is ordered consistently.

Cc: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:

  Generate a reflected table for the polynomial so that the inputs and
  outputs don't need to be reflected while calculating the CRC

 include/linux/crc64.h |  2 ++
 lib/crc64.c           | 26 ++++++++++++++++++++++
 lib/gen_crc64table.c  | 51 +++++++++++++++++++++++++++++++++----------
 3 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/include/linux/crc64.h b/include/linux/crc64.h
index c756e65a1b58..9f2f20216503 100644
--- a/include/linux/crc64.h
+++ b/include/linux/crc64.h
@@ -8,4 +8,6 @@
 #include <linux/types.h>
 
 u64 __pure crc64_be(u64 crc, const void *p, size_t len);
+u64 __pure crc64_rocksoft(u64 crc, const void *p, size_t len);
+
 #endif /* _LINUX_CRC64_H */
diff --git a/lib/crc64.c b/lib/crc64.c
index 9f852a89ee2a..e223e72be44d 100644
--- a/lib/crc64.c
+++ b/lib/crc64.c
@@ -22,6 +22,13 @@
  * x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
  * x^7 + x^4 + x + 1
  *
+ * crc64rocksoft[256] table is from the Rocksoft specification polynomial
+ * defined as,
+ *
+ * x^64 + x^63 + x^61 + x^59 + x^58 + x^56 + x^55 + x^52 + x^49 + x^48 + x^47 +
+ * x^46 + x^44 + x^41 + x^37 + x^36 + x^34 + x^32 + x^31 + x^28 + x^26 + x^23 +
+ * x^22 + x^19 + x^16 + x^13 + x^12 + x^10 + x^9 + x^6 + x^4 + x^3 + 1
+ *
  * Copyright 2018 SUSE Linux.
  *   Author: Coly Li <colyli@suse.de>
  */
@@ -55,3 +62,22 @@ u64 __pure crc64_be(u64 crc, const void *p, size_t len)
 	return crc;
 }
 EXPORT_SYMBOL_GPL(crc64_be);
+
+/**
+ * crc64_rocksoft - Calculate bitwise Rocksoft CRC64
+ * @crc: seed value for computation. (u64)~0 for a new CRC calculation, or the
+ * 	 previous crc64 value if computing incrementally.
+ * @p: pointer to buffer over which CRC64 is run
+ * @len: length of buffer @p
+ */
+u64 __pure crc64_rocksoft(u64 crc, const void *p, size_t len)
+{
+	const unsigned char *_p = p;
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		crc = (crc >> 8) ^ crc64rocksofttable[(crc & 0xff) ^ *_p++];
+
+	return crc ^ (u64)~0;
+}
+EXPORT_SYMBOL_GPL(crc64_rocksoft);
diff --git a/lib/gen_crc64table.c b/lib/gen_crc64table.c
index 094b43aef8db..55e222acd0b8 100644
--- a/lib/gen_crc64table.c
+++ b/lib/gen_crc64table.c
@@ -17,10 +17,30 @@
 #include <stdio.h>
 
 #define CRC64_ECMA182_POLY 0x42F0E1EBA9EA3693ULL
+#define CRC64_ROCKSOFT_POLY 0x9A6C9329AC4BC9B5ULL
 
 static uint64_t crc64_table[256] = {0};
+static uint64_t crc64_rocksoft_table[256] = {0};
 
-static void generate_crc64_table(void)
+static void generate_reflected_crc64_table(uint64_t table[256], uint64_t poly)
+{
+	uint64_t i, j, c, crc;
+
+	for (i = 0; i < 256; i++) {
+		crc = 0ULL;
+		c = i;
+
+		for (j = 0; j < 8; j++) {
+			if ((crc ^ (c >> j)) & 1)
+				crc = (crc >> 1) ^ poly;
+			else
+				crc >>= 1;
+		}
+		table[i] = crc;
+	}
+}
+
+static void generate_crc64_table(uint64_t table[256], uint64_t poly)
 {
 	uint64_t i, j, c, crc;
 
@@ -30,26 +50,22 @@ static void generate_crc64_table(void)
 
 		for (j = 0; j < 8; j++) {
 			if ((crc ^ c) & 0x8000000000000000ULL)
-				crc = (crc << 1) ^ CRC64_ECMA182_POLY;
+				crc = (crc << 1) ^ poly;
 			else
 				crc <<= 1;
 			c <<= 1;
 		}
 
-		crc64_table[i] = crc;
+		table[i] = crc;
 	}
 }
 
-static void print_crc64_table(void)
+static void output_table(uint64_t table[256])
 {
 	int i;
 
-	printf("/* this file is generated - do not edit */\n\n");
-	printf("#include <linux/types.h>\n");
-	printf("#include <linux/cache.h>\n\n");
-	printf("static const u64 ____cacheline_aligned crc64table[256] = {\n");
 	for (i = 0; i < 256; i++) {
-		printf("\t0x%016" PRIx64 "ULL", crc64_table[i]);
+		printf("\t0x%016" PRIx64 "ULL", table[i]);
 		if (i & 0x1)
 			printf(",\n");
 		else
@@ -58,9 +74,22 @@ static void print_crc64_table(void)
 	printf("};\n");
 }
 
+static void print_crc64_tables(void)
+{
+	printf("/* this file is generated - do not edit */\n\n");
+	printf("#include <linux/types.h>\n");
+	printf("#include <linux/cache.h>\n\n");
+	printf("static const u64 ____cacheline_aligned crc64table[256] = {\n");
+	output_table(crc64_table);
+
+	printf("\nstatic const u64 ____cacheline_aligned crc64rocksofttable[256] = {\n");
+	output_table(crc64_rocksoft_table);
+}
+
 int main(int argc, char *argv[])
 {
-	generate_crc64_table();
-	print_crc64_table();
+	generate_crc64_table(crc64_table, CRC64_ECMA182_POLY);
+	generate_reflected_crc64_table(crc64_rocksoft_table, CRC64_ROCKSOFT_POLY);
+	print_crc64_tables();
 	return 0;
 }
-- 
2.25.4
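
For readers who want to try the algorithm in user space, here is a minimal sketch of the same table-driven approach. It is an illustration under assumptions, not the kernel build: the table is generated at run time instead of by gen_crc64table, and the names crc64_init_table, crc64_rocksoft_sketch and crc64_rocksoft_continue are hypothetical. Because the routine applies the final XOR before returning, the sketch undoes that XOR when continuing a CRC across a second buffer.

```c
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the 0xAD93D23594C93659 polynomial. */
#define CRC64_ROCKSOFT_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

static uint64_t crc64_table[256];

/* Build the reflected lookup table, one entry per input byte value. */
static void crc64_init_table(void)
{
	for (unsigned int i = 0; i < 256; i++) {
		uint64_t crc = i;

		for (int j = 0; j < 8; j++)
			crc = (crc & 1) ?
				(crc >> 1) ^ CRC64_ROCKSOFT_POLY_REFLECTED :
				crc >> 1;
		crc64_table[i] = crc;
	}
}

/* Same shape as the patch: seed with ~0 and XOR the result with ~0. */
static uint64_t crc64_rocksoft_sketch(uint64_t crc, const void *p, size_t len)
{
	const unsigned char *b = p;

	while (len--)
		crc = (crc >> 8) ^ crc64_table[(crc & 0xff) ^ *b++];

	return crc ^ ~0ULL;
}

/* Continue from a previously returned CRC by undoing its final XOR. */
static uint64_t crc64_rocksoft_continue(uint64_t prev, const void *p,
					size_t len)
{
	return crc64_rocksoft_sketch(prev ^ ~0ULL, p, len);
}
```

After calling crc64_init_table(), a 4096-byte buffer of zeros seeded with ~0ULL should yield the ALL_ZEROS value used by the tests in patch 4, whether it is processed in one call or split across two.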



* [PATCHv2 4/7] lib: add crc64 tests
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
                   ` (2 preceding siblings ...)
  2022-02-01 19:01 ` [PATCHv2 3/7] lib: add rocksoft model crc64 Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:42   ` Martin K. Petersen
  2022-02-02 13:14   ` Hannes Reinecke
  2022-02-01 19:01 ` [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors Keith Busch
                   ` (2 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

Provide a module to test the rocksoft crc64 calculations with well known
inputs and expected values.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:

  Fixed Kconfig dependency

 lib/Kconfig.debug |  4 +++
 lib/Makefile      |  1 +
 lib/test_crc64.c  | 68 +++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 73 insertions(+)
 create mode 100644 lib/test_crc64.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 14b89aa37c5c..149de11ae903 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2214,6 +2214,10 @@ config TEST_UUID
 config TEST_XARRAY
 	tristate "Test the XArray code at runtime"
 
+config TEST_CRC64
+	depends on CRC64
+	tristate "Test the crc64 code at runtime"
+
 config TEST_OVERFLOW
 	tristate "Test check_*_overflow() functions at runtime"
 
diff --git a/lib/Makefile b/lib/Makefile
index 300f569c626b..e100a4d6a950 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -103,6 +103,7 @@ obj-$(CONFIG_TEST_HMM) += test_hmm.o
 obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
+obj-$(CONFIG_TEST_CRC64) += test_crc64.o
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
 # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_crc64.c b/lib/test_crc64.c
new file mode 100644
index 000000000000..283fef8f110e
--- /dev/null
+++ b/lib/test_crc64.c
@@ -0,0 +1,68 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Tests were selected from NVM Express NVM Command Set Specification 1.0a,
+ * section 5.2.1.3.5 "64b CRC Test Cases" available here:
+ *
+ *   https://nvmexpress.org/wp-content/uploads/NVMe-NVM-Command-Set-Specification-1.0a-2021.07.26-Ratified.pdf
+ *
+ * Copyright 2022 Keith Busch <kbusch@kernel.org>
+ */
+
+#include <linux/crc64.h>
+#include <linux/module.h>
+
+static unsigned int tests_passed;
+static unsigned int tests_run;
+
+#define ALL_ZEROS 0x6482D367EB22B64EULL
+#define ALL_FFS 0xC0DDBA7302ECA3ACULL
+#define INC 0x3E729F5F6750449CULL
+#define DEC 0x9A2DF64B8E9E517EULL
+
+static u8 buffer[4096];
+
+#define CRC_CHECK(c, v) do {					\
+	tests_run++;						\
+	if (c != v)						\
+		printk("BUG at %s:%d expected:%llx got:%llx\n", \
+			__func__, __LINE__, v, c);		\
+	else							\
+		tests_passed++;					\
+} while (0)
+
+
+static int crc_tests(void)
+{
+	__u64 crc;
+	int i;
+
+	memset(buffer, 0, sizeof(buffer));
+	crc = crc64_rocksoft(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, ALL_ZEROS);
+
+	memset(buffer, 0xff, sizeof(buffer));
+	crc = crc64_rocksoft(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, ALL_FFS);
+
+	for (i = 0; i < 4096; i++)
+		buffer[i] = i & 0xff;
+	crc = crc64_rocksoft(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, INC);
+
+	for (i = 0; i < 4096; i++)
+		buffer[i] = 0xff - (i & 0xff);
+	crc = crc64_rocksoft(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, DEC);
+
+	printk("CRC64: %u of %u tests passed\n", tests_passed, tests_run);
+	return (tests_run == tests_passed) ? 0 : -EINVAL;
+}
+
+static void crc_exit(void)
+{
+}
+
+module_init(crc_tests);
+module_exit(crc_exit);
+MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
+MODULE_LICENSE("GPL");
-- 
2.25.4
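
The four test vectors can also be cross-checked without a lookup table. The sketch below is a user-space illustration, not part of the patch: it uses a bit-at-a-time CRC with the reflected polynomial, and the names crc64_bitwise, fill_pattern and crc_of_pattern are hypothetical. The buffer patterns follow the ones in crc_tests().

```c
#include <stddef.h>
#include <stdint.h>

#define CRC64_ROCKSOFT_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Bit-at-a-time reflected CRC64: init ~0, final XOR ~0. */
static uint64_t crc64_bitwise(const uint8_t *p, size_t len)
{
	uint64_t crc = ~0ULL;

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc & 1) ?
				(crc >> 1) ^ CRC64_ROCKSOFT_POLY_REFLECTED :
				crc >> 1;
	}
	return ~crc;
}

enum pattern { PAT_ZEROS, PAT_FFS, PAT_INC, PAT_DEC };

/* Fill the buffer the same way the kernel test module does. */
static void fill_pattern(uint8_t *buf, size_t len, enum pattern pat)
{
	for (size_t i = 0; i < len; i++) {
		switch (pat) {
		case PAT_ZEROS:
			buf[i] = 0x00;
			break;
		case PAT_FFS:
			buf[i] = 0xff;
			break;
		case PAT_INC:
			buf[i] = i & 0xff;
			break;
		case PAT_DEC:
			buf[i] = 0xff - (i & 0xff);
			break;
		}
	}
}

/* CRC of a 4096-byte buffer filled with the given pattern. */
static uint64_t crc_of_pattern(enum pattern pat)
{
	static uint8_t buf[4096];

	fill_pattern(buf, sizeof(buf), pat);
	return crc64_bitwise(buf, sizeof(buf));
}
```

Each call to crc_of_pattern() is expected to match the corresponding ALL_ZEROS, ALL_FFS, INC or DEC constant from the test module.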



* [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
                   ` (3 preceding siblings ...)
  2022-02-01 19:01 ` [PATCHv2 4/7] lib: add crc64 tests Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:41   ` Martin K. Petersen
  2022-02-02 13:15   ` Hannes Reinecke
  2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
  2022-02-01 19:01 ` [PATCHv2 7/7] nvme: add support for enhanced metadata Keith Busch
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Arnd Bergmann

The NVMe protocol extended the data integrity fields with unaligned
48-bit reference tags. Provide some helper accessors in
preparation for these.

Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/asm-generic/unaligned.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 1c4242416c9f..8fc637379899 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -126,4 +126,30 @@ static inline void put_unaligned_le24(const u32 val, void *p)
 	__put_unaligned_le24(val, p);
 }
 
+static inline void __put_unaligned_be48(const u64 val, __u8 *p)
+{
+	*p++ = val >> 40;
+	*p++ = val >> 32;
+	*p++ = val >> 24;
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline void put_unaligned_be48(const u64 val, void *p)
+{
+	__put_unaligned_be48(val, p);
+}
+
+static inline u64 __get_unaligned_be48(const u8 *p)
+{
+	return (u64)p[0] << 40 | (u64)p[1] << 32 | (u64)p[2] << 24 |
+		p[3] << 16 | p[4] << 8 | p[5];
+}
+
+static inline u64 get_unaligned_be48(const void *p)
+{
+	return __get_unaligned_be48(p);
+}
+
 #endif /* __ASM_GENERIC_UNALIGNED_H */
-- 
2.25.4
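
A user-space sketch of the same accessors may help when experimenting with 48-bit big-endian fields. The names put_be48 and get_be48 are hypothetical stand-ins for the kernel helpers. This version casts every byte to 64 bits before shifting, so a byte of 0x80 or above in the third position cannot be sign-extended through int promotion.

```c
#include <stdint.h>

/* Store the low 48 bits of val at p, most significant byte first. */
static void put_be48(uint64_t val, uint8_t *p)
{
	p[0] = val >> 40;
	p[1] = val >> 32;
	p[2] = val >> 24;
	p[3] = val >> 16;
	p[4] = val >> 8;
	p[5] = val;
}

/* Load a 48-bit big-endian value; each byte is widened before shifting. */
static uint64_t get_be48(const uint8_t *p)
{
	return (uint64_t)p[0] << 40 | (uint64_t)p[1] << 32 |
	       (uint64_t)p[2] << 24 | (uint64_t)p[3] << 16 |
	       (uint64_t)p[4] << 8  | (uint64_t)p[5];
}
```

A round trip through put_be48() and get_be48() returns the original value for any input that fits in 48 bits, and the six bytes need no particular alignment.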



* [PATCHv2 6/7] block: add pi for nvme enhanced integrity
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
                   ` (4 preceding siblings ...)
  2022-02-01 19:01 ` [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:35   ` Martin K. Petersen
                     ` (2 more replies)
  2022-02-01 19:01 ` [PATCHv2 7/7] nvme: add support for enhanced metadata Keith Busch
  6 siblings, 3 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The NVMe specification defines larger data integrity formats beyond the
t10 tuple. Add support for the specification defined CRC64 formats,
assuming the reference tag does not need to be split with the "storage
tag".

Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/Kconfig          |   1 +
 block/t10-pi.c         | 194 +++++++++++++++++++++++++++++++++++++++++
 include/linux/t10-pi.h |  20 +++++
 3 files changed, 215 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index 205f8d01c695..e3ce9196ad07 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -75,6 +75,7 @@ config BLK_DEV_INTEGRITY_T10
 	tristate
 	depends on BLK_DEV_INTEGRITY
 	select CRC_T10DIF
+	select CRC64
 
 config BLK_DEV_ZONED
 	bool "Zoned block device support"
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 758a76518854..7bfefe970bc5 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -7,8 +7,10 @@
 #include <linux/t10-pi.h>
 #include <linux/blk-integrity.h>
 #include <linux/crc-t10dif.h>
+#include <linux/crc64.h>
 #include <linux/module.h>
 #include <net/checksum.h>
+#include <asm/unaligned.h>
 
 typedef __be16 (csum_fn) (void *, unsigned int);
 
@@ -278,4 +280,196 @@ const struct blk_integrity_profile t10_pi_type3_ip = {
 };
 EXPORT_SYMBOL(t10_pi_type3_ip);
 
+static __be64 nvme_pi_crc64(void *data, unsigned int len)
+{
+	return cpu_to_be64(crc64_rocksoft(~0ULL, data, len));
+}
+
+static blk_status_t nvme_crc64_generate(struct blk_integrity_iter *iter,
+					enum t10_dif_type type)
+{
+	unsigned int i;
+
+	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
+		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+
+		pi->guard_tag = nvme_pi_crc64(iter->data_buf, iter->interval);
+		pi->app_tag = 0;
+
+		if (type == T10_PI_TYPE1_PROTECTION)
+			put_unaligned_be48(iter->seed, pi->ref_tag);
+		else
+			put_unaligned_be48(0ULL, pi->ref_tag);
+
+		iter->data_buf += iter->interval;
+		iter->prot_buf += iter->tuple_size;
+		iter->seed++;
+	}
+
+	return BLK_STS_OK;
+}
+
+static bool nvme_crc64_ref_escape(u8 *ref_tag)
+{
+	static u8 ref_escape[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+	return memcmp(ref_tag, ref_escape, sizeof(ref_escape)) == 0;
+}
+
+static blk_status_t nvme_crc64_verify(struct blk_integrity_iter *iter,
+				      enum t10_dif_type type)
+{
+	unsigned int i;
+
+	for (i = 0; i < iter->data_size; i += iter->interval) {
+		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+		u64 ref, seed;
+		__be64 csum;
+
+		if (type == T10_PI_TYPE1_PROTECTION) {
+			if (pi->app_tag == T10_PI_APP_ESCAPE)
+				goto next;
+
+			ref = get_unaligned_be48(pi->ref_tag);
+			seed = iter->seed & 0xffffffffffffull;
+			if (ref != seed) {
+				pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
+					iter->disk_name, seed, ref);
+				return BLK_STS_PROTECTION;
+			}
+		} else if (type == T10_PI_TYPE3_PROTECTION) {
+			if (pi->app_tag == T10_PI_APP_ESCAPE &&
+			    nvme_crc64_ref_escape(pi->ref_tag))
+				goto next;
+		}
+
+		csum = nvme_pi_crc64(iter->data_buf, iter->interval);
+		if (pi->guard_tag != csum) {
+			pr_err("%s: guard tag error at sector %llu " \
+			       "(rcvd %016llx, want %016llx)\n",
+				iter->disk_name, (unsigned long long)iter->seed,
+				be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
+			return BLK_STS_PROTECTION;
+		}
+
+next:
+		iter->data_buf += iter->interval;
+		iter->prot_buf += iter->tuple_size;
+		iter->seed++;
+	}
+
+	return BLK_STS_OK;
+}
+
+static blk_status_t nvme_pi_type1_verify_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_verify(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type1_generate_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_generate(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static void nvme_pi_type1_prepare(struct request *rq)
+{
+	const int tuple_sz = rq->q->integrity.tuple_size;
+	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+	struct bio *bio;
+
+	__rq_for_each_bio(bio, rq) {
+		struct bio_integrity_payload *bip = bio_integrity(bio);
+		u64 virt = bip_get_seed(bip) & 0xffffffffffffull;
+		struct bio_vec iv;
+		struct bvec_iter iter;
+
+		/* Already remapped? */
+		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
+			break;
+
+		bip_for_each_vec(iv, bip, iter) {
+			unsigned int j;
+			void *p;
+
+			p = bvec_kmap_local(&iv);
+			for (j = 0; j < iv.bv_len; j += tuple_sz) {
+				struct nvme_crc64_pi_tuple *pi = p;
+				u64 ref = get_unaligned_be48(pi->ref_tag);
+
+				if (ref == virt)
+					put_unaligned_be48(ref_tag, pi->ref_tag);
+				virt++;
+				ref_tag++;
+				p += tuple_sz;
+			}
+			kunmap_local(p);
+		}
+
+		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+	}
+}
+
+static void nvme_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+{
+	unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
+	const int tuple_sz = rq->q->integrity.tuple_size;
+	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+	struct bio *bio;
+
+	__rq_for_each_bio(bio, rq) {
+		struct bio_integrity_payload *bip = bio_integrity(bio);
+		u64 virt = bip_get_seed(bip) & 0xffffffffffffull;
+		struct bio_vec iv;
+		struct bvec_iter iter;
+
+		bip_for_each_vec(iv, bip, iter) {
+			unsigned int j;
+			void *p;
+
+			p = bvec_kmap_local(&iv);
+			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
+				struct nvme_crc64_pi_tuple *pi = p;
+				u64 ref = get_unaligned_be48(pi->ref_tag);
+
+				if (ref == ref_tag)
+					put_unaligned_be48(virt, pi->ref_tag);
+				virt++;
+				ref_tag++;
+				intervals--;
+				p += tuple_sz;
+			}
+			kunmap_local(p);
+		}
+	}
+}
+
+static blk_status_t nvme_pi_type3_verify_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_verify(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type3_generate_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_generate(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+const struct blk_integrity_profile nvme_pi_type1_crc64 = {
+	.name			= "NVME-DIF-TYPE1-CRC64",
+	.generate_fn		= nvme_pi_type1_generate_crc,
+	.verify_fn		= nvme_pi_type1_verify_crc,
+	.prepare_fn		= nvme_pi_type1_prepare,
+	.complete_fn		= nvme_pi_type1_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type1_crc64);
+
+const struct blk_integrity_profile nvme_pi_type3_crc64 = {
+	.name			= "NVME-DIF-TYPE3-CRC64",
+	.generate_fn		= nvme_pi_type3_generate_crc,
+	.verify_fn		= nvme_pi_type3_verify_crc,
+	.prepare_fn		= t10_pi_type3_prepare,
+	.complete_fn		= t10_pi_type3_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type3_crc64);
+
+MODULE_LICENSE("GPL");
 MODULE_LICENSE("GPL");
diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h
index c635c2e014e3..fd3a9b99500a 100644
--- a/include/linux/t10-pi.h
+++ b/include/linux/t10-pi.h
@@ -53,4 +53,24 @@ extern const struct blk_integrity_profile t10_pi_type1_ip;
 extern const struct blk_integrity_profile t10_pi_type3_crc;
 extern const struct blk_integrity_profile t10_pi_type3_ip;
 
+struct nvme_crc64_pi_tuple {
+	__be64 guard_tag;
+	__be16 app_tag;
+	__u8   ref_tag[6];
+};
+
+static inline u64 nvme_pi_extended_ref_tag(struct request *rq)
+{
+	unsigned int shift = ilog2(queue_logical_block_size(rq->q));
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	if (rq->q->integrity.interval_exp)
+		shift = rq->q->integrity.interval_exp;
+#endif
+	return blk_rq_pos(rq) >> (shift - SECTOR_SHIFT) & 0xffffffffffffull;
+}
+
+extern const struct blk_integrity_profile nvme_pi_type1_crc64;
+extern const struct blk_integrity_profile nvme_pi_type3_crc64;
+
 #endif
-- 
2.25.4
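
The 16-byte tuple layout and the 48-bit reference tag derivation can be sketched in user space as follows. This is an illustration under assumptions: the struct uses byte arrays instead of __be64 and __be16 so it is endian-neutral, the guard value is passed in rather than computed, and the names ext_ref_tag, store_be and pack_tuple are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SHIFT	9			/* 512-byte sectors */
#define REF_TAG_MASK	0xffffffffffffULL	/* low 48 bits */

/* Byte-array view of the tuple: 8-byte guard, 2-byte app tag, 6-byte ref. */
struct crc64_pi_tuple {
	uint8_t guard_tag[8];
	uint8_t app_tag[2];
	uint8_t ref_tag[6];
};

_Static_assert(sizeof(struct crc64_pi_tuple) == 16, "tuple must be 16 bytes");

/* Convert a 512-byte sector position to a 48-bit interval number. */
static uint64_t ext_ref_tag(uint64_t sector, unsigned int interval_shift)
{
	return (sector >> (interval_shift - SECTOR_SHIFT)) & REF_TAG_MASK;
}

/* Store the low nbytes of val at p, most significant byte first. */
static void store_be(uint8_t *p, uint64_t val, size_t nbytes)
{
	for (size_t i = 0; i < nbytes; i++)
		p[i] = val >> (8 * (nbytes - 1 - i));
}

/* Fill one tuple; every field is stored big-endian. */
static void pack_tuple(struct crc64_pi_tuple *pi, uint64_t guard,
		       uint16_t app_tag, uint64_t ref_tag)
{
	store_be(pi->guard_tag, guard, sizeof(pi->guard_tag));
	store_be(pi->app_tag, app_tag, sizeof(pi->app_tag));
	store_be(pi->ref_tag, ref_tag & REF_TAG_MASK, sizeof(pi->ref_tag));
}
```

With 4096-byte intervals (a shift of 12), sector 8 maps to reference tag 1, which lands in the last byte of the tuple.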



* [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
                   ` (5 preceding siblings ...)
  2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
@ 2022-02-01 19:01 ` Keith Busch
  2022-02-02  4:48   ` Martin K. Petersen
  2022-02-02 13:28   ` Hannes Reinecke
  6 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-01 19:01 UTC (permalink / raw)
  To: linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

NVM Express ratified TP 4069 defines new protection information formats.
Implement support for the CRC64 guard tags.

Since the block layer doesn't support variable length reference tags,
the driver does not support the Storage Tag space at this time.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
v1->v2:

  Added support for PRACT

  Fixed endian conversion

 drivers/nvme/host/core.c | 164 +++++++++++++++++++++++++++++++++------
 drivers/nvme/host/nvme.h |   4 +-
 include/linux/nvme.h     |  53 +++++++++++--
 3 files changed, 190 insertions(+), 31 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index b3eabf6a08b9..0f2ea2a4c718 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -882,6 +882,30 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_STS_OK;
 }
 
+static inline void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
+				    struct request *req)
+{
+	u32 upper, lower;
+	u64 ref48;
+
+	/* both rw and write zeroes share the same reftag format */
+	switch (ns->guard_type) {
+	case NVME_NVM_NS_16B_GUARD:
+		cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+		break;
+	case NVME_NVM_NS_64B_GUARD:
+		ref48 = nvme_pi_extended_ref_tag(req);
+		lower = lower_32_bits(ref48);
+		upper = upper_32_bits(ref48);
+
+		cmnd->rw.reftag = cpu_to_le32(lower);
+		cmnd->rw.cdw3 = cpu_to_le32(upper);
+		break;
+	default:
+		break;
+	}
+}
+
 static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 		struct request *req, struct nvme_command *cmnd)
 {
@@ -903,8 +927,7 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 		switch (ns->pi_type) {
 		case NVME_NS_DPS_PI_TYPE1:
 		case NVME_NS_DPS_PI_TYPE2:
-			cmnd->write_zeroes.reftag =
-				cpu_to_le32(t10_pi_ref_tag(req));
+			nvme_set_ref_tag(ns, cmnd, req);
 			break;
 		}
 	}
@@ -931,7 +954,8 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	cmnd->rw.opcode = op;
 	cmnd->rw.flags = 0;
 	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
-	cmnd->rw.rsvd2 = 0;
+	cmnd->rw.cdw2 = 0;
+	cmnd->rw.cdw3 = 0;
 	cmnd->rw.metadata = 0;
 	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
@@ -965,7 +989,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 					NVME_RW_PRINFO_PRCHK_REF;
 			if (op == nvme_cmd_zone_append)
 				control |= NVME_RW_APPEND_PIREMAP;
-			cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+			nvme_set_ref_tag(ns, cmnd, req);
 			break;
 		}
 	}
@@ -1619,33 +1643,58 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
 				u32 max_integrity_segments)
 {
 	struct blk_integrity integrity = { };
 
-	switch (pi_type) {
+	switch (ns->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
-		integrity.profile = &t10_pi_type3_crc;
-		integrity.tag_size = sizeof(u16) + sizeof(u32);
-		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+		switch (ns->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			integrity.profile = &t10_pi_type3_crc;
+			integrity.tag_size = sizeof(u16) + sizeof(u32);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			integrity.profile = &nvme_pi_type3_crc64;
+			integrity.tag_size = sizeof(u16) + 6;
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		default:
+			integrity.profile = NULL;
+			break;
+		}
 		break;
 	case NVME_NS_DPS_PI_TYPE1:
 	case NVME_NS_DPS_PI_TYPE2:
-		integrity.profile = &t10_pi_type1_crc;
-		integrity.tag_size = sizeof(u16);
-		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+		switch (ns->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			integrity.profile = &t10_pi_type1_crc;
+			integrity.tag_size = sizeof(u16);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			integrity.profile = &nvme_pi_type1_crc64;
+			integrity.tag_size = sizeof(u16);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		default:
+			integrity.profile = NULL;
+			break;
+		}
 		break;
 	default:
 		integrity.profile = NULL;
 		break;
 	}
-	integrity.tuple_size = ms;
+
+	integrity.tuple_size = ns->ms;
 	blk_integrity_register(disk, &integrity);
 	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
 				u32 max_integrity_segments)
 {
 }
@@ -1722,17 +1771,75 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	return 0;
 }
 
-static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
+	bool first = id->dps & NVME_NS_DPS_PI_FIRST;
+	unsigned lbaf = nvme_lbaf_index(id->flbas);
 	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_command c = { };
+	struct nvme_id_ns_nvm *nvm;
+	int ret = 0;
+	u32 elbaf;
+
+	ns->pi_size = 0;
+	ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+	if (!(ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
+		ns->pi_size = sizeof(struct t10_pi_tuple);
+		ns->guard_type = NVME_NVM_NS_16B_GUARD;
+		goto set_pi;
+	}
 
-	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	if (id->dps & NVME_NS_DPS_PI_FIRST ||
-	    ns->ms == sizeof(struct t10_pi_tuple))
+	nvm = kzalloc(sizeof(*nvm), GFP_KERNEL);
+	if (!nvm)
+		return -ENOMEM;
+
+	c.identify.opcode = nvme_admin_identify;
+	c.identify.nsid = cpu_to_le32(ns->head->ns_id);
+	c.identify.cns = NVME_ID_CNS_CS_NS;
+	c.identify.csi = NVME_CSI_NVM;
+
+	ret = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, nvm, sizeof(*nvm));
+	if (ret)
+		goto free_data;
+
+	elbaf = le32_to_cpu(nvm->elbaf[lbaf]);
+
+	/* no support for storage tag formats right now */
+	if (nvme_elbaf_sts(elbaf))
+		goto free_data;
+
+	ns->guard_type = nvme_elbaf_guard_type(elbaf);
+	switch (ns->guard_type) {
+	case NVME_NVM_NS_64B_GUARD:
+		ns->pi_size = sizeof(struct nvme_crc64_pi_tuple);
+		break;
+	case NVME_NVM_NS_16B_GUARD:
+		ns->pi_size = sizeof(struct t10_pi_tuple);
+		break;
+	default:
+		break;
+	}
+
+free_data:
+	kfree(nvm);
+set_pi:
+	if (ns->pi_size && (first || ns->ms == ns->pi_size))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
 	else
 		ns->pi_type = 0;
 
+	return ret;
+}
+
+static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+	int ret;
+
+	ret = nvme_init_ms(ns, id);
+	if (ret)
+		return ret;
+
 	ns->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS);
 	if (!ns->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
 		return 0;
@@ -1850,7 +1957,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms) {
 		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
 		    (ns->features & NVME_NS_METADATA_SUPPORTED))
-			nvme_init_integrity(disk, ns->ms, ns->pi_type,
+			nvme_init_integrity(disk, ns,
 					    ns->ctrl->max_integrity_segments);
 		else if (!nvme_ns_has_pi(ns))
 			capacity = 0;
@@ -1905,7 +2012,7 @@ static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)
 
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
-	unsigned lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
+	unsigned lbaf = nvme_lbaf_index(id->flbas);
 	int ret;
 
 	blk_mq_freeze_queue(ns->disk->queue);
@@ -2252,20 +2359,27 @@ static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
 	return ret;
 }
 
-static int nvme_configure_acre(struct nvme_ctrl *ctrl)
+static int nvme_configure_host_options(struct nvme_ctrl *ctrl)
 {
 	struct nvme_feat_host_behavior *host;
+	u8 acre = 0, lbafee = 0;
 	int ret;
 
 	/* Don't bother enabling the feature if retry delay is not reported */
-	if (!ctrl->crdt[0])
+	if (ctrl->crdt[0])
+		acre = NVME_ENABLE_ACRE;
+	if (ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)
+		lbafee = NVME_ENABLE_LBAFEE;
+
+	if (!acre && !lbafee)
 		return 0;
 
 	host = kzalloc(sizeof(*host), GFP_KERNEL);
 	if (!host)
 		return 0;
 
-	host->acre = NVME_ENABLE_ACRE;
+	host->acre = acre;
+	host->lbafee = lbafee;
 	ret = nvme_set_features(ctrl, NVME_FEAT_HOST_BEHAVIOR, 0,
 				host, sizeof(*host), NULL);
 	kfree(host);
@@ -3104,7 +3218,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
-	ret = nvme_configure_acre(ctrl);
+	ret = nvme_configure_host_options(ctrl);
 	if (ret < 0)
 		return ret;
 
@@ -4725,12 +4839,14 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ns) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ns_zns) != NVME_IDENTIFY_DATA_SIZE);
+	BUILD_BUG_ON(sizeof(struct nvme_id_ns_nvm) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_zns) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_nvm) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_lba_range_type) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
 	BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_directive_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_feat_host_behavior) != 512);
 }
 
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index a162f6c6da6e..9cde9445506a 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -452,9 +452,11 @@ struct nvme_ns {
 
 	int lba_shift;
 	u16 ms;
+	u16 pi_size;
 	u16 sgs;
 	u32 sws;
 	u8 pi_type;
+	u8 guard_type;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u64 zsze;
 #endif
@@ -477,7 +479,7 @@ struct nvme_ns {
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
 static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
-	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
+	return ns->pi_type && ns->ms == ns->pi_size;
 }
 
 struct nvme_ctrl_ops {
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 855dd9b3e84b..4342b7eed3e2 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -238,6 +238,7 @@ enum {
 enum nvme_ctrl_attr {
 	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
 	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
+	NVME_CTRL_ATTR_ELBAS		= (1 << 15),
 };
 
 struct nvme_id_ctrl {
@@ -391,8 +392,7 @@ struct nvme_id_ns {
 	__le16			endgid;
 	__u8			nguid[16];
 	__u8			eui64[8];
-	struct nvme_lbaf	lbaf[16];
-	__u8			rsvd192[192];
+	struct nvme_lbaf	lbaf[64];
 	__u8			vs[3712];
 };
 
@@ -410,8 +410,7 @@ struct nvme_id_ns_zns {
 	__le32			rrl;
 	__le32			frl;
 	__u8			rsvd20[2796];
-	struct nvme_zns_lbafe	lbafe[16];
-	__u8			rsvd3072[768];
+	struct nvme_zns_lbafe	lbafe[64];
 	__u8			vs[256];
 };
 
@@ -420,6 +419,30 @@ struct nvme_id_ctrl_zns {
 	__u8	rsvd1[4095];
 };
 
+struct nvme_id_ns_nvm {
+	__le64	lbstm;
+	__u8	pic;
+	__u8	rsvd9[3];
+	__le32	elbaf[64];
+	__u8	rsvd268[3828];
+};
+
+enum {
+	NVME_ID_NS_NVM_STS_MASK		= 0x3f,
+	NVME_ID_NS_NVM_GUARD_SHIFT	= 7,
+	NVME_ID_NS_NVM_GUARD_MASK	= 0x3,
+};
+
+static inline __u8 nvme_elbaf_sts(__u32 elbaf)
+{
+	return elbaf & NVME_ID_NS_NVM_STS_MASK;
+}
+
+static inline __u8 nvme_elbaf_guard_type(__u32 elbaf)
+{
+	return (elbaf >> NVME_ID_NS_NVM_GUARD_SHIFT) & NVME_ID_NS_NVM_GUARD_MASK;
+}
+
 struct nvme_id_ctrl_nvm {
 	__u8	vsl;
 	__u8	wzsl;
@@ -470,6 +493,8 @@ enum {
 	NVME_NS_FEAT_IO_OPT	= 1 << 4,
 	NVME_NS_ATTR_RO		= 1 << 0,
 	NVME_NS_FLBAS_LBA_MASK	= 0xf,
+	NVME_NS_FLBAS_LBA_UMASK	= 0x60,
+	NVME_NS_FLBAS_LBA_SHIFT	= 1,
 	NVME_NS_FLBAS_META_EXT	= 0x10,
 	NVME_NS_NMIC_SHARED	= 1 << 0,
 	NVME_LBAF_RP_BEST	= 0,
@@ -488,6 +513,18 @@ enum {
 	NVME_NS_DPS_PI_TYPE3	= 3,
 };
 
+enum {
+	NVME_NVM_NS_16B_GUARD	= 0,
+	NVME_NVM_NS_32B_GUARD	= 1,
+	NVME_NVM_NS_64B_GUARD	= 2,
+};
+
+static inline __u8 nvme_lbaf_index(__u8 flbas)
+{
+	return (flbas & NVME_NS_FLBAS_LBA_MASK) |
+		((flbas & NVME_NS_FLBAS_LBA_UMASK) >> NVME_NS_FLBAS_LBA_SHIFT);
+}
+
 /* Identify Namespace Metadata Capabilities (MC): */
 enum {
 	NVME_MC_EXTENDED_LBA	= (1 << 0),
@@ -834,7 +871,8 @@ struct nvme_rw_command {
 	__u8			flags;
 	__u16			command_id;
 	__le32			nsid;
-	__u64			rsvd2;
+	__le32			cdw2;
+	__le32			cdw3;
 	__le64			metadata;
 	union nvme_data_ptr	dptr;
 	__le64			slba;
@@ -988,11 +1026,14 @@ enum {
 
 struct nvme_feat_host_behavior {
 	__u8 acre;
-	__u8 resv1[511];
+	__u8 etdas;
+	__u8 lbafee;
+	__u8 resv1[509];
 };
 
 enum {
 	NVME_ENABLE_ACRE	= 1,
+	NVME_ENABLE_LBAFEE	= 1,
 };
 
 /* Admin commands */
-- 
2.25.4
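
The nvme_lbaf_index() helper added in the patch above builds a 6-bit format index from two separate FLBAS bit ranges, which is what allows up to 64 LBA formats. A minimal user-space sketch of the same bit manipulation follows. The macro and function names are shortened stand-ins chosen for illustration; the values are copied from the constants in the patch.

```c
#include <stdint.h>

/* Values mirror the constants this patch adds to include/linux/nvme.h. */
#define FLBAS_LBA_MASK   0x0f	/* bits 3:0, low four bits of the index */
#define FLBAS_LBA_UMASK  0x60	/* bits 6:5, upper two bits of the index */
#define FLBAS_LBA_SHIFT  1	/* moves bits 6:5 down to bits 5:4 */

/*
 * Illustrative copy of nvme_lbaf_index(): bit 4 of FLBAS is the
 * "metadata is part of an extended LBA" flag and is skipped, so the
 * upper index bits have to be shifted down by one to close the gap.
 */
static inline uint8_t lbaf_index(uint8_t flbas)
{
	return (flbas & FLBAS_LBA_MASK) |
	       ((flbas & FLBAS_LBA_UMASK) >> FLBAS_LBA_SHIFT);
}
```

With these definitions an FLBAS value of 0x20 selects format 16, and setting the extended-LBA flag (0x10) does not change the index.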



* Re: [PATCHv2 6/7] block: add pi for nvme enhanced integrity
  2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
@ 2022-02-02  4:35   ` Martin K. Petersen
  2022-02-02 13:19   ` Hannes Reinecke
  2022-02-02 18:40   ` Bart Van Assche
  2 siblings, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:35 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli


Keith,

This all looks pretty good to me. Only nit I have is:

> +static blk_status_t nvme_pi_type1_verify_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_verify(iter, T10_PI_TYPE1_PROTECTION);
> +}
> +
> +static blk_status_t nvme_pi_type1_generate_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_generate(iter, T10_PI_TYPE1_PROTECTION);
> +}

Since we will definitely need to support the CRC32C variants, the
nvme_pi_type1_ prefix is a bit too generic. Wish we had gone with Type 4
and 5 like I originally proposed in SCSI. Not a big fan of this "almost
exactly like T10 Type 1 except for all these differences" situation that
NVMe ended up with.

Anyway. So I think the NVMe-specific format helpers need to at the very
least capture that they are for the CRC64 case.

Other than that it looks OK.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 1/7] block: support pi with extended metadata
  2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
@ 2022-02-02  4:38   ` Martin K. Petersen
  2022-02-02 13:11   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli


Keith,

> The nvme spec allows protection information formats with metadata
> extending beyond the pi field.

This may be true but it seems the rationale for the patch in the context
of this series is to enable PI metadata bigger than t10_pi_tuple?

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 2/7] nvme: allow integrity on extended metadata formats
  2022-02-01 19:01 ` [PATCHv2 2/7] nvme: allow integrity on extended metadata formats Keith Busch
@ 2022-02-02  4:39   ` Martin K. Petersen
  2022-02-02 13:12   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:39 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli


Keith,

> The block integrity subsystem knows how to construct protection
> information buffers with metadata beyond the protection information
> fields. Remove the driver restriction.
>
> Note, this can only work if the PI field appears first in the metadata,
> as the integrity subsystem doesn't calculate guard tags on preceding
> metadata.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering
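
The restriction described in the commit message (PI must come first when the metadata is larger than the PI tuple) corresponds to the check that patch 7/7 later places at the set_pi label in nvme_init_ms(). A hedged sketch of that predicate as a stand-alone function is shown here; the function name and the flattened parameters are assumptions for illustration, not driver API.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper mirroring the condition
 *	ns->pi_size && (first || ns->ms == ns->pi_size)
 * ms:       metadata bytes per logical block
 * pi_size:  size of the PI tuple for the namespace's guard type,
 *           0 if the format is not supported
 * pi_first: PI is located in the first bytes of the metadata
 */
static inline bool pi_usable(uint16_t ms, uint16_t pi_size, bool pi_first)
{
	return pi_size != 0 && (pi_first || ms == pi_size);
}
```

When the metadata is exactly one tuple long, the position flag does not matter; with larger metadata the tuple has to be first, because the block layer does not compute guard tags over preceding metadata.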


* Re: [PATCHv2 3/7] lib: add rocksoft model crc64
  2022-02-01 19:01 ` [PATCHv2 3/7] lib: add rocksoft model crc64 Keith Busch
@ 2022-02-02  4:40   ` Martin K. Petersen
  2022-02-02 13:13   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:40 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli, Eric Biggers


Keith,

> The NVM Express specification extended data integrity fields to 64
> bits using the Rocksoft^TM parameters. Add the poly to the crc64 table
> generation, and provide a library routine implementing the algorithm.

Looks OK to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors
  2022-02-01 19:01 ` [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors Keith Busch
@ 2022-02-02  4:41   ` Martin K. Petersen
  2022-02-02 13:15   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:41 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli, Arnd Bergmann


Keith,

> The NVMe protocol extended the data integrity fields with unaligned
> 48-bit reference tags. Provide some helper accessors in
> preparation for these.

Looks good. Needed this a few times in the past.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering
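
For readers who have not seen the patch body, big-endian 48-bit accessors of the kind described here can be sketched in portable C as below. This is a user-space illustration only: the names get_be48() and put_be48() are made up for the example and the real helpers in include/asm-generic/unaligned.h may differ in detail.

```c
#include <stdint.h>

/* Read six bytes, most significant first, from a possibly unaligned address. */
static inline uint64_t get_be48(const void *ptr)
{
	const uint8_t *p = ptr;

	return (uint64_t)p[0] << 40 | (uint64_t)p[1] << 32 |
	       (uint64_t)p[2] << 24 | (uint64_t)p[3] << 16 |
	       (uint64_t)p[4] << 8  | (uint64_t)p[5];
}

/* Store the low 48 bits of val, most significant byte first. */
static inline void put_be48(uint64_t val, void *ptr)
{
	uint8_t *p = ptr;

	p[0] = val >> 40;
	p[1] = val >> 32;
	p[2] = val >> 24;
	p[3] = val >> 16;
	p[4] = val >> 8;
	p[5] = val;
}
```

Byte-wise access avoids any alignment requirement, which matters because a 48-bit field inside a packed PI tuple rarely lands on a natural boundary.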


* Re: [PATCHv2 4/7] lib: add crc64 tests
  2022-02-01 19:01 ` [PATCHv2 4/7] lib: add crc64 tests Keith Busch
@ 2022-02-02  4:42   ` Martin K. Petersen
  2022-02-02 13:14   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:42 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli


Keith,

> Provide a module to test the rocksoft crc64 calculations with well
> known inputs and expected values.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-01 19:01 ` [PATCHv2 7/7] nvme: add support for enhanced metadata Keith Busch
@ 2022-02-02  4:48   ` Martin K. Petersen
  2022-02-02 13:28   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02  4:48 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli


Keith,

> NVM Express ratified TP 4069 defines new protection information
> formats.  Implement support for the CRC64 guard tags.

Looks fine.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 1/7] block: support pi with extended metadata
  2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
  2022-02-02  4:38   ` Martin K. Petersen
@ 2022-02-02 13:11   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:11 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 20:01, Keith Busch wrote:
> The nvme spec allows protection information formats with metadata
> extending beyond the pi field. Use the actual size of the metadata field
> for incrementing the protection buffer.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   block/bio-integrity.c         | 1 +
>   block/t10-pi.c                | 4 ++--
>   include/linux/blk-integrity.h | 1 +
>   3 files changed, 4 insertions(+), 2 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 2/7] nvme: allow integrity on extended metadata formats
  2022-02-01 19:01 ` [PATCHv2 2/7] nvme: allow integrity on extended metadata formats Keith Busch
  2022-02-02  4:39   ` Martin K. Petersen
@ 2022-02-02 13:12   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:12 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 20:01, Keith Busch wrote:
> The block integrity subsystem knows how to construct protection
> information buffers with metadata beyond the protection information
> fields. Remove the driver restriction.
> 
> Note, this can only work if the PI field appears first in the metadata,
> as the integrity subsystem doesn't calculate guard tags on preceding
> metadata.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   drivers/nvme/host/core.c | 7 ++-----
>   1 file changed, 2 insertions(+), 5 deletions(-)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 3/7] lib: add rocksoft model crc64
  2022-02-01 19:01 ` [PATCHv2 3/7] lib: add rocksoft model crc64 Keith Busch
  2022-02-02  4:40   ` Martin K. Petersen
@ 2022-02-02 13:13   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:13 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Eric Biggers

On 2/1/22 20:01, Keith Busch wrote:
> The NVM Express specification extended data integrity fields to 64 bits
> using the Rocksoft^TM parameters. Add the poly to the crc64 table
> generation, and provide a library routine implementing the algorithm.
> 
> The Rocksoft 64-bit CRC model parameters are as follows:
>      Poly: 0xAD93D23594C93659
>      Initial value: 0xFFFFFFFFFFFFFFFF
>      Reflected Input: True
>      Reflected Output: True
>      Xor Final: 0xFFFFFFFFFFFFFFFF
> 
> Since this model uses reflected bits, the implementation generates the
> reflected table so the result is ordered consistently.
> 
> Cc: Eric Biggers <ebiggers@kernel.org>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v1->v2:
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
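
The model parameters quoted above are enough to write a reference implementation. The following is a minimal bit-at-a-time sketch, not the table-driven routine from the patch; the function name is hypothetical, and the reflected constant is simply the quoted polynomial 0xAD93D23594C93659 with its 64 bits reversed.

```c
#include <stddef.h>
#include <stdint.h>

/* 0xAD93D23594C93659 bit-reversed, for LSB-first (reflected) processing */
#define CRC64_ROCKSOFT_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

/* Hypothetical one-shot helper; slow but easy to check against a table. */
static uint64_t crc64_rocksoft_bitwise(const void *buf, size_t len)
{
	const uint8_t *p = buf;
	uint64_t crc = UINT64_MAX;		/* initial value */

	for (size_t i = 0; i < len; i++) {
		crc ^= p[i];			/* reflected input: feed low bits */
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^
			      ((crc & 1) ? CRC64_ROCKSOFT_POLY_REFLECTED : 0);
	}
	return crc ^ UINT64_MAX;		/* final xor */
}
```

Because input and output are both reflected, shifting right with the reversed polynomial yields the result directly, with no separate bit reversal of the data or of the final value. A routine like this is mainly useful for cross-checking a generated lookup table.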


* Re: [PATCHv2 4/7] lib: add crc64 tests
  2022-02-01 19:01 ` [PATCHv2 4/7] lib: add crc64 tests Keith Busch
  2022-02-02  4:42   ` Martin K. Petersen
@ 2022-02-02 13:14   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:14 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 20:01, Keith Busch wrote:
> Provide a module to test the rocksoft crc64 calculations with well known
> inputs and expected values.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v1->v2:
> 
>    Fixed Kconfig dependency
> 
>   lib/Kconfig.debug |  4 +++
>   lib/Makefile      |  1 +
>   lib/test_crc64.c  | 68 +++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 73 insertions(+)
>   create mode 100644 lib/test_crc64.c
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors
  2022-02-01 19:01 ` [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors Keith Busch
  2022-02-02  4:41   ` Martin K. Petersen
@ 2022-02-02 13:15   ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:15 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli, Arnd Bergmann

On 2/1/22 20:01, Keith Busch wrote:
> The NVMe protocol extended the data integrity fields with unaligned
> 48-bit reference tags. Provide some helper accessors in
> preparation for these.
> 
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   include/asm-generic/unaligned.h | 26 ++++++++++++++++++++++++++
>   1 file changed, 26 insertions(+)
> 
Hehe.

Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 6/7] block: add pi for nvme enhanced integrity
  2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
  2022-02-02  4:35   ` Martin K. Petersen
@ 2022-02-02 13:19   ` Hannes Reinecke
  2022-02-02 18:40   ` Bart Van Assche
  2 siblings, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:19 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 20:01, Keith Busch wrote:
> The NVMe specification defines larger data integrity formats beyond the
> t10 tuple. Add support for the specification defined CRC64 formats,
> assuming the reference tag does not need to be split with the "storage
> tag".
> 
> Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>   block/Kconfig          |   1 +
>   block/t10-pi.c         | 194 +++++++++++++++++++++++++++++++++++++++++
>   include/linux/t10-pi.h |  20 +++++
>   3 files changed, 215 insertions(+)
> 
Reviewed-by: Hannes Reinecke <hare@suse.de>

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-01 19:01 ` [PATCHv2 7/7] nvme: add support for enhanced metadata Keith Busch
  2022-02-02  4:48   ` Martin K. Petersen
@ 2022-02-02 13:28   ` Hannes Reinecke
  2022-02-02 15:41     ` Keith Busch
  1 sibling, 1 reply; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 13:28 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 20:01, Keith Busch wrote:
> NVM Express ratified TP 4069 defines new protection information formats.
> Implement support for the CRC64 guard tags.
> 
> Since the block layer doesn't support variable length reference tags,
> driver support for the Storage Tag space is not supported at this time.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
> v1->v2:
> 
>    Added support for PRACT
> 
>    Fixed endian conversion
> 
>   drivers/nvme/host/core.c | 164 +++++++++++++++++++++++++++++++++------
>   drivers/nvme/host/nvme.h |   4 +-
>   include/linux/nvme.h     |  53 +++++++++++--
>   3 files changed, 190 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index b3eabf6a08b9..0f2ea2a4c718 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -882,6 +882,30 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>   	return BLK_STS_OK;
>   }
>   
> +static inline void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
> +				    struct request *req)
> +{
> +	u32 upper, lower;
> +	u64 ref48;
> +
> +	/* both rw and write zeroes share the same reftag format */
> +	switch (ns->guard_type) {
> +	case NVME_NVM_NS_16B_GUARD:
> +		cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
> +		break;
> +	case NVME_NVM_NS_64B_GUARD:
> +		ref48 = nvme_pi_extended_ref_tag(req);
> +		lower = lower_32_bits(ref48);
> +		upper = upper_32_bits(ref48);
> +
> +		cmnd->rw.reftag = cpu_to_le32(lower);
> +		cmnd->rw.cdw3 = cpu_to_le32(upper);
> +		break;
> +	default:
> +		break;
> +	}
> +}
> +
>   static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
>   		struct request *req, struct nvme_command *cmnd)
>   {
> @@ -903,8 +927,7 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
>   		switch (ns->pi_type) {
>   		case NVME_NS_DPS_PI_TYPE1:
>   		case NVME_NS_DPS_PI_TYPE2:
> -			cmnd->write_zeroes.reftag =
> -				cpu_to_le32(t10_pi_ref_tag(req));
> +			nvme_set_ref_tag(ns, cmnd, req);
>   			break;
>   		}
>   	}
> @@ -931,7 +954,8 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>   	cmnd->rw.opcode = op;
>   	cmnd->rw.flags = 0;
>   	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
> -	cmnd->rw.rsvd2 = 0;
> +	cmnd->rw.cdw2 = 0;
> +	cmnd->rw.cdw3 = 0;
>   	cmnd->rw.metadata = 0;
>   	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
>   	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
> @@ -965,7 +989,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
>   					NVME_RW_PRINFO_PRCHK_REF;
>   			if (op == nvme_cmd_zone_append)
>   				control |= NVME_RW_APPEND_PIREMAP;
> -			cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
> +			nvme_set_ref_tag(ns, cmnd, req);
>   			break;
>   		}
>   	}
> @@ -1619,33 +1643,58 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
>   }
>   
>   #ifdef CONFIG_BLK_DEV_INTEGRITY
> -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
> +static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
>   				u32 max_integrity_segments)
>   {
>   	struct blk_integrity integrity = { };
>   
> -	switch (pi_type) {
> +	switch (ns->pi_type) {
>   	case NVME_NS_DPS_PI_TYPE3:
> -		integrity.profile = &t10_pi_type3_crc;
> -		integrity.tag_size = sizeof(u16) + sizeof(u32);
> -		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +		switch (ns->guard_type) {
> +		case NVME_NVM_NS_16B_GUARD:
> +			integrity.profile = &t10_pi_type3_crc;
> +			integrity.tag_size = sizeof(u16) + sizeof(u32);
> +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +			break;
> +		case NVME_NVM_NS_64B_GUARD:
> +			integrity.profile = &nvme_pi_type1_crc64;
> +			integrity.tag_size = sizeof(u16) + 6;
> +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +			break;
> +		default:
> +			integrity.profile = NULL;
> +			break;
> +		}
>   		break;
>   	case NVME_NS_DPS_PI_TYPE1:
>   	case NVME_NS_DPS_PI_TYPE2:
> -		integrity.profile = &t10_pi_type1_crc;
> -		integrity.tag_size = sizeof(u16);
> -		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +		switch (ns->guard_type) {
> +		case NVME_NVM_NS_16B_GUARD:
> +			integrity.profile = &t10_pi_type1_crc;
> +			integrity.tag_size = sizeof(u16);
> +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +			break;
> +		case NVME_NVM_NS_64B_GUARD:
> +			integrity.profile = &nvme_pi_type1_crc64;
> +			integrity.tag_size = sizeof(u16);

Is that correct? Shouldn't it be '8' like in the above case?

> +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> +			break;
> +		default:
> +			integrity.profile = NULL;
> +			break;
> +		}
>   		break;
>   	default:
>   		integrity.profile = NULL;
>   		break;
>   	}
> -	integrity.tuple_size = ms;
> +
> +	integrity.tuple_size = ns->ms;
>   	blk_integrity_register(disk, &integrity);
>   	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
>   }
>   #else
> -static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
> +static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
>   				u32 max_integrity_segments)
>   {
>   }
> @@ -1722,17 +1771,75 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
>   	return 0;
>   }
>   
> -static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
> +static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)
>   {
> +	bool first = id->dps & NVME_NS_DPS_PI_FIRST;
> +	unsigned lbaf = nvme_lbaf_index(id->flbas);
>   	struct nvme_ctrl *ctrl = ns->ctrl;
> +	struct nvme_command c = { };
> +	struct nvme_id_ns_nvm *nvm;
> +	int ret = 0;
> +	u32 elbaf;
> +
> +	ns->pi_size = 0;
> +	ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
> +	if (!(ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
> +		ns->pi_size = sizeof(struct t10_pi_tuple);
> +		ns->guard_type = NVME_NVM_NS_16B_GUARD;
> +		goto set_pi;
> +	}
>   
> -	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
> -	if (id->dps & NVME_NS_DPS_PI_FIRST ||
> -	    ns->ms == sizeof(struct t10_pi_tuple))
> +	nvm = kzalloc(sizeof(*nvm), GFP_KERNEL);
> +	if (!nvm)
> +		return -ENOMEM;
> +
> +	c.identify.opcode = nvme_admin_identify;
> +	c.identify.nsid = cpu_to_le32(ns->head->ns_id);
> +	c.identify.cns = NVME_ID_CNS_CS_NS;
> +	c.identify.csi = NVME_CSI_NVM;
> +
> +	ret = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, nvm, sizeof(*nvm));
> +	if (ret)
> +		goto free_data;
> +
> +	elbaf = le32_to_cpu(nvm->elbaf[lbaf]);
> +
> +	/* no support for storage tag formats right now */
> +	if (nvme_elbaf_sts(elbaf))
> +		goto free_data;
> +
> +	ns->guard_type = nvme_elbaf_guard_type(elbaf);
> +	switch (ns->guard_type) {
> +	case NVME_NVM_NS_64B_GUARD:
> +		ns->pi_size = sizeof(struct nvme_crc64_pi_tuple);
> +		break;
> +	case NVME_NVM_NS_16B_GUARD:
> +		ns->pi_size = sizeof(struct t10_pi_tuple);
> +		break;
> +	default:
> +		break;
> +	}
> +
> +free_data:
> +	kfree(nvm);
> +set_pi:
> +	if (ns->pi_size && (first || ns->ms == ns->pi_size))
>   		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
>   	else
>   		ns->pi_type = 0;
>   
> +	return ret;
> +}
> +
> +static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
> +{
> +	struct nvme_ctrl *ctrl = ns->ctrl;
> +	int ret;
> +
> +	ret = nvme_init_ms(ns, id);
> +	if (ret)
> +		return ret;
> +
>   	ns->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS);
>   	if (!ns->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
>   		return 0;
> @@ -1850,7 +1957,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
>   	if (ns->ms) {
>   		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
>   		    (ns->features & NVME_NS_METADATA_SUPPORTED))
> -			nvme_init_integrity(disk, ns->ms, ns->pi_type,
> +			nvme_init_integrity(disk, ns,
>   					    ns->ctrl->max_integrity_segments);
>   		else if (!nvme_ns_has_pi(ns))
>   			capacity = 0;
> @@ -1905,7 +2012,7 @@ static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)
>   
>   static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
>   {
> -	unsigned lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
> +	unsigned lbaf = nvme_lbaf_index(id->flbas);
>   	int ret;
>   
>   	blk_mq_freeze_queue(ns->disk->queue);
> @@ -2252,20 +2359,27 @@ static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
>   	return ret;
>   }
>   
> -static int nvme_configure_acre(struct nvme_ctrl *ctrl)
> +static int nvme_configure_host_options(struct nvme_ctrl *ctrl)
>   {
>   	struct nvme_feat_host_behavior *host;
> +	u8 acre = 0, lbafee = 0;
>   	int ret;
>   
>   	/* Don't bother enabling the feature if retry delay is not reported */
> -	if (!ctrl->crdt[0])
> +	if (ctrl->crdt[0])
> +		acre = NVME_ENABLE_ACRE;
> +	if (ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)
> +		lbafee = NVME_ENABLE_LBAFEE;
> +
> +	if (!acre && !lbafee)
>   		return 0;
>   
>   	host = kzalloc(sizeof(*host), GFP_KERNEL);
>   	if (!host)
>   		return 0;
>   
> -	host->acre = NVME_ENABLE_ACRE;
> +	host->acre = acre;
> +	host->lbafee = lbafee;
>   	ret = nvme_set_features(ctrl, NVME_FEAT_HOST_BEHAVIOR, 0,
>   				host, sizeof(*host), NULL);
>   	kfree(host);
> @@ -3104,7 +3218,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
>   	if (ret < 0)
>   		return ret;
>   
> -	ret = nvme_configure_acre(ctrl);
> +	ret = nvme_configure_host_options(ctrl);
>   	if (ret < 0)
>   		return ret;
>   

This could be made into a separate patch, as it's not directly related 
to PI support.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		           Kernel Storage Architect
hare@suse.de			                  +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer


* Re: [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-02 13:28   ` Hannes Reinecke
@ 2022-02-02 15:41     ` Keith Busch
  2022-02-02 15:47       ` Martin K. Petersen
  2022-02-02 16:38       ` Hannes Reinecke
  0 siblings, 2 replies; 26+ messages in thread
From: Keith Busch @ 2022-02-02 15:41 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli

On Wed, Feb 02, 2022 at 02:28:53PM +0100, Hannes Reinecke wrote:
> On 2/1/22 20:01, Keith Busch wrote:
> > +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> > +			break;
> > +		case NVME_NVM_NS_64B_GUARD:
> > +			integrity.profile = &nvme_pi_type1_crc64

I just noticed this should be type3, not type1...

> > +			integrity.tag_size = sizeof(u16) + 6;
> > +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> > +			break;
> > +		default:
> > +			integrity.profile = NULL;
> > +			break;
> > +		}
> >   		break;
> >   	case NVME_NS_DPS_PI_TYPE1:
> >   	case NVME_NS_DPS_PI_TYPE2:
> > -		integrity.profile = &t10_pi_type1_crc;
> > -		integrity.tag_size = sizeof(u16);
> > -		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> > +		switch (ns->guard_type) {
> > +		case NVME_NVM_NS_16B_GUARD:
> > +			integrity.profile = &t10_pi_type1_crc;
> > +			integrity.tag_size = sizeof(u16);
> > +			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
> > +			break;
> > +		case NVME_NVM_NS_64B_GUARD:
> > +			integrity.profile = &nvme_pi_type1_crc64;
> > +			integrity.tag_size = sizeof(u16);
> 
> Is that correct? Shouldn't it be '8' like in the above case?

For type1 and 2, I believe tag_size refers to the "application" tag,
which is 2 bytes here.

The reason it is 8 bytes for type3 is because there is no ref tag, so
that portion of the metadata becomes part of the opaque application
tag_size.
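
As an illustration of the sizes discussed above (not code from this
series), the sketch below computes tag_size per PI type for the 64-bit
guard case. The struct is a stand-in whose layout (8-byte guard, 2-byte
application tag, 6-byte reference tag) is assumed from the cover letter's
"CRC64 guard with 48-bit reference tags"; all example_* names are
hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the 64-bit guard PI tuple; field layout is an assumption. */
struct example_crc64_pi_tuple {
	uint8_t guard_tag[8];	/* CRC64 guard */
	uint8_t app_tag[2];	/* application tag */
	uint8_t ref_tag[6];	/* 48-bit reference tag */
};

enum example_pi_type {
	EXAMPLE_PI_TYPE1 = 1,
	EXAMPLE_PI_TYPE2 = 2,
	EXAMPLE_PI_TYPE3 = 3,
};

/* Bytes of the tuple that are opaque to the checker for a given PI type. */
static size_t example_tag_size(enum example_pi_type type)
{
	size_t app = sizeof(((struct example_crc64_pi_tuple *)0)->app_tag);
	size_t ref = sizeof(((struct example_crc64_pi_tuple *)0)->ref_tag);

	switch (type) {
	case EXAMPLE_PI_TYPE1:
	case EXAMPLE_PI_TYPE2:
		/* The reference tag is checked, so only the app tag is opaque. */
		return app;
	case EXAMPLE_PI_TYPE3:
		/* No reference tag check: those bytes join the opaque tag. */
		return app + ref;
	}
	return 0;
}
```

With this layout, types 1 and 2 yield 2 bytes and type 3 yields 8 bytes,
matching "sizeof(u16)" and "sizeof(u16) + 6" in the quoted hunk.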

> > @@ -3104,7 +3218,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
> >   	if (ret < 0)
> >   		return ret;
> > -	ret = nvme_configure_acre(ctrl);
> > +	ret = nvme_configure_host_options(ctrl);
> >   	if (ret < 0)
> >   		return ret;
> 
> This could be made into a separate patch, as it's not directly related to PI
> support.

Well, the driver can't read the new PI formats without enabling host
supported features for it. Enabling the feature tells the controller
we're going to check for it, so I don't think we could reasonably split
this part into a prep patch from the part that sets up the PI formats.
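
For reference, the selection logic in the quoted
nvme_configure_host_options() hunk can be sketched in isolation as
follows. This is a simplified userspace model, not the driver code: the
example_* names are hypothetical, the enable values are placeholders, and
the controller attribute check is reduced to a boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder enable values; the real constants live in the NVMe headers. */
#define EXAMPLE_ENABLE_ACRE	1
#define EXAMPLE_ENABLE_LBAFEE	1

/* Stand-in for the two host behavior fields touched by the patch. */
struct example_host_behavior {
	uint8_t acre;
	uint8_t lbafee;
};

/*
 * Fill in the options to request and report whether a Set Features
 * command is needed at all. ACRE is only requested when a retry delay
 * is reported; LBAFEE only when the controller advertises extended LBA
 * format support.
 */
static bool example_pick_host_options(uint16_t crdt0, bool has_elbas,
				      struct example_host_behavior *host)
{
	host->acre = crdt0 ? EXAMPLE_ENABLE_ACRE : 0;
	host->lbafee = has_elbas ? EXAMPLE_ENABLE_LBAFEE : 0;

	/* Nothing to enable means the command can be skipped. */
	return host->acre || host->lbafee;
}
```

Both options travel in the same host behavior payload, so the two fields
are filled in together before a single command is sent.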


* Re: [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-02 15:41     ` Keith Busch
@ 2022-02-02 15:47       ` Martin K. Petersen
  2022-02-02 16:38       ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Martin K. Petersen @ 2022-02-02 15:47 UTC (permalink / raw)
  To: Keith Busch
  Cc: Hannes Reinecke, linux-nvme, linux-kernel, linux-block, axboe,
	hch, martin.petersen, colyli


Keith,

> For type1 and 2, I believe tag_size refers to the "application" tag,
> which is 2 bytes here.

Yep.

> The reason it is 8 bytes for type3 is because there is no ref tag, so
> that portion of the metadata becomes part of the opaque application
> tag_size.

Correct.

-- 
Martin K. Petersen	Oracle Linux Engineering


* Re: [PATCHv2 7/7] nvme: add support for enhanced metadata
  2022-02-02 15:41     ` Keith Busch
  2022-02-02 15:47       ` Martin K. Petersen
@ 2022-02-02 16:38       ` Hannes Reinecke
  1 sibling, 0 replies; 26+ messages in thread
From: Hannes Reinecke @ 2022-02-02 16:38 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-kernel, linux-block, axboe, hch,
	martin.petersen, colyli

On 2/2/22 16:41, Keith Busch wrote:
> On Wed, Feb 02, 2022 at 02:28:53PM +0100, Hannes Reinecke wrote:
>> On 2/1/22 20:01, Keith Busch wrote:
[ .. ]
>>> @@ -3104,7 +3218,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
>>>    	if (ret < 0)
>>>    		return ret;
>>> -	ret = nvme_configure_acre(ctrl);
>>> +	ret = nvme_configure_host_options(ctrl);
>>>    	if (ret < 0)
>>>    		return ret;
>>
>> This could be made into a separate patch, as it's not directly related to PI
>> support.
>
> Well, the driver can't read the new PI formats without enabling host
> supported features for it. Enabling the feature tells the controller
> we're going to check for it, so I don't think we could reasonably split
> this part into a prep patch from the part that sets up the PI formats.

Actually I was thinking about a patch renaming 'nvme_configure_acre' 
into 'nvme_configure_host_options', as _this_ really is independent.
When mixed together with the PI stuff it's hard to track down from the 
commit message when it got changed.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Felix Imendörffer


* Re: [PATCHv2 6/7] block: add pi for nvme enhanced integrity
  2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
  2022-02-02  4:35   ` Martin K. Petersen
  2022-02-02 13:19   ` Hannes Reinecke
@ 2022-02-02 18:40   ` Bart Van Assche
  2 siblings, 0 replies; 26+ messages in thread
From: Bart Van Assche @ 2022-02-02 18:40 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-kernel, linux-block
  Cc: axboe, hch, martin.petersen, colyli

On 2/1/22 11:01, Keith Busch wrote:
> +			ref = get_unaligned_be48(pi->ref_tag);
> +			seed = iter->seed & 0xffffffffffffull;

The "& 0xffffffffffffull" operation occurs three times in this patch. 
Has it been considered to introduce a lower_48_bits() function?
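
A minimal sketch of what such a helper might look like, written as
standalone C rather than kernel code. Only the name lower_48_bits() comes
from the suggestion above; the body, example_get_be48() (a stand-in for
the be48 unaligned accessor from patch 5) and example_ref_tag_matches()
are assumptions for illustration.

```c
#include <stdint.h>

/* Keep only the low 48 bits of a 64-bit value. */
static inline uint64_t lower_48_bits(uint64_t n)
{
	return n & ((1ULL << 48) - 1);
}

/* Stand-in for a big-endian 48-bit load from an unaligned buffer. */
static inline uint64_t example_get_be48(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 0; i < 6; i++)
		v = (v << 8) | p[i];
	return v;
}

/* Returns 1 when the stored reference tag equals the truncated seed. */
static inline int example_ref_tag_matches(const uint8_t ref_tag[6],
					  uint64_t seed)
{
	return example_get_be48(ref_tag) == lower_48_bits(seed);
}
```

Each open-coded "& 0xffffffffffffull" would then become a
lower_48_bits() call, which keeps the mask width in one place.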

Thanks,

Bart.


end of thread

Thread overview: 26+ messages
2022-02-01 19:01 [PATCHv2 0/7] 64-bit data integrity field support Keith Busch
2022-02-01 19:01 ` [PATCHv2 1/7] block: support pi with extended metadata Keith Busch
2022-02-02  4:38   ` Martin K. Petersen
2022-02-02 13:11   ` Hannes Reinecke
2022-02-01 19:01 ` [PATCHv2 2/7] nvme: allow integrity on extended metadata formats Keith Busch
2022-02-02  4:39   ` Martin K. Petersen
2022-02-02 13:12   ` Hannes Reinecke
2022-02-01 19:01 ` [PATCHv2 3/7] lib: add rocksoft model crc64 Keith Busch
2022-02-02  4:40   ` Martin K. Petersen
2022-02-02 13:13   ` Hannes Reinecke
2022-02-01 19:01 ` [PATCHv2 4/7] lib: add crc64 tests Keith Busch
2022-02-02  4:42   ` Martin K. Petersen
2022-02-02 13:14   ` Hannes Reinecke
2022-02-01 19:01 ` [PATCHv2 5/7] asm-generic: introduce be48 unaligned accessors Keith Busch
2022-02-02  4:41   ` Martin K. Petersen
2022-02-02 13:15   ` Hannes Reinecke
2022-02-01 19:01 ` [PATCHv2 6/7] block: add pi for nvme enhanced integrity Keith Busch
2022-02-02  4:35   ` Martin K. Petersen
2022-02-02 13:19   ` Hannes Reinecke
2022-02-02 18:40   ` Bart Van Assche
2022-02-01 19:01 ` [PATCHv2 7/7] nvme: add support for enhanced metadata Keith Busch
2022-02-02  4:48   ` Martin K. Petersen
2022-02-02 13:28   ` Hannes Reinecke
2022-02-02 15:41     ` Keith Busch
2022-02-02 15:47       ` Martin K. Petersen
2022-02-02 16:38       ` Hannes Reinecke
