* [PATCHv3 00/10] 64-bit data integrity field support
@ 2022-02-22 16:31 Keith Busch
  2022-02-22 16:31 ` [PATCHv3 01/10] block: support pi with extended metadata Keith Busch
                   ` (9 more replies)
  0 siblings, 10 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The NVM Express protocol added enhancements to the data integrity field
formats beyond the T10-defined protection information. A detailed
description of the new formats can be found in NVMe's NVM Command
Set Specification, section 5.2, available at:

  https://nvmexpress.org/wp-content/uploads/NVM-Command-Set-Specification-1.0b-2021.12.18-Ratified.pdf

This series implements one of the possible new formats: the CRC64 guard
with 48-bit reference tags. It does not add support for the variable
"storage tag" field; hardware offload is likewise out of scope, though
this revision does add an x86 PCLMULQDQ accelerated CRC64 (patch 10).

Changes since v2:

  Introduced lower_48_bits() (Bart)

  Added "64" suffix to crc names to distinguish from other crc
  calculations (Martin)

  Added module support to the crypto library, and the block integrity
  generate/verify routines now use that instead of directly calling the
  generic table lookup implementation. This is modeled after the t10
  implementation.

  Added an x86 pclmul accelerated crc64 calculation

  Added speed tests to the test module so that the generic and library
  functions can be compared.

  Bug fix in nvme to select the correct integrity profile.

  Added reviews

Note: I am not particularly well versed in x86 assembly, so I'm reasonably
sure the pclmul implementation could be improved. It is currently
about 20x faster than the generic implementation, but I believe it could
be over 30x faster if done more efficiently.

Keith Busch (10):
  block: support pi with extended metadata
  nvme: allow integrity on extended metadata formats
  asm-generic: introduce be48 unaligned accessors
  linux/kernel: introduce lower_48_bits macro
  lib: add rocksoft model crc64
  crypto: add rocksoft 64b crc guard tag framework
  lib: add crc64 tests
  block: add pi for nvme enhanced integrity
  nvme: add support for enhanced metadata
  x86/crypto: add pclmul acceleration for crc64

 arch/x86/crypto/Makefile                  |   3 +
 arch/x86/crypto/crc64-rocksoft-pcl-asm.S  | 215 ++++++++++++++++++++++
 arch/x86/crypto/crc64-rocksoft-pcl_glue.c | 117 ++++++++++++
 block/Kconfig                             |   1 +
 block/bio-integrity.c                     |   1 +
 block/t10-pi.c                            | 198 +++++++++++++++++++-
 crypto/Kconfig                            |  20 ++
 crypto/Makefile                           |   1 +
 crypto/crc64_rocksoft_generic.c           | 104 +++++++++++
 drivers/nvme/host/core.c                  | 175 ++++++++++++++----
 drivers/nvme/host/nvme.h                  |   4 +-
 include/asm-generic/unaligned.h           |  26 +++
 include/linux/blk-integrity.h             |   1 +
 include/linux/crc64.h                     |   7 +
 include/linux/kernel.h                    |   6 +
 include/linux/nvme.h                      |  53 +++++-
 include/linux/t10-pi.h                    |  20 ++
 lib/Kconfig                               |   9 +
 lib/Kconfig.debug                         |   4 +
 lib/Makefile                              |   2 +
 lib/crc64-rocksoft.c                      | 129 +++++++++++++
 lib/crc64.c                               |  26 +++
 lib/gen_crc64table.c                      |  51 +++--
 lib/test_crc64.c                          | 133 +++++++++++++
 24 files changed, 1252 insertions(+), 54 deletions(-)
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl-asm.S
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl_glue.c
 create mode 100644 crypto/crc64_rocksoft_generic.c
 create mode 100644 lib/crc64-rocksoft.c
 create mode 100644 lib/test_crc64.c

-- 
2.25.4



* [PATCHv3 01/10] block: support pi with extended metadata
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-25 16:01   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 02/10] nvme: allow integrity on extended metadata formats Keith Busch
                   ` (8 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Hannes Reinecke

The NVMe spec allows protection information formats with metadata
extending beyond the PI field. Use the actual size of the metadata
field when advancing the protection buffer.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/bio-integrity.c         | 1 +
 block/t10-pi.c                | 4 ++--
 include/linux/blk-integrity.h | 1 +
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/block/bio-integrity.c b/block/bio-integrity.c
index 6996e7bd66e9..32929c89ba8a 100644
--- a/block/bio-integrity.c
+++ b/block/bio-integrity.c
@@ -165,6 +165,7 @@ static blk_status_t bio_integrity_process(struct bio *bio,
 
 	iter.disk_name = bio->bi_bdev->bd_disk->disk_name;
 	iter.interval = 1 << bi->interval_exp;
+	iter.tuple_size = bi->tuple_size;
 	iter.seed = proc_iter->bi_sector;
 	iter.prot_buf = bvec_virt(bip->bip_vec);
 
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 25a52a2a09a8..758a76518854 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -44,7 +44,7 @@ static blk_status_t t10_pi_generate(struct blk_integrity_iter *iter,
 			pi->ref_tag = 0;
 
 		iter->data_buf += iter->interval;
-		iter->prot_buf += sizeof(struct t10_pi_tuple);
+		iter->prot_buf += iter->tuple_size;
 		iter->seed++;
 	}
 
@@ -93,7 +93,7 @@ static blk_status_t t10_pi_verify(struct blk_integrity_iter *iter,
 
 next:
 		iter->data_buf += iter->interval;
-		iter->prot_buf += sizeof(struct t10_pi_tuple);
+		iter->prot_buf += iter->tuple_size;
 		iter->seed++;
 	}
 
diff --git a/include/linux/blk-integrity.h b/include/linux/blk-integrity.h
index 8a038ea0717e..378b2459efe2 100644
--- a/include/linux/blk-integrity.h
+++ b/include/linux/blk-integrity.h
@@ -19,6 +19,7 @@ struct blk_integrity_iter {
 	sector_t		seed;
 	unsigned int		data_size;
 	unsigned short		interval;
+	unsigned char		tuple_size;
 	const char		*disk_name;
 };
 
-- 
2.25.4



* [PATCHv3 02/10] nvme: allow integrity on extended metadata formats
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
  2022-02-22 16:31 ` [PATCHv3 01/10] block: support pi with extended metadata Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-25 16:02   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors Keith Busch
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Hannes Reinecke

The block integrity subsystem knows how to construct protection
information buffers with metadata beyond the protection information
fields. Remove the driver restriction.

Note, this can only work if the PI field appears first in the metadata,
as the integrity subsystem doesn't calculate guard tags on preceding
metadata.

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 7 ++-----
 1 file changed, 2 insertions(+), 5 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 3b876dcab730..8132b1282082 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -1760,12 +1760,9 @@ static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
 	struct nvme_ctrl *ctrl = ns->ctrl;
 
-	/*
-	 * The PI implementation requires the metadata size to be equal to the
-	 * t10 pi tuple size.
-	 */
 	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	if (ns->ms == sizeof(struct t10_pi_tuple))
+	if (id->dps & NVME_NS_DPS_PI_FIRST ||
+	    ns->ms == sizeof(struct t10_pi_tuple))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
 	else
 		ns->pi_type = 0;
-- 
2.25.4



* [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
  2022-02-22 16:31 ` [PATCHv3 01/10] block: support pi with extended metadata Keith Busch
  2022-02-22 16:31 ` [PATCHv3 02/10] nvme: allow integrity on extended metadata formats Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-22 16:52   ` Chaitanya Kulkarni
  2022-02-25 16:03   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro Keith Busch
                   ` (6 subsequent siblings)
  9 siblings, 2 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch,
	Hannes Reinecke, Arnd Bergmann

The NVMe protocol extended the data integrity fields with unaligned
48-bit reference tags. Provide some helper accessors in
preparation for these.
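
For example, patch 8 uses these to move the 48-bit reference tag in and
out of the unaligned tuple field:

  /* from block/t10-pi.c later in this series */
  put_unaligned_be48(iter->seed, pi->ref_tag);
  u64 ref = get_unaligned_be48(pi->ref_tag);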

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/asm-generic/unaligned.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/include/asm-generic/unaligned.h b/include/asm-generic/unaligned.h
index 1c4242416c9f..8fc637379899 100644
--- a/include/asm-generic/unaligned.h
+++ b/include/asm-generic/unaligned.h
@@ -126,4 +126,30 @@ static inline void put_unaligned_le24(const u32 val, void *p)
 	__put_unaligned_le24(val, p);
 }
 
+static inline void __put_unaligned_be48(const u64 val, __u8 *p)
+{
+	*p++ = val >> 40;
+	*p++ = val >> 32;
+	*p++ = val >> 24;
+	*p++ = val >> 16;
+	*p++ = val >> 8;
+	*p++ = val;
+}
+
+static inline void put_unaligned_be48(const u64 val, void *p)
+{
+	__put_unaligned_be48(val, p);
+}
+
+static inline u64 __get_unaligned_be48(const u8 *p)
+{
+	return (u64)p[0] << 40 | (u64)p[1] << 32 | p[2] << 24 |
+		p[3] << 16 | p[4] << 8 | p[5];
+}
+
+static inline u64 get_unaligned_be48(const void *p)
+{
+	return __get_unaligned_be48(p);
+}
+
 #endif /* __ASM_GENERIC_UNALIGNED_H */
-- 
2.25.4



* [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (2 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-22 16:45   ` Joe Perches
  2022-02-22 16:48   ` Chaitanya Kulkarni
  2022-02-22 16:31 ` [PATCHv3 05/10] lib: add rocksoft model crc64 Keith Busch
                   ` (5 subsequent siblings)
  9 siblings, 2 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Bart Van Assche

Recent data integrity field enhancements allow 48-bit reference tags.
Introduce a helper macro since this will be a repeated operation.

Suggested-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/linux/kernel.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index 33f47a996513..c1fa9fc2b5cd 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -63,6 +63,12 @@
 }					\
 )
 
+/**
+ * lower_48_bits - return bits 0-47 of a number
+ * @n: the number we're accessing
+ */
+#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
+
 /**
  * upper_32_bits - return bits 32-63 of a number
  * @n: the number we're accessing
-- 
2.25.4



* [PATCHv3 05/10] lib: add rocksoft model crc64
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (3 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-25 16:04   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 06/10] crypto: add rocksoft 64b crc framework Keith Busch
                   ` (4 subsequent siblings)
  9 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Hannes Reinecke

The NVM Express specification extended data integrity fields to 64 bits
using the Rocksoft^TM parameters. Add the poly to the crc64 table
generation, and provide a generic library routine implementing the
algorithm.

The Rocksoft 64-bit CRC model parameters are as follows:
    Poly: 0xAD93D23594C93659
    Initial value: 0xFFFFFFFFFFFFFFFF
    Reflected Input: True
    Reflected Output: True
    Xor Final: 0xFFFFFFFFFFFFFFFF

Since this model uses reflected bits, the implementation generates the
reflected table so the result is ordered consistently.
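
For reference, a minimal userspace bit-at-a-time sketch of this model,
using the reflected form of the polynomial (0x9A6C9329AC4BC9B5, the
same constant the table generator below uses):

  #include <stdint.h>
  #include <stddef.h>

  /* reflected form of the Rocksoft polynomial 0xAD93D23594C93659 */
  #define ROCKSOFT_POLY_REFLECTED 0x9A6C9329AC4BC9B5ULL

  static uint64_t crc64_rocksoft_bitwise(const unsigned char *p, size_t len)
  {
          uint64_t crc = ~0ULL;           /* initial value */

          while (len--) {
                  crc ^= *p++;            /* reflected input: LSB first */
                  for (int i = 0; i < 8; i++)
                          crc = (crc & 1) ?
                                (crc >> 1) ^ ROCKSOFT_POLY_REFLECTED :
                                crc >> 1;
          }
          return ~crc;                    /* final xor */
  }

The generated table simply precomputes that inner 8-bit loop for every
possible byte value.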

Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 include/linux/crc64.h |  2 ++
 lib/crc64.c           | 26 ++++++++++++++++++++++
 lib/gen_crc64table.c  | 51 +++++++++++++++++++++++++++++++++----------
 3 files changed, 68 insertions(+), 11 deletions(-)

diff --git a/include/linux/crc64.h b/include/linux/crc64.h
index c756e65a1b58..9480f38cc7cf 100644
--- a/include/linux/crc64.h
+++ b/include/linux/crc64.h
@@ -8,4 +8,6 @@
 #include <linux/types.h>
 
 u64 __pure crc64_be(u64 crc, const void *p, size_t len);
+u64 __pure crc64_rocksoft_generic(u64 crc, const void *p, size_t len);
+
 #endif /* _LINUX_CRC64_H */
diff --git a/lib/crc64.c b/lib/crc64.c
index 9f852a89ee2a..80072fa007a9 100644
--- a/lib/crc64.c
+++ b/lib/crc64.c
@@ -22,6 +22,13 @@
  * x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 +
  * x^7 + x^4 + x + 1
  *
+ * crc64rocksoft[256] table is from the Rocksoft specification polynomial
+ * defined as,
+ *
+ * x^64 + x^63 + x^61 + x^59 + x^58 + x^56 + x^55 + x^52 + x^49 + x^48 + x^47 +
+ * x^46 + x^44 + x^41 + x^37 + x^36 + x^34 + x^32 + x^31 + x^28 + x^26 + x^23 +
+ * x^22 + x^19 + x^16 + x^13 + x^12 + x^10 + x^9 + x^6 + x^4 + x^3 + 1
+ *
  * Copyright 2018 SUSE Linux.
  *   Author: Coly Li <colyli@suse.de>
  */
@@ -55,3 +62,22 @@ u64 __pure crc64_be(u64 crc, const void *p, size_t len)
 	return crc;
 }
 EXPORT_SYMBOL_GPL(crc64_be);
+
+/**
+ * crc64_rocksoft_generic - Calculate bitwise Rocksoft CRC64
+ * @crc: seed value for computation. (u64)~0 for a new CRC calculation, or the
+ * 	 previous crc64 value if computing incrementally.
+ * @p: pointer to buffer over which CRC64 is run
+ * @len: length of buffer @p
+ */
+u64 __pure crc64_rocksoft_generic(u64 crc, const void *p, size_t len)
+{
+	const unsigned char *_p = p;
+	size_t i;
+
+	for (i = 0; i < len; i++)
+		crc = (crc >> 8) ^ crc64rocksofttable[(crc & 0xff) ^ *_p++];
+
+	return crc ^ (u64)~0;
+}
+EXPORT_SYMBOL_GPL(crc64_rocksoft_generic);
diff --git a/lib/gen_crc64table.c b/lib/gen_crc64table.c
index 094b43aef8db..55e222acd0b8 100644
--- a/lib/gen_crc64table.c
+++ b/lib/gen_crc64table.c
@@ -17,10 +17,30 @@
 #include <stdio.h>
 
 #define CRC64_ECMA182_POLY 0x42F0E1EBA9EA3693ULL
+#define CRC64_ROCKSOFT_POLY 0x9A6C9329AC4BC9B5ULL
 
 static uint64_t crc64_table[256] = {0};
+static uint64_t crc64_rocksoft_table[256] = {0};
 
-static void generate_crc64_table(void)
+static void generate_reflected_crc64_table(uint64_t table[256], uint64_t poly)
+{
+	uint64_t i, j, c, crc;
+
+	for (i = 0; i < 256; i++) {
+		crc = 0ULL;
+		c = i;
+
+		for (j = 0; j < 8; j++) {
+			if ((crc ^ (c >> j)) & 1)
+				crc = (crc >> 1) ^ poly;
+			else
+				crc >>= 1;
+		}
+		table[i] = crc;
+	}
+}
+
+static void generate_crc64_table(uint64_t table[256], uint64_t poly)
 {
 	uint64_t i, j, c, crc;
 
@@ -30,26 +50,22 @@ static void generate_crc64_table(void)
 
 		for (j = 0; j < 8; j++) {
 			if ((crc ^ c) & 0x8000000000000000ULL)
-				crc = (crc << 1) ^ CRC64_ECMA182_POLY;
+				crc = (crc << 1) ^ poly;
 			else
 				crc <<= 1;
 			c <<= 1;
 		}
 
-		crc64_table[i] = crc;
+		table[i] = crc;
 	}
 }
 
-static void print_crc64_table(void)
+static void output_table(uint64_t table[256])
 {
 	int i;
 
-	printf("/* this file is generated - do not edit */\n\n");
-	printf("#include <linux/types.h>\n");
-	printf("#include <linux/cache.h>\n\n");
-	printf("static const u64 ____cacheline_aligned crc64table[256] = {\n");
 	for (i = 0; i < 256; i++) {
-		printf("\t0x%016" PRIx64 "ULL", crc64_table[i]);
+		printf("\t0x%016" PRIx64 "ULL", table[i]);
 		if (i & 0x1)
 			printf(",\n");
 		else
@@ -58,9 +74,22 @@ static void print_crc64_table(void)
 	printf("};\n");
 }
 
+static void print_crc64_tables(void)
+{
+	printf("/* this file is generated - do not edit */\n\n");
+	printf("#include <linux/types.h>\n");
+	printf("#include <linux/cache.h>\n\n");
+	printf("static const u64 ____cacheline_aligned crc64table[256] = {\n");
+	output_table(crc64_table);
+
+	printf("\nstatic const u64 ____cacheline_aligned crc64rocksofttable[256] = {\n");
+	output_table(crc64_rocksoft_table);
+}
+
 int main(int argc, char *argv[])
 {
-	generate_crc64_table();
-	print_crc64_table();
+	generate_crc64_table(crc64_table, CRC64_ECMA182_POLY);
+	generate_reflected_crc64_table(crc64_rocksoft_table, CRC64_ROCKSOFT_POLY);
+	print_crc64_tables();
 	return 0;
 }
-- 
2.25.4



* [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (4 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 05/10] lib: add rocksoft model crc64 Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-22 19:50   ` Eric Biggers
  2022-02-22 19:56   ` Eric Biggers
  2022-02-22 16:31 ` [PATCHv3 07/10] lib: add crc64 tests Keith Busch
                   ` (3 subsequent siblings)
  9 siblings, 2 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

Hardware-specific features may be able to calculate a crc64, so provide
a framework for drivers to register their implementation. If nothing is
registered, fall back to the generic table lookup implementation. The
implementation is modeled after the crct10dif equivalent.
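
For illustration, an accelerated provider only has to register an shash
with the same cra_name and a priority above the generic's 200, roughly
like the sketch below (patch 10 does this for PCLMULQDQ; the update
callback and driver name here are hypothetical, and the desc layout is
reused from the generic implementation in this patch):

  static struct shash_alg accel_alg = {
          .digestsize     = 8,
          .descsize       = sizeof(struct chksum_desc_ctx),
          .init           = chksum_init,
          .update         = chksum_accel_update,  /* the fast routine */
          .final          = chksum_final,
          .base           = {
                  .cra_name        = CRC64_ROCKSOFT_STRING,
                  .cra_driver_name = "crc64-rocksoft-accel", /* hypothetical */
                  .cra_priority    = 300, /* wins over the generic's 200 */
                  .cra_blocksize   = 1,
                  .cra_module      = THIS_MODULE,
          }
  };

The crypto notifier in lib/crc64-rocksoft.c then switches the library
over to the new transform when its module loads.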

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 crypto/Kconfig                  |   9 +++
 crypto/Makefile                 |   1 +
 crypto/crc64_rocksoft_generic.c | 104 +++++++++++++++++++++++++
 include/linux/crc64.h           |   5 ++
 lib/Kconfig                     |   9 +++
 lib/Makefile                    |   1 +
 lib/crc64-rocksoft.c            | 129 ++++++++++++++++++++++++++++++++
 7 files changed, 258 insertions(+)
 create mode 100644 crypto/crc64_rocksoft_generic.c
 create mode 100644 lib/crc64-rocksoft.c

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 442765219c37..e343147b9f8f 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -735,6 +735,15 @@ config CRYPTO_CRCT10DIF_VPMSUM
 	  multiply-sum (vpmsum) instructions, introduced in POWER8. Enable on
 	  POWER8 and newer processors for improved performance.
 
+config CRYPTO_CRC64_ROCKSOFT
+	tristate "Rocksoft Model CRC64 algorithm"
+	depends on CRC64
+	select CRYPTO_HASH
+	help
+	  Rocksoft Model CRC64 computation is being cast as a crypto
+	  transform. This allows for faster crc64 transforms to be used
+	  if they are available.
+
 config CRYPTO_VPMSUM_TESTER
 	tristate "Powerpc64 vpmsum hardware acceleration tester"
 	depends on CRYPTO_CRCT10DIF_VPMSUM && CRYPTO_CRC32C_VPMSUM
diff --git a/crypto/Makefile b/crypto/Makefile
index d76bff8d0ffd..f754c4d17d6b 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -152,6 +152,7 @@ obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
 obj-$(CONFIG_CRYPTO_CRC32C) += crc32c_generic.o
 obj-$(CONFIG_CRYPTO_CRC32) += crc32_generic.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF) += crct10dif_common.o crct10dif_generic.o
+obj-$(CONFIG_CRYPTO_CRC64_ROCKSOFT) += crc64_rocksoft_generic.o
 obj-$(CONFIG_CRYPTO_AUTHENC) += authenc.o authencesn.o
 obj-$(CONFIG_CRYPTO_LZO) += lzo.o lzo-rle.o
 obj-$(CONFIG_CRYPTO_LZ4) += lz4.o
diff --git a/crypto/crc64_rocksoft_generic.c b/crypto/crc64_rocksoft_generic.c
new file mode 100644
index 000000000000..55bad1939614
--- /dev/null
+++ b/crypto/crc64_rocksoft_generic.c
@@ -0,0 +1,104 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Cryptographic API.
+ *
+ * Rocksoft^TM model CRC64 Crypto Transform
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/crc64.h>
+#include <crypto/internal/hash.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+
+struct chksum_desc_ctx {
+	u64 crc;
+};
+
+static int chksum_init(struct shash_desc *desc)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	ctx->crc = 0;
+
+	return 0;
+}
+
+static int chksum_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	ctx->crc = crc64_rocksoft_generic(ctx->crc, data, length);
+	return 0;
+}
+
+static int chksum_final(struct shash_desc *desc, u8 *out)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	*(u64 *)out = ctx->crc;
+	return 0;
+}
+
+static int __chksum_finup(u64 crc, const u8 *data, unsigned int len, u8 *out)
+{
+	*(u64 *)out = crc64_rocksoft_generic(crc, data, len);
+	return 0;
+}
+
+static int chksum_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	return __chksum_finup(ctx->crc, data, len, out);
+}
+
+static int chksum_digest(struct shash_desc *desc, const u8 *data,
+			 unsigned int length, u8 *out)
+{
+	return __chksum_finup(0, data, length, out);
+}
+
+static struct shash_alg alg = {
+	.digestsize	= 	8,
+	.init		=	chksum_init,
+	.update		=	chksum_update,
+	.final		=	chksum_final,
+	.finup		=	chksum_finup,
+	.digest		=	chksum_digest,
+	.descsize	=	sizeof(struct chksum_desc_ctx),
+	.base		=	{
+		.cra_name		=	CRC64_ROCKSOFT_STRING,
+		.cra_driver_name	=	"crc64-rocksoft-generic",
+		.cra_priority		=	200,
+		.cra_blocksize		=	1,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static int __init crc64_rocksoft_generic_mod_init(void)
+{
+	return crypto_register_shash(&alg);
+}
+
+static void __exit crc64_rocksoft_generic_mod_fini(void)
+{
+	crypto_unregister_shash(&alg);
+}
+
+module_init(crc64_rocksoft_generic_mod_init);
+module_exit(crc64_rocksoft_generic_mod_fini);
+
+MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
+MODULE_DESCRIPTION("Rocksoft model CRC64 calculation.");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("crc64-rocksoft");
+MODULE_ALIAS_CRYPTO("crc64-rocksoft-generic");
diff --git a/include/linux/crc64.h b/include/linux/crc64.h
index 9480f38cc7cf..e044c60d1e61 100644
--- a/include/linux/crc64.h
+++ b/include/linux/crc64.h
@@ -7,7 +7,12 @@
 
 #include <linux/types.h>
 
+#define CRC64_ROCKSOFT_STRING "crc64-rocksoft"
+
 u64 __pure crc64_be(u64 crc, const void *p, size_t len);
 u64 __pure crc64_rocksoft_generic(u64 crc, const void *p, size_t len);
 
+u64 crc64_rocksoft(const unsigned char *buffer, size_t len);
+u64 crc64_rocksoft_update(u64 crc, const unsigned char *buffer, size_t len);
+
 #endif /* _LINUX_CRC64_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index c80fde816a7e..d4a46d635cb1 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -146,6 +146,15 @@ config CRC_T10DIF
 	  kernel tree needs to calculate CRC checks for use with the
 	  SCSI data integrity subsystem.
 
+config CRC64_ROCKSOFT
+	tristate "CRC calculation for the Rocksoft^TM model CRC64"
+	select CRYPTO
+	select CRYPTO_CRC64_ROCKSOFT
+	help
+	  This option is only needed if a module that's not in the
+	  kernel tree needs to calculate CRC checks for use with the
+	  rocksoft model parameters.
+
 config CRC_ITU_T
 	tristate "CRC ITU-T V.41 functions"
 	help
diff --git a/lib/Makefile b/lib/Makefile
index 300f569c626b..130bed83cdf2 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -177,6 +177,7 @@ obj-$(CONFIG_LIBCRC32C)	+= libcrc32c.o
 obj-$(CONFIG_CRC8)	+= crc8.o
 obj-$(CONFIG_XXHASH)	+= xxhash.o
 obj-$(CONFIG_GENERIC_ALLOCATOR) += genalloc.o
+obj-$(CONFIG_CRC64_ROCKSOFT) += crc64-rocksoft.o
 
 obj-$(CONFIG_842_COMPRESS) += 842/
 obj-$(CONFIG_842_DECOMPRESS) += 842/
diff --git a/lib/crc64-rocksoft.c b/lib/crc64-rocksoft.c
new file mode 100644
index 000000000000..49d7418ea827
--- /dev/null
+++ b/lib/crc64-rocksoft.c
@@ -0,0 +1,129 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Rocksoft^TM model CRC64 calculation
+ */
+
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/crc64.h>
+#include <linux/err.h>
+#include <linux/init.h>
+#include <crypto/hash.h>
+#include <crypto/algapi.h>
+#include <linux/static_key.h>
+#include <linux/notifier.h>
+
+static struct crypto_shash __rcu *crc64_rocksoft_tfm;
+static DEFINE_STATIC_KEY_TRUE(crc64_rocksoft_fallback);
+static DEFINE_MUTEX(crc64_rocksoft_mutex);
+static struct work_struct crc64_rocksoft_rehash_work;
+
+static int crc64_rocksoft_notify(struct notifier_block *self, unsigned long val, void *data)
+{
+	struct crypto_alg *alg = data;
+
+	if (val != CRYPTO_MSG_ALG_LOADED ||
+	    strcmp(alg->cra_name, CRC64_ROCKSOFT_STRING))
+		return NOTIFY_DONE;
+
+	schedule_work(&crc64_rocksoft_rehash_work);
+	return NOTIFY_OK;
+}
+
+static void crc64_rocksoft_rehash(struct work_struct *work)
+{
+	struct crypto_shash *new, *old;
+
+	mutex_lock(&crc64_rocksoft_mutex);
+	old = rcu_dereference_protected(crc64_rocksoft_tfm,
+					lockdep_is_held(&crc64_rocksoft_mutex));
+	new = crypto_alloc_shash(CRC64_ROCKSOFT_STRING, 0, 0);
+	if (IS_ERR(new)) {
+		mutex_unlock(&crc64_rocksoft_mutex);
+		return;
+	}
+	rcu_assign_pointer(crc64_rocksoft_tfm, new);
+	mutex_unlock(&crc64_rocksoft_mutex);
+
+	if (old) {
+		synchronize_rcu();
+		crypto_free_shash(old);
+	} else {
+		static_branch_disable(&crc64_rocksoft_fallback);
+	}
+}
+
+static struct notifier_block crc64_rocksoft_nb = {
+	.notifier_call = crc64_rocksoft_notify,
+};
+
+u64 crc64_rocksoft_update(u64 crc, const unsigned char *buffer, size_t len)
+{
+	struct {
+		struct shash_desc shash;
+		u64 crc;
+	} desc;
+	int err;
+
+	if (static_branch_unlikely(&crc64_rocksoft_fallback))
+		return crc64_rocksoft_generic(crc, buffer, len);
+
+	rcu_read_lock();
+	desc.shash.tfm = rcu_dereference(crc64_rocksoft_tfm);
+	desc.crc = crc;
+	err = crypto_shash_update(&desc.shash, buffer, len);
+	rcu_read_unlock();
+
+	BUG_ON(err);
+
+	return desc.crc;
+}
+EXPORT_SYMBOL(crc64_rocksoft_update);
+
+u64 crc64_rocksoft(const unsigned char *buffer, size_t len)
+{
+	return crc64_rocksoft_update(~0ULL, buffer, len);
+}
+EXPORT_SYMBOL(crc64_rocksoft);
+
+static int __init crc64_rocksoft_mod_init(void)
+{
+	INIT_WORK(&crc64_rocksoft_rehash_work, crc64_rocksoft_rehash);
+	crypto_register_notifier(&crc64_rocksoft_nb);
+	crc64_rocksoft_rehash(&crc64_rocksoft_rehash_work);
+	return 0;
+}
+
+static void __exit crc64_rocksoft_mod_fini(void)
+{
+	crypto_unregister_notifier(&crc64_rocksoft_nb);
+	cancel_work_sync(&crc64_rocksoft_rehash_work);
+	crypto_free_shash(rcu_dereference_protected(crc64_rocksoft_tfm, 1));
+}
+
+module_init(crc64_rocksoft_mod_init);
+module_exit(crc64_rocksoft_mod_fini);
+
+static int crc64_rocksoft_transform_show(char *buffer, const struct kernel_param *kp)
+{
+	struct crypto_shash *tfm;
+	int len;
+
+	if (static_branch_unlikely(&crc64_rocksoft_fallback))
+		return sprintf(buffer, "fallback\n");
+
+	rcu_read_lock();
+	tfm = rcu_dereference(crc64_rocksoft_tfm);
+	len = snprintf(buffer, PAGE_SIZE, "%s\n",
+		       crypto_shash_driver_name(tfm));
+	rcu_read_unlock();
+
+	return len;
+}
+
+module_param_call(transform, NULL, crc64_rocksoft_transform_show, NULL, 0444);
+
+MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
+MODULE_DESCRIPTION("Rocksoft model CRC64 calculation (library API)");
+MODULE_LICENSE("GPL");
+MODULE_SOFTDEP("pre: crc64");
-- 
2.25.4



* [PATCHv3 07/10] lib: add crc64 tests
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (5 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 06/10] crypto: add rocksoft 64b crc framework Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-22 16:50   ` Chaitanya Kulkarni
  2022-02-25 16:05   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 08/10] block: add pi for nvme enhanced integrity Keith Busch
                   ` (2 subsequent siblings)
  9 siblings, 2 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

Provide a module to test the rocksoft crc64 calculations against well
known inputs and expected values. Check both the generic table
implementation and whatever implementation the crypto library has
registered, and compare their speeds.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 lib/Kconfig.debug |   4 ++
 lib/Makefile      |   1 +
 lib/test_crc64.c  | 133 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 138 insertions(+)
 create mode 100644 lib/test_crc64.c

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 14b89aa37c5c..149de11ae903 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -2214,6 +2214,10 @@ config TEST_UUID
 config TEST_XARRAY
 	tristate "Test the XArray code at runtime"
 
+config TEST_CRC64
+	tristate "Test the crc64 code at runtime"
+	depends on CRC64
+
 config TEST_OVERFLOW
 	tristate "Test check_*_overflow() functions at runtime"
 
diff --git a/lib/Makefile b/lib/Makefile
index 130bed83cdf2..0895a3fc3f5a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -103,6 +103,7 @@ obj-$(CONFIG_TEST_HMM) += test_hmm.o
 obj-$(CONFIG_TEST_FREE_PAGES) += test_free_pages.o
 obj-$(CONFIG_KPROBES_SANITY_TEST) += test_kprobes.o
 obj-$(CONFIG_TEST_REF_TRACKER) += test_ref_tracker.o
+obj-$(CONFIG_TEST_CRC64) += test_crc64.o
 #
 # CFLAGS for compiling floating point code inside the kernel. x86/Makefile turns
 # off the generation of FPU/SSE* instructions for kernel proper but FPU_FLAGS
diff --git a/lib/test_crc64.c b/lib/test_crc64.c
new file mode 100644
index 000000000000..281aacd20f0a
--- /dev/null
+++ b/lib/test_crc64.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * Tests were selected from NVM Express NVM Command Set Specification 1.0a,
+ * section 5.2.1.3.5 "64b CRC Test Cases" available here:
+ *
+ *   https://nvmexpress.org/wp-content/uploads/NVMe-NVM-Command-Set-Specification-1.0a-2021.07.26-Ratified.pdf
+ *
+ * Copyright 2022 Keith Busch <kbusch@kernel.org>
+ */
+
+#include <linux/crc64.h>
+#include <linux/module.h>
+#include <linux/vmalloc.h>
+#include <linux/random.h>
+
+static unsigned int tests_passed;
+static unsigned int tests_run;
+
+#define ALL_ZEROS 0x6482D367EB22B64EULL
+#define ALL_FFS 0xC0DDBA7302ECA3ACULL
+#define INC 0x3E729F5F6750449CULL
+#define DEC 0x9A2DF64B8E9E517EULL
+
+static u8 buffer[4096];
+
+#define CRC_CHECK(c, v) do {					\
+	tests_run++;						\
+	if (c != v)						\
+		printk("BUG at %s:%d expected:%llx got:%llx\n", \
+			__func__, __LINE__, v, c);		\
+	else							\
+		tests_passed++;					\
+} while (0)
+
+static void randomize(u64 *b, int size)
+{
+	int i;
+
+	for (i = 0; i < size / 8; i++)
+		b[i] = get_random_u64();
+}
+
+static void crc_speed_tests(void)
+{
+	unsigned long size = 1 << 20;
+	unsigned test_size;
+	void *b;
+
+	b = vmalloc(size);
+	if (!b)
+		return;
+
+	test_size = 512;
+	while (test_size <= size) {
+		ktime_t start_time, end_time;
+		u64 crc1, crc2;
+		s64 t1, t2;
+		int i;
+
+		randomize(b, test_size);
+		crc1 = crc2 = ~0ULL;
+
+		start_time = ktime_get();
+		for (i = 0; i < 1024; i++)
+			crc1 = crc64_rocksoft_generic(crc1, b, test_size);
+		end_time = ktime_get();
+		t1 = ktime_us_delta(end_time, start_time);
+
+		start_time = ktime_get();
+		for (i = 0; i < 1024; i++)
+			crc2 = crc64_rocksoft_update(crc2, b, test_size);
+		end_time = ktime_get();
+		t2 = ktime_us_delta(end_time, start_time);
+
+		printk("Size:%-7u Generic:%-7lld Library:%lld\n", test_size,
+			t1, t2);
+		CRC_CHECK(crc1, crc2);
+
+		test_size <<= 1;
+	}
+
+	vfree(b);
+}
+
+static int crc_tests(void)
+{
+	__u64 crc;
+	int i;
+
+	memset(buffer, 0, sizeof(buffer));
+	crc = crc64_rocksoft_generic(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, ALL_ZEROS);
+
+	crc = crc64_rocksoft(buffer, 4096);
+	CRC_CHECK(crc, ALL_ZEROS);
+
+	memset(buffer, 0xff, sizeof(buffer));
+	crc = crc64_rocksoft_generic(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, ALL_FFS);
+
+	crc = crc64_rocksoft(buffer, 4096);
+	CRC_CHECK(crc, ALL_FFS);
+
+	for (i = 0; i < 4096; i++)
+		buffer[i] = i & 0xff;
+	crc = crc64_rocksoft_generic(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, INC);
+
+	crc = crc64_rocksoft(buffer, 4096);
+	CRC_CHECK(crc, INC);
+
+	for (i = 0; i < 4096; i++)
+		buffer[i] = 0xff - (i & 0xff);
+	crc = crc64_rocksoft_generic(~0ULL, buffer, 4096);
+	CRC_CHECK(crc, DEC);
+
+	crc = crc64_rocksoft(buffer, 4096);
+	CRC_CHECK(crc, DEC);
+
+	crc_speed_tests();
+
+	printk("CRC64: %u of %u tests passed\n", tests_passed, tests_run);
+	return (tests_run == tests_passed) ? 0 : -EINVAL;
+}
+
+static void crc_exit(void)
+{
+}
+
+module_init(crc_tests);
+module_exit(crc_exit);
+MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
+MODULE_LICENSE("GPL");
-- 
2.25.4



* [PATCHv3 08/10] block: add pi for nvme enhanced integrity
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (6 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 07/10] lib: add crc64 tests Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-25 16:14   ` Christoph Hellwig
  2022-02-22 16:31 ` [PATCHv3 09/10] nvme: add support for enhanced metadata Keith Busch
  2022-02-22 16:31 ` [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64 Keith Busch
  9 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch, Hannes Reinecke

The NVMe specification defines new data integrity formats beyond the
T10 tuple. Add support for the specification-defined CRC64 formats,
assuming the reference tag does not need to be split with the "storage
tag".

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 block/Kconfig          |   1 +
 block/t10-pi.c         | 194 +++++++++++++++++++++++++++++++++++++++++
 include/linux/t10-pi.h |  20 +++++
 3 files changed, 215 insertions(+)

diff --git a/block/Kconfig b/block/Kconfig
index 168b873eb666..ce3d8d63c199 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -75,6 +75,7 @@ config BLK_DEV_INTEGRITY_T10
 	tristate
 	depends on BLK_DEV_INTEGRITY
 	select CRC_T10DIF
+	select CRC64_ROCKSOFT
 
 config BLK_DEV_ZONED
 	bool "Zoned block device support"
diff --git a/block/t10-pi.c b/block/t10-pi.c
index 758a76518854..449d62782d79 100644
--- a/block/t10-pi.c
+++ b/block/t10-pi.c
@@ -7,8 +7,10 @@
 #include <linux/t10-pi.h>
 #include <linux/blk-integrity.h>
 #include <linux/crc-t10dif.h>
+#include <linux/crc64.h>
 #include <linux/module.h>
 #include <net/checksum.h>
+#include <asm/unaligned.h>
 
 typedef __be16 (csum_fn) (void *, unsigned int);
 
@@ -278,4 +280,196 @@ const struct blk_integrity_profile t10_pi_type3_ip = {
 };
 EXPORT_SYMBOL(t10_pi_type3_ip);
 
+static __be64 nvme_pi_crc64(void *data, unsigned int len)
+{
+	return cpu_to_be64(crc64_rocksoft(data, len));
+}
+
+static blk_status_t nvme_crc64_generate(struct blk_integrity_iter *iter,
+					enum t10_dif_type type)
+{
+	unsigned int i;
+
+	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
+		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+
+		pi->guard_tag = nvme_pi_crc64(iter->data_buf, iter->interval);
+		pi->app_tag = 0;
+
+		if (type == T10_PI_TYPE1_PROTECTION)
+			put_unaligned_be48(iter->seed, pi->ref_tag);
+		else
+			put_unaligned_be48(0ULL, pi->ref_tag);
+
+		iter->data_buf += iter->interval;
+		iter->prot_buf += iter->tuple_size;
+		iter->seed++;
+	}
+
+	return BLK_STS_OK;
+}
+
+static bool nvme_crc64_ref_escape(u8 *ref_tag)
+{
+	static u8 ref_escape[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
+
+	return memcmp(ref_tag, ref_escape, sizeof(ref_escape)) == 0;
+}
+
+static blk_status_t nvme_crc64_verify(struct blk_integrity_iter *iter,
+				      enum t10_dif_type type)
+{
+	unsigned int i;
+
+	for (i = 0; i < iter->data_size; i += iter->interval) {
+		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
+		u64 ref, seed;
+		__be64 csum;
+
+		if (type == T10_PI_TYPE1_PROTECTION) {
+			if (pi->app_tag == T10_PI_APP_ESCAPE)
+				goto next;
+
+			ref = get_unaligned_be48(pi->ref_tag);
+			seed = lower_48_bits(iter->seed);
+			if (ref != seed) {
+				pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
+					iter->disk_name, seed, ref);
+				return BLK_STS_PROTECTION;
+			}
+		} else if (type == T10_PI_TYPE3_PROTECTION) {
+			if (pi->app_tag == T10_PI_APP_ESCAPE &&
+			    nvme_crc64_ref_escape(pi->ref_tag))
+				goto next;
+		}
+
+		csum = nvme_pi_crc64(iter->data_buf, iter->interval);
+		if (pi->guard_tag != csum) {
+			pr_err("%s: guard tag error at sector %llu " \
+			       "(rcvd %016llx, want %016llx)\n",
+				iter->disk_name, (unsigned long long)iter->seed,
+				be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
+			return BLK_STS_PROTECTION;
+		}
+
+next:
+		iter->data_buf += iter->interval;
+		iter->prot_buf += iter->tuple_size;
+		iter->seed++;
+	}
+
+	return BLK_STS_OK;
+}
+
+static blk_status_t nvme_pi_type1_verify_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_verify(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type1_generate_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_generate(iter, T10_PI_TYPE1_PROTECTION);
+}
+
+static void nvme_pi_type1_prepare(struct request *rq)
+{
+	const int tuple_sz = rq->q->integrity.tuple_size;
+	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+	struct bio *bio;
+
+	__rq_for_each_bio(bio, rq) {
+		struct bio_integrity_payload *bip = bio_integrity(bio);
+		u64 virt = lower_48_bits(bip_get_seed(bip));
+		struct bio_vec iv;
+		struct bvec_iter iter;
+
+		/* Already remapped? */
+		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
+			break;
+
+		bip_for_each_vec(iv, bip, iter) {
+			unsigned int j;
+			void *p;
+
+			p = bvec_kmap_local(&iv);
+			for (j = 0; j < iv.bv_len; j += tuple_sz) {
+				struct nvme_crc64_pi_tuple *pi = p;
+				u64 ref = get_unaligned_be48(pi->ref_tag);
+
+				if (ref == virt)
+					put_unaligned_be48(ref_tag, pi->ref_tag);
+				virt++;
+				ref_tag++;
+				p += tuple_sz;
+			}
+			kunmap_local(p);
+		}
+
+		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
+	}
+}
+
+static void nvme_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
+{
+	unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
+	const int tuple_sz = rq->q->integrity.tuple_size;
+	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
+	struct bio *bio;
+
+	__rq_for_each_bio(bio, rq) {
+		struct bio_integrity_payload *bip = bio_integrity(bio);
+		u64 virt = lower_48_bits(bip_get_seed(bip));
+		struct bio_vec iv;
+		struct bvec_iter iter;
+
+		bip_for_each_vec(iv, bip, iter) {
+			unsigned int j;
+			void *p;
+
+			p = bvec_kmap_local(&iv);
+			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
+				struct nvme_crc64_pi_tuple *pi = p;
+				u64 ref = get_unaligned_be48(pi->ref_tag);
+
+				if (ref == ref_tag)
+					put_unaligned_be48(virt, pi->ref_tag);
+				virt++;
+				ref_tag++;
+				intervals--;
+				p += tuple_sz;
+			}
+			kunmap_local(p);
+		}
+	}
+}
+
+static blk_status_t nvme_pi_type3_verify_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_verify(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+static blk_status_t nvme_pi_type3_generate_crc(struct blk_integrity_iter *iter)
+{
+	return nvme_crc64_generate(iter, T10_PI_TYPE3_PROTECTION);
+}
+
+const struct blk_integrity_profile nvme_pi_type1_crc64 = {
+	.name			= "NVME-DIF-TYPE1-CRC64",
+	.generate_fn		= nvme_pi_type1_generate_crc,
+	.verify_fn		= nvme_pi_type1_verify_crc,
+	.prepare_fn		= nvme_pi_type1_prepare,
+	.complete_fn		= nvme_pi_type1_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type1_crc64);
+
+const struct blk_integrity_profile nvme_pi_type3_crc64 = {
+	.name			= "NVME-DIF-TYPE3-CRC64",
+	.generate_fn		= nvme_pi_type3_generate_crc,
+	.verify_fn		= nvme_pi_type3_verify_crc,
+	.prepare_fn		= t10_pi_type3_prepare,
+	.complete_fn		= t10_pi_type3_complete,
+};
+EXPORT_SYMBOL(nvme_pi_type3_crc64);
+
 MODULE_LICENSE("GPL");
diff --git a/include/linux/t10-pi.h b/include/linux/t10-pi.h
index c635c2e014e3..a246ab3840bc 100644
--- a/include/linux/t10-pi.h
+++ b/include/linux/t10-pi.h
@@ -53,4 +53,24 @@ extern const struct blk_integrity_profile t10_pi_type1_ip;
 extern const struct blk_integrity_profile t10_pi_type3_crc;
 extern const struct blk_integrity_profile t10_pi_type3_ip;
 
+struct nvme_crc64_pi_tuple {
+	__be64 guard_tag;
+	__be16 app_tag;
+	__u8   ref_tag[6];
+};
+
+static inline u64 nvme_pi_extended_ref_tag(struct request *rq)
+{
+	unsigned int shift = ilog2(queue_logical_block_size(rq->q));
+
+#ifdef CONFIG_BLK_DEV_INTEGRITY
+	if (rq->q->integrity.interval_exp)
+		shift = rq->q->integrity.interval_exp;
+#endif
+	return lower_48_bits(blk_rq_pos(rq) >> (shift - SECTOR_SHIFT));
+}
+
+extern const struct blk_integrity_profile nvme_pi_type1_crc64;
+extern const struct blk_integrity_profile nvme_pi_type3_crc64;
+
 #endif
-- 
2.25.4



* [PATCHv3 09/10] nvme: add support for enhanced metadata
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (7 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 08/10] block: add pi for nvme enhanced integrity Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-25 16:17   ` Christoph Hellwig
  2022-03-02  3:18   ` Martin K. Petersen
  2022-02-22 16:31 ` [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64 Keith Busch
  9 siblings, 2 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch,
	Hannes Reinecke, Klaus Jensen

The NVM Express ratified TP 4068 defines new protection information
formats. Implement support for the CRC64 guard tags.

Since the block layer doesn't support variable-length reference tags,
the Storage Tag space is not supported at this time.
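
For reference, the elbaf decode this patch adds in nvme_init_ms() boils
down to the following (bit positions per the helpers added below):

  elbaf = le32_to_cpu(nvm->elbaf[lbaf]);

  /* bits 6:0 hold the storage tag size; nonzero is unsupported here */
  if (nvme_elbaf_sts(elbaf))
          goto free_data;

  /* bits 8:7 select the guard type: 16b (t10 crc) or 64b */
  ns->guard_type = nvme_elbaf_guard_type(elbaf);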

Cc: Hannes Reinecke <hare@suse.de>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Klaus Jensen <its@irrelevant.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 172 ++++++++++++++++++++++++++++++++-------
 drivers/nvme/host/nvme.h |   4 +-
 include/linux/nvme.h     |  53 ++++++++++--
 3 files changed, 191 insertions(+), 38 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 8132b1282082..5599c84ec5ec 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -918,6 +918,30 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
 	return BLK_STS_OK;
 }
 
+static inline void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,
+				    struct request *req)
+{
+	u32 upper, lower;
+	u64 ref48;
+
+	/* both rw and write zeroes share the same reftag format */
+	switch (ns->guard_type) {
+	case NVME_NVM_NS_16B_GUARD:
+		cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+		break;
+	case NVME_NVM_NS_64B_GUARD:
+		ref48 = nvme_pi_extended_ref_tag(req);
+		lower = lower_32_bits(ref48);
+		upper = upper_32_bits(ref48);
+
+		cmnd->rw.reftag = cpu_to_le32(lower);
+		cmnd->rw.cdw3 = cpu_to_le32(upper);
+		break;
+	default:
+		break;
+	}
+}
+
 static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 		struct request *req, struct nvme_command *cmnd)
 {
@@ -939,8 +963,7 @@ static inline blk_status_t nvme_setup_write_zeroes(struct nvme_ns *ns,
 		switch (ns->pi_type) {
 		case NVME_NS_DPS_PI_TYPE1:
 		case NVME_NS_DPS_PI_TYPE2:
-			cmnd->write_zeroes.reftag =
-				cpu_to_le32(t10_pi_ref_tag(req));
+			nvme_set_ref_tag(ns, cmnd, req);
 			break;
 		}
 	}
@@ -967,7 +990,8 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 	cmnd->rw.opcode = op;
 	cmnd->rw.flags = 0;
 	cmnd->rw.nsid = cpu_to_le32(ns->head->ns_id);
-	cmnd->rw.rsvd2 = 0;
+	cmnd->rw.cdw2 = 0;
+	cmnd->rw.cdw3 = 0;
 	cmnd->rw.metadata = 0;
 	cmnd->rw.slba = cpu_to_le64(nvme_sect_to_lba(ns, blk_rq_pos(req)));
 	cmnd->rw.length = cpu_to_le16((blk_rq_bytes(req) >> ns->lba_shift) - 1);
@@ -1001,7 +1025,7 @@ static inline blk_status_t nvme_setup_rw(struct nvme_ns *ns,
 					NVME_RW_PRINFO_PRCHK_REF;
 			if (op == nvme_cmd_zone_append)
 				control |= NVME_RW_APPEND_PIREMAP;
-			cmnd->rw.reftag = cpu_to_le32(t10_pi_ref_tag(req));
+			nvme_set_ref_tag(ns, cmnd, req);
 			break;
 		}
 	}
@@ -1653,33 +1677,58 @@ int nvme_getgeo(struct block_device *bdev, struct hd_geometry *geo)
 }
 
 #ifdef CONFIG_BLK_DEV_INTEGRITY
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
 				u32 max_integrity_segments)
 {
 	struct blk_integrity integrity = { };
 
-	switch (pi_type) {
+	switch (ns->pi_type) {
 	case NVME_NS_DPS_PI_TYPE3:
-		integrity.profile = &t10_pi_type3_crc;
-		integrity.tag_size = sizeof(u16) + sizeof(u32);
-		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+		switch (ns->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			integrity.profile = &t10_pi_type3_crc;
+			integrity.tag_size = sizeof(u16) + sizeof(u32);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			integrity.profile = &nvme_pi_type3_crc64;
+			integrity.tag_size = sizeof(u16) + 6;
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		default:
+			integrity.profile = NULL;
+			break;
+		}
 		break;
 	case NVME_NS_DPS_PI_TYPE1:
 	case NVME_NS_DPS_PI_TYPE2:
-		integrity.profile = &t10_pi_type1_crc;
-		integrity.tag_size = sizeof(u16);
-		integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+		switch (ns->guard_type) {
+		case NVME_NVM_NS_16B_GUARD:
+			integrity.profile = &t10_pi_type1_crc;
+			integrity.tag_size = sizeof(u16);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		case NVME_NVM_NS_64B_GUARD:
+			integrity.profile = &nvme_pi_type1_crc64;
+			integrity.tag_size = sizeof(u16);
+			integrity.flags |= BLK_INTEGRITY_DEVICE_CAPABLE;
+			break;
+		default:
+			integrity.profile = NULL;
+			break;
+		}
 		break;
 	default:
 		integrity.profile = NULL;
 		break;
 	}
-	integrity.tuple_size = ms;
+
+	integrity.tuple_size = ns->ms;
 	blk_integrity_register(disk, &integrity);
 	blk_queue_max_integrity_segments(disk->queue, max_integrity_segments);
 }
 #else
-static void nvme_init_integrity(struct gendisk *disk, u16 ms, u8 pi_type,
+static void nvme_init_integrity(struct gendisk *disk, struct nvme_ns *ns,
 				u32 max_integrity_segments)
 {
 }
@@ -1756,20 +1805,76 @@ static int nvme_setup_streams_ns(struct nvme_ctrl *ctrl, struct nvme_ns *ns,
 	return 0;
 }
 
-static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+static int nvme_init_ms(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
+	bool first = id->dps & NVME_NS_DPS_PI_FIRST;
+	unsigned lbaf = nvme_lbaf_index(id->flbas);
 	struct nvme_ctrl *ctrl = ns->ctrl;
+	struct nvme_command c = { };
+	struct nvme_id_ns_nvm *nvm;
+	int ret = 0;
+	u32 elbaf;
+
+	ns->pi_size = 0;
+	ns->ms = le16_to_cpu(id->lbaf[lbaf].ms);
+	if (!(ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)) {
+		ns->pi_size = sizeof(struct t10_pi_tuple);
+		ns->guard_type = NVME_NVM_NS_16B_GUARD;
+		goto set_pi;
+	}
+
+	nvm = kzalloc(sizeof(*nvm), GFP_KERNEL);
+	if (!nvm)
+		return -ENOMEM;
+
+	c.identify.opcode = nvme_admin_identify;
+	c.identify.nsid = cpu_to_le32(ns->head->ns_id);
+	c.identify.cns = NVME_ID_CNS_CS_NS;
+	c.identify.csi = NVME_CSI_NVM;
 
-	ns->ms = le16_to_cpu(id->lbaf[id->flbas & NVME_NS_FLBAS_LBA_MASK].ms);
-	if (id->dps & NVME_NS_DPS_PI_FIRST ||
-	    ns->ms == sizeof(struct t10_pi_tuple))
+	ret = nvme_submit_sync_cmd(ns->ctrl->admin_q, &c, nvm, sizeof(*nvm));
+	if (ret)
+		goto free_data;
+
+	elbaf = le32_to_cpu(nvm->elbaf[lbaf]);
+
+	/* no support for storage tag formats right now */
+	if (nvme_elbaf_sts(elbaf))
+		goto free_data;
+
+	ns->guard_type = nvme_elbaf_guard_type(elbaf);
+	switch (ns->guard_type) {
+	case NVME_NVM_NS_64B_GUARD:
+		ns->pi_size = sizeof(struct nvme_crc64_pi_tuple);
+		break;
+	case NVME_NVM_NS_16B_GUARD:
+		ns->pi_size = sizeof(struct t10_pi_tuple);
+		break;
+	default:
+		break;
+	}
+
+free_data:
+	kfree(nvm);
+set_pi:
+	if (ns->pi_size && (first || ns->ms == ns->pi_size))
 		ns->pi_type = id->dps & NVME_NS_DPS_PI_MASK;
 	else
 		ns->pi_type = 0;
 
+	return ret;
+}
+
+static void nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
+{
+	struct nvme_ctrl *ctrl = ns->ctrl;
+
+	if (nvme_init_ms(ns, id))
+		return;
+
 	ns->features &= ~(NVME_NS_METADATA_SUPPORTED | NVME_NS_EXT_LBAS);
 	if (!ns->ms || !(ctrl->ops->flags & NVME_F_METADATA_SUPPORTED))
-		return 0;
+		return;
 	if (ctrl->ops->flags & NVME_F_FABRICS) {
 		/*
 		 * The NVMe over Fabrics specification only supports metadata as
@@ -1777,7 +1882,7 @@ static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
 		 * remap the separate metadata buffer from the block layer.
 		 */
 		if (WARN_ON_ONCE(!(id->flbas & NVME_NS_FLBAS_META_EXT)))
-			return -EINVAL;
+			return;
 
 		ns->features |= NVME_NS_EXT_LBAS;
 
@@ -1804,8 +1909,6 @@ static int nvme_configure_metadata(struct nvme_ns *ns, struct nvme_id_ns *id)
 		else
 			ns->features |= NVME_NS_METADATA_SUPPORTED;
 	}
-
-	return 0;
 }
 
 static void nvme_set_queue_limits(struct nvme_ctrl *ctrl,
@@ -1884,7 +1987,7 @@ static void nvme_update_disk_info(struct gendisk *disk,
 	if (ns->ms) {
 		if (IS_ENABLED(CONFIG_BLK_DEV_INTEGRITY) &&
 		    (ns->features & NVME_NS_METADATA_SUPPORTED))
-			nvme_init_integrity(disk, ns->ms, ns->pi_type,
+			nvme_init_integrity(disk, ns,
 					    ns->ctrl->max_integrity_segments);
 		else if (!nvme_ns_has_pi(ns))
 			capacity = 0;
@@ -1939,16 +2042,14 @@ static void nvme_set_chunk_sectors(struct nvme_ns *ns, struct nvme_id_ns *id)
 
 static int nvme_update_ns_info(struct nvme_ns *ns, struct nvme_id_ns *id)
 {
-	unsigned lbaf = id->flbas & NVME_NS_FLBAS_LBA_MASK;
+	unsigned lbaf = nvme_lbaf_index(id->flbas);
 	int ret;
 
 	blk_mq_freeze_queue(ns->disk->queue);
 	ns->lba_shift = id->lbaf[lbaf].ds;
 	nvme_set_queue_limits(ns->ctrl, ns->queue);
 
-	ret = nvme_configure_metadata(ns, id);
-	if (ret)
-		goto out_unfreeze;
+	nvme_configure_metadata(ns, id);
 	nvme_set_chunk_sectors(ns, id);
 	nvme_update_disk_info(ns->disk, ns, id);
 
@@ -2286,20 +2387,27 @@ static int nvme_configure_timestamp(struct nvme_ctrl *ctrl)
 	return ret;
 }
 
-static int nvme_configure_acre(struct nvme_ctrl *ctrl)
+static int nvme_configure_host_options(struct nvme_ctrl *ctrl)
 {
 	struct nvme_feat_host_behavior *host;
+	u8 acre = 0, lbafee = 0;
 	int ret;
 
 	/* Don't bother enabling the feature if retry delay is not reported */
-	if (!ctrl->crdt[0])
+	if (ctrl->crdt[0])
+		acre = NVME_ENABLE_ACRE;
+	if (ctrl->ctratt & NVME_CTRL_ATTR_ELBAS)
+		lbafee = NVME_ENABLE_LBAFEE;
+
+	if (!acre && !lbafee)
 		return 0;
 
 	host = kzalloc(sizeof(*host), GFP_KERNEL);
 	if (!host)
 		return 0;
 
-	host->acre = NVME_ENABLE_ACRE;
+	host->acre = acre;
+	host->lbafee = lbafee;
 	ret = nvme_set_features(ctrl, NVME_FEAT_HOST_BEHAVIOR, 0,
 				host, sizeof(*host), NULL);
 	kfree(host);
@@ -3141,7 +3249,7 @@ int nvme_init_ctrl_finish(struct nvme_ctrl *ctrl)
 	if (ret < 0)
 		return ret;
 
-	ret = nvme_configure_acre(ctrl);
+	ret = nvme_configure_host_options(ctrl);
 	if (ret < 0)
 		return ret;
 
@@ -4814,12 +4922,14 @@ static inline void _nvme_check_size(void)
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ns) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ns_zns) != NVME_IDENTIFY_DATA_SIZE);
+	BUILD_BUG_ON(sizeof(struct nvme_id_ns_nvm) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_zns) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_id_ctrl_nvm) != NVME_IDENTIFY_DATA_SIZE);
 	BUILD_BUG_ON(sizeof(struct nvme_lba_range_type) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_smart_log) != 512);
 	BUILD_BUG_ON(sizeof(struct nvme_dbbuf) != 64);
 	BUILD_BUG_ON(sizeof(struct nvme_directive_cmd) != 64);
+	BUILD_BUG_ON(sizeof(struct nvme_feat_host_behavior) != 512);
 }
 
 
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index 587d92df118b..77257e02ef03 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -454,9 +454,11 @@ struct nvme_ns {
 
 	int lba_shift;
 	u16 ms;
+	u16 pi_size;
 	u16 sgs;
 	u32 sws;
 	u8 pi_type;
+	u8 guard_type;
 #ifdef CONFIG_BLK_DEV_ZONED
 	u64 zsze;
 #endif
@@ -479,7 +481,7 @@ struct nvme_ns {
 /* NVMe ns supports metadata actions by the controller (generate/strip) */
 static inline bool nvme_ns_has_pi(struct nvme_ns *ns)
 {
-	return ns->pi_type && ns->ms == sizeof(struct t10_pi_tuple);
+	return ns->pi_type && ns->ms == ns->pi_size;
 }
 
 struct nvme_ctrl_ops {
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 9dbc3ef4daf7..4f44f83817a9 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -244,6 +244,7 @@ enum {
 enum nvme_ctrl_attr {
 	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
 	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
+	NVME_CTRL_ATTR_ELBAS		= (1 << 15),
 };
 
 struct nvme_id_ctrl {
@@ -399,8 +400,7 @@ struct nvme_id_ns {
 	__le16			endgid;
 	__u8			nguid[16];
 	__u8			eui64[8];
-	struct nvme_lbaf	lbaf[16];
-	__u8			rsvd192[192];
+	struct nvme_lbaf	lbaf[64];
 	__u8			vs[3712];
 };
 
@@ -418,8 +418,7 @@ struct nvme_id_ns_zns {
 	__le32			rrl;
 	__le32			frl;
 	__u8			rsvd20[2796];
-	struct nvme_zns_lbafe	lbafe[16];
-	__u8			rsvd3072[768];
+	struct nvme_zns_lbafe	lbafe[64];
 	__u8			vs[256];
 };
 
@@ -428,6 +427,30 @@ struct nvme_id_ctrl_zns {
 	__u8	rsvd1[4095];
 };
 
+struct nvme_id_ns_nvm {
+	__le64	lbstm;
+	__u8	pic;
+	__u8	rsvd9[3];
+	__le32	elbaf[64];
+	__u8	rsvd268[3828];
+};
+
+enum {
+	NVME_ID_NS_NVM_STS_MASK		= 0x7f,
+	NVME_ID_NS_NVM_GUARD_SHIFT	= 7,
+	NVME_ID_NS_NVM_GUARD_MASK	= 0x3,
+};
+
+static inline __u8 nvme_elbaf_sts(__u32 elbaf)
+{
+	return elbaf & NVME_ID_NS_NVM_STS_MASK;
+}
+
+static inline __u8 nvme_elbaf_guard_type(__u32 elbaf)
+{
+	return (elbaf >> NVME_ID_NS_NVM_GUARD_SHIFT) & NVME_ID_NS_NVM_GUARD_MASK;
+}
+
 struct nvme_id_ctrl_nvm {
 	__u8	vsl;
 	__u8	wzsl;
@@ -478,6 +501,8 @@ enum {
 	NVME_NS_FEAT_IO_OPT	= 1 << 4,
 	NVME_NS_ATTR_RO		= 1 << 0,
 	NVME_NS_FLBAS_LBA_MASK	= 0xf,
+	NVME_NS_FLBAS_LBA_UMASK	= 0x60,
+	NVME_NS_FLBAS_LBA_SHIFT	= 1,
 	NVME_NS_FLBAS_META_EXT	= 0x10,
 	NVME_NS_NMIC_SHARED	= 1 << 0,
 	NVME_LBAF_RP_BEST	= 0,
@@ -496,6 +521,18 @@ enum {
 	NVME_NS_DPS_PI_TYPE3	= 3,
 };
 
+enum {
+	NVME_NVM_NS_16B_GUARD	= 0,
+	NVME_NVM_NS_32B_GUARD	= 1,
+	NVME_NVM_NS_64B_GUARD	= 2,
+};
+
+static inline __u8 nvme_lbaf_index(__u8 flbas)
+{
+	return (flbas & NVME_NS_FLBAS_LBA_MASK) |
+		((flbas & NVME_NS_FLBAS_LBA_UMASK) >> NVME_NS_FLBAS_LBA_SHIFT);
+}
+
 /* Identify Namespace Metadata Capabilities (MC): */
 enum {
 	NVME_MC_EXTENDED_LBA	= (1 << 0),
@@ -842,7 +879,8 @@ struct nvme_rw_command {
 	__u8			flags;
 	__u16			command_id;
 	__le32			nsid;
-	__u64			rsvd2;
+	__le32			cdw2;
+	__le32			cdw3;
 	__le64			metadata;
 	union nvme_data_ptr	dptr;
 	__le64			slba;
@@ -996,11 +1034,14 @@ enum {
 
 struct nvme_feat_host_behavior {
 	__u8 acre;
-	__u8 resv1[511];
+	__u8 etdas;
+	__u8 lbafee;
+	__u8 resv1[509];
 };
 
 enum {
 	NVME_ENABLE_ACRE	= 1,
+	NVME_ENABLE_LBAFEE	= 1,
 };
 
 /* Admin commands */
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64
  2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
                   ` (8 preceding siblings ...)
  2022-02-22 16:31 ` [PATCHv3 09/10] nvme: add support for enhanced metadata Keith Busch
@ 2022-02-22 16:31 ` Keith Busch
  2022-02-22 17:02   ` David Laight
  9 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:31 UTC (permalink / raw)
  To: linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Keith Busch

The crc64 table lookup method is inefficient, using a significant number
of CPU cycles in the block stack per IO. If available on x86, use a
PCLMULQDQ implementation to accelerate the calculation.

The assembly from this patch was mostly generated by gcc from a C
program using library functions provided by x86 intrinsics, and measures
~20x faster than the table lookup.

Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 arch/x86/crypto/Makefile                  |   3 +
 arch/x86/crypto/crc64-rocksoft-pcl-asm.S  | 215 ++++++++++++++++++++++
 arch/x86/crypto/crc64-rocksoft-pcl_glue.c | 117 ++++++++++++
 crypto/Kconfig                            |  11 ++
 4 files changed, 346 insertions(+)
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl-asm.S
 create mode 100644 arch/x86/crypto/crc64-rocksoft-pcl_glue.c

diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index c3af959648e6..036520c59f0e 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -79,6 +79,9 @@ crc32-pclmul-y := crc32-pclmul_asm.o crc32-pclmul_glue.o
 obj-$(CONFIG_CRYPTO_CRCT10DIF_PCLMUL) += crct10dif-pclmul.o
 crct10dif-pclmul-y := crct10dif-pcl-asm_64.o crct10dif-pclmul_glue.o
 
+obj-$(CONFIG_CRYPTO_CRC64_ROCKSOFT_PCLMUL) += crc64-rocksoft-pclmul.o
+crc64-rocksoft-pclmul-y := crc64-rocksoft-pcl-asm.o crc64-rocksoft-pcl_glue.o
+
 obj-$(CONFIG_CRYPTO_POLY1305_X86_64) += poly1305-x86_64.o
 poly1305-x86_64-y := poly1305-x86_64-cryptogams.o poly1305_glue.o
 targets += poly1305-x86_64-cryptogams.S
diff --git a/arch/x86/crypto/crc64-rocksoft-pcl-asm.S b/arch/x86/crypto/crc64-rocksoft-pcl-asm.S
new file mode 100644
index 000000000000..e3b633a776a9
--- /dev/null
+++ b/arch/x86/crypto/crc64-rocksoft-pcl-asm.S
@@ -0,0 +1,215 @@
+########################################################################
+# Implement fast Rocksoft CRC-64 computation with SSE and PCLMULQDQ instructions
+#
+
+#include <linux/linkage.h>
+
+SYM_FUNC_START(crc_rocksoft_pcl)
+	leaq	(%rsi,%rdx), %rcx
+	movq	%rsi, %r10
+	andl	$15, %esi
+	movq	%rdi, %xmm3
+	leaq	15(%rcx), %rax
+	andq	$-16, %r10
+	pxor	%xmm1, %xmm1
+	andq	$-16, %rax
+	movdqa	%xmm1, %xmm5
+	movq	%rax, %r8
+	subq	%r10, %rax
+	subq	%rcx, %r8
+	movl	$16, %ecx
+	movq	%rax, %r11
+	movq	%rcx, %r9
+	sarq	$4, %r11
+	subq	%rsi, %r9
+	movdqu	shuffleMasks(%r9), %xmm4
+	movdqa	%xmm4, %xmm0
+	pblendvb	%xmm0, (%r10), %xmm5
+	cmpq	$16, %rax
+	je	.L12
+	movdqa	16(%r10), %xmm2
+	cmpq	$2, %r11
+	je	.L13
+	pcmpeqd	%xmm1, %xmm1
+	leaq	-16(%rsi,%rdx), %rdi
+	leaq	16(%r10), %r9
+	pxor	%xmm1, %xmm4
+	movdqa	%xmm3, %xmm1
+	pshufb	%xmm0, %xmm3
+	pshufb	%xmm4, %xmm1
+	movdqa	%xmm3, %xmm0
+	movdqa	.LC0(%rip), %xmm3
+	pxor	%xmm5, %xmm1
+	movdqa	%xmm1, %xmm4
+	pclmulqdq	$0, %xmm3, %xmm1
+	pclmulqdq	$17, %xmm3, %xmm4
+	pxor	%xmm4, %xmm1
+	pxor	%xmm1, %xmm0
+	cmpq	$31, %rdi
+	jbe	.L6
+	leaq	-32(%rdi), %rax
+	movq	%rax, %rsi
+	andq	$-16, %rax
+	leaq	32(%r10,%rax), %rcx
+	shrq	$4, %rsi
+	movq	%r9, %rax
+	.p2align 4,,10
+	.p2align 3
+.L7:
+	pxor	%xmm2, %xmm0
+	movq	%rax, %rdx
+	addq	$16, %rax
+	movdqa	%xmm0, %xmm1
+	pclmulqdq	$0, %xmm3, %xmm0
+	movdqa	16(%rdx), %xmm2
+	pclmulqdq	$17, %xmm3, %xmm1
+	pxor	%xmm1, %xmm0
+	cmpq	%rcx, %rax
+	jne	.L7
+	movq	%rsi, %rax
+	addq	$1, %rsi
+	negq	%rax
+	salq	$4, %rsi
+	salq	$4, %rax
+	addq	%rsi, %r9
+	leaq	-16(%rdi,%rax), %rdi
+.L6:
+	pxor	%xmm2, %xmm0
+	cmpq	$16, %rdi
+	je	.L9
+	movl	$16, %eax
+	pcmpeqd	%xmm2, %xmm2
+	movdqa	%xmm0, %xmm7
+	subq	%r8, %rax
+	movdqu	shuffleMasks(%rax), %xmm4
+	pxor	%xmm4, %xmm2
+	pshufb	%xmm4, %xmm0
+	movdqa	16(%r9), %xmm4
+	pshufb	%xmm2, %xmm7
+	pshufb	%xmm2, %xmm4
+	movdqa	%xmm7, %xmm1
+	movdqa	%xmm4, %xmm2
+	movdqa	%xmm7, %xmm4
+	pclmulqdq	$0, %xmm3, %xmm1
+	pclmulqdq	$17, %xmm3, %xmm4
+	por	%xmm2, %xmm0
+	pxor	%xmm4, %xmm1
+	pxor	%xmm1, %xmm0
+.L9:
+	movdqa	%xmm0, %xmm2
+	pclmulqdq	$16, %xmm3, %xmm0
+	psrldq	$8, %xmm2
+	pxor	%xmm2, %xmm0
+.L3:
+	movdqa	.LC1(%rip), %xmm2
+	movdqa	%xmm0, %xmm1
+	pclmulqdq	$0, %xmm2, %xmm1
+	movdqa	%xmm1, %xmm3
+	pclmulqdq	$16, %xmm2, %xmm1
+	pslldq	$8, %xmm3
+	pxor	%xmm3, %xmm1
+	pxor	%xmm1, %xmm0
+	pextrd	$3, %xmm0, %eax
+	salq	$32, %rax
+	movq	%rax, %rdx
+	pextrd	$2, %xmm0, %eax
+	orq	%rdx, %rax
+	notq	%rax
+	ret
+	.p2align 4,,10
+	.p2align 3
+.L13:
+	subq	%r8, %rcx
+	pcmpeqd	%xmm1, %xmm1
+	movdqu	shuffleMasks(%rcx), %xmm7
+	movdqa	%xmm7, %xmm6
+	pxor	%xmm1, %xmm6
+	cmpq	$7, %rdx
+	ja	.L5
+	movdqa	%xmm1, %xmm4
+	pshufb	%xmm7, %xmm5
+	movdqa	%xmm3, %xmm1
+	movdqu	shuffleMasks(%rdx), %xmm8
+	pshufb	%xmm6, %xmm2
+	pxor	%xmm8, %xmm4
+	pxor	%xmm5, %xmm2
+	pshufb	%xmm8, %xmm3
+	pshufb	%xmm4, %xmm1
+	movdqa	%xmm3, %xmm0
+	pxor	%xmm1, %xmm2
+	pslldq	$8, %xmm0
+	movdqa	%xmm2, %xmm3
+	pclmulqdq	$16, .LC0(%rip), %xmm2
+	psrldq	$8, %xmm3
+	pxor	%xmm3, %xmm0
+	pxor	%xmm2, %xmm0
+	jmp	.L3
+	.p2align 4,,10
+	.p2align 3
+.L12:
+	movdqu	shuffleMasks(%rdx), %xmm2
+	subq	%r8, %rcx
+	movdqa	%xmm3, %xmm6
+	pcmpeqd	%xmm4, %xmm4
+	movdqa	%xmm2, %xmm0
+	pshufb	%xmm2, %xmm3
+	movdqu	shuffleMasks(%rcx), %xmm2
+	pxor	%xmm4, %xmm0
+	pslldq	$8, %xmm3
+	pxor	%xmm4, %xmm2
+	pshufb	%xmm0, %xmm6
+	pshufb	%xmm2, %xmm5
+	movdqa	%xmm5, %xmm1
+	pxor	%xmm6, %xmm1
+	movdqa	%xmm1, %xmm0
+	pclmulqdq	$16, .LC0(%rip), %xmm1
+	psrldq	$8, %xmm0
+	pxor	%xmm3, %xmm0
+	pxor	%xmm1, %xmm0
+	jmp	.L3
+	.p2align 4,,10
+	.p2align 3
+.L5:
+	pxor	%xmm1, %xmm4
+	movdqa	%xmm3, %xmm1
+	pshufb	%xmm0, %xmm3
+	pshufb	%xmm4, %xmm1
+	pxor	%xmm3, %xmm2
+	movdqa	.LC0(%rip), %xmm3
+	pxor	%xmm5, %xmm1
+	pshufb	%xmm6, %xmm2
+	movdqa	%xmm1, %xmm5
+	pshufb	%xmm7, %xmm1
+	pshufb	%xmm6, %xmm5
+	pxor	%xmm2, %xmm1
+	movdqa	%xmm5, %xmm4
+	movdqa	%xmm5, %xmm0
+	pclmulqdq	$17, %xmm3, %xmm0
+	pclmulqdq	$0, %xmm3, %xmm4
+	pxor	%xmm0, %xmm4
+	pxor	%xmm4, %xmm1
+	movdqa	%xmm1, %xmm0
+	pclmulqdq	$16, %xmm3, %xmm1
+	psrldq	$8, %xmm0
+	pxor	%xmm1, %xmm0
+	jmp	.L3
+SYM_FUNC_END(crc_rocksoft_pcl)
+
+.section	.rodata
+.align 32
+.type	shuffleMasks, @object
+.size	shuffleMasks, 32
+shuffleMasks:
+	.string	""
+	.ascii	"\001\002\003\004\005\006\007\b\t\n\013\f\r\016\017\217\216\215"
+	.ascii	"\214\213\212\211\210\207\206\205\204\203\202\201\200"
+
+.section	.rodata.cst16,"aM",@progbits,16
+.align 16
+.LC0:
+	.quad	-1523270018343381984
+	.quad	2443614144669557164
+	.align 16
+.LC1:
+	.quad	2876949357237608311
+	.quad	3808117099328934763
diff --git a/arch/x86/crypto/crc64-rocksoft-pcl_glue.c b/arch/x86/crypto/crc64-rocksoft-pcl_glue.c
new file mode 100644
index 000000000000..996780aa3d93
--- /dev/null
+++ b/arch/x86/crypto/crc64-rocksoft-pcl_glue.c
@@ -0,0 +1,117 @@
+#include <linux/types.h>
+#include <linux/module.h>
+#include <linux/crc64.h>
+#include <crypto/internal/hash.h>
+#include <crypto/internal/simd.h>
+#include <linux/init.h>
+#include <linux/string.h>
+#include <linux/kernel.h>
+#include <asm/cpufeatures.h>
+#include <asm/cpu_device_id.h>
+#include <asm/simd.h>
+
+asmlinkage u64 crc_rocksoft_pcl(u64 init_crc, const u8 *buf, size_t len);
+
+struct chksum_desc_ctx {
+	u64 crc;
+};
+
+static int chksum_init(struct shash_desc *desc)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	ctx->crc = 0;
+
+	return 0;
+}
+
+static int chksum_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int length)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	if (length >= 16 && crypto_simd_usable()) {
+		kernel_fpu_begin();
+		ctx->crc = crc_rocksoft_pcl(ctx->crc, data, length);
+		kernel_fpu_end();
+	} else
+		ctx->crc = crc64_rocksoft_generic(ctx->crc, data, length);
+	return 0;
+}
+
+static int chksum_final(struct shash_desc *desc, u8 *out)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	*(u64 *)out = ctx->crc;
+	return 0;
+}
+
+static int __chksum_finup(u64 crc, const u8 *data, unsigned int len, u8 *out)
+{
+	if (len >= 16 && crypto_simd_usable()) {
+		kernel_fpu_begin();
+		*(u64 *)out = crc_rocksoft_pcl(crc, data, len);
+		kernel_fpu_end();
+	} else
+		*(u64 *)out = crc64_rocksoft_generic(crc, data, len);
+	return 0;
+}
+
+static int chksum_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
+
+	return __chksum_finup(ctx->crc, data, len, out);
+}
+
+static int chksum_digest(struct shash_desc *desc, const u8 *data,
+			 unsigned int length, u8 *out)
+{
+	return __chksum_finup(0, data, length, out);
+}
+
+static struct shash_alg alg = {
+	.digestsize	= 	8,
+	.init		=	chksum_init,
+	.update		=	chksum_update,
+	.final		=	chksum_final,
+	.finup		=	chksum_finup,
+	.digest		=	chksum_digest,
+	.descsize	=	sizeof(struct chksum_desc_ctx),
+	.base		=	{
+		.cra_name		=	CRC64_ROCKSOFT_STRING,
+		.cra_driver_name	=	"crc64-rocksoft-pclmul",
+		.cra_priority		=	200,
+		.cra_blocksize		=	1,
+		.cra_module		=	THIS_MODULE,
+	}
+};
+
+static const struct x86_cpu_id crc64_rocksoft_cpu_id[] = {
+	X86_MATCH_FEATURE(X86_FEATURE_PCLMULQDQ, NULL),
+	{}
+};
+MODULE_DEVICE_TABLE(x86cpu, crc64_rocksoft_cpu_id);
+
+static int __init crc64_rocksoft_x86_mod_init(void)
+{
+	if (!x86_match_cpu(crc64_rocksoft_cpu_id))
+		return -ENODEV;
+
+	return crypto_register_shash(&alg);
+}
+
+static void __exit crc64_rocksoft_x86_mod_fini(void)
+{
+	crypto_unregister_shash(&alg);
+}
+
+module_init(crc64_rocksoft_x86_mod_init);
+module_exit(crc64_rocksoft_x86_mod_fini);
+
+MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
+MODULE_DESCRIPTION("Rocksoft CRC64 calculation accelerated with PCLMULQDQ.");
+MODULE_LICENSE("GPL");
+MODULE_ALIAS_CRYPTO("crc64-rocksoft-pclmul");
diff --git a/crypto/Kconfig b/crypto/Kconfig
index e343147b9f8f..d8861138f117 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -744,6 +744,17 @@ config CRYPTO_CRC64_ROCKSOFT
 	  transform. This allows for faster crc64 transforms to be used
 	  if they are available.
 
+config CRYPTO_CRC64_ROCKSOFT_PCLMUL
+	tristate "Rocksoft model CRC64 PCLMULQDQ hardware acceleration"
+	depends on X86 && 64BIT && CRC64
+	select CRYPTO_HASH
+	help
+	  For x86_64 processors with SSE4.2 and PCLMULQDQ support, the
+	  CRC64 computation can be hardware accelerated with the PCLMULQDQ
+	  instruction. This option will create the 'crc64-rocksoft-pclmul'
+	  module, which computes the crc64 checksum faster than the
+	  generic table implementation.
+
 config CRYPTO_VPMSUM_TESTER
 	tristate "Powerpc64 vpmsum hardware acceleration tester"
 	depends on CRYPTO_CRCT10DIF_VPMSUM && CRYPTO_CRC32C_VPMSUM
-- 
2.25.4


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:31 ` [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro Keith Busch
@ 2022-02-22 16:45   ` Joe Perches
  2022-02-22 16:50     ` Christoph Hellwig
  2022-02-22 16:48   ` Chaitanya Kulkarni
  1 sibling, 1 reply; 47+ messages in thread
From: Joe Perches @ 2022-02-22 16:45 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Bart Van Assche

On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> Recent data integrity field enhancements allow 48-bit reference tags.
> Introduce a helper macro since this will be a repeated operation.
[]
> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
[]
> @@ -63,6 +63,12 @@
>  }					\
>  )
>  
> +/**
> + * lower_48_bits - return bits 0-47 of a number
> + * @n: the number we're accessing
> + */
> +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))

why not make this a static inline function?
And visually, it's difficult to quickly count a repeated character to 12.

Perhaps:

static inline u64 lower_48_bits(u64 val)
{
	return val & GENMASK_ULL(47, 0);
}



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:31 ` [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro Keith Busch
  2022-02-22 16:45   ` Joe Perches
@ 2022-02-22 16:48   ` Chaitanya Kulkarni
  1 sibling, 0 replies; 47+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-22 16:48 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Bart Van Assche

On 2/22/22 08:31, Keith Busch wrote:
> Recent data integrity field enhancements allow 48-bit reference tags.
> Introduce a helper macro since this will be a repeated operation.
> 
> Suggested-by: Bart Van Assche <bvanassche@acm.org>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:45   ` Joe Perches
@ 2022-02-22 16:50     ` Christoph Hellwig
  2022-02-22 16:56       ` Keith Busch
                         ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-22 16:50 UTC (permalink / raw)
  To: Joe Perches
  Cc: Keith Busch, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, hch, martin.petersen, colyli,
	Bart Van Assche

On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > Recent data integrity field enhancements allow 48-bit reference tags.
> > Introduce a helper macro since this will be a repeated operation.
> []
> > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> []
> > @@ -63,6 +63,12 @@
> >  }					\
> >  )
> >  
> > +/**
> > + * lower_48_bits - return bits 0-47 of a number
> > + * @n: the number we're accessing
> > + */
> > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> 
> why not make this a static inline function?

Agreed.

> And visually, it's difficult to quickly count a repeated character to 12.
> 
> Perhaps:
> 
> static inline u64 lower_48_bits(u64 val)
> {
> 	return val & GENMASK_ULL(47, 0);
> }

For anyone who has a minimum knowledge of C and hardware your version
is an obfuscated clusterfuck, while the version Keith wrote is trivial
to read.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 07/10] lib: add crc64 tests
  2022-02-22 16:31 ` [PATCHv3 07/10] lib: add crc64 tests Keith Busch
@ 2022-02-22 16:50   ` Chaitanya Kulkarni
  2022-02-25 16:05   ` Christoph Hellwig
  1 sibling, 0 replies; 47+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-22 16:50 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli

On 2/22/22 08:31, Keith Busch wrote:
> Provide a module to test the rocksoft crc64 calculations with well known
> inputs and expected values. Check the generic table implementation and
> whatever module is registered from the crypto library, and compare their
> speeds.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors
  2022-02-22 16:31 ` [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors Keith Busch
@ 2022-02-22 16:52   ` Chaitanya Kulkarni
  2022-02-25 16:03   ` Christoph Hellwig
  1 sibling, 0 replies; 47+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-22 16:52 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli, Hannes Reinecke, Arnd Bergmann

On 2/22/22 08:31, Keith Busch wrote:
> The NVMe protocol extended the data integrity fields with unaligned
> 48-bit reference tags. Provide some helper accessors in
> preparation for these.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---

Looks good.

Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>

-ck





^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:50     ` Christoph Hellwig
@ 2022-02-22 16:56       ` Keith Busch
  2022-02-22 18:43         ` Joe Perches
  2022-02-22 16:58       ` Joe Perches
                         ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 16:56 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Joe Perches, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Bart Van Assche

On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > +/**
> > > + * lower_48_bits - return bits 0-47 of a number
> > > + * @n: the number we're accessing
> > > + */
> > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > 
> > why not make this a static inline function?
> 
> Agreed.

Sure, that sounds good to me. I only did it this way to match the
existing local convention, but I personally prefer the inline function
too. 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:50     ` Christoph Hellwig
  2022-02-22 16:56       ` Keith Busch
@ 2022-02-22 16:58       ` Joe Perches
  2022-02-22 17:09       ` David Laight
  2022-02-22 17:14       ` Chaitanya Kulkarni
  3 siblings, 0 replies; 47+ messages in thread
From: Joe Perches @ 2022-02-22 16:58 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Bart Van Assche

On Tue, 2022-02-22 at 17:50 +0100, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > Recent data integrity field enhancements allow 48-bit reference tags.
> > > Introduce a helper macro since this will be a repeated operation.
> > []
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > []
> > > @@ -63,6 +63,12 @@
> > >  }					\
> > >  )
> > >  
> > > +/**
> > > + * lower_48_bits - return bits 0-47 of a number
> > > + * @n: the number we're accessing
> > > + */
> > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > 
> > why not make this a static inline function?
> 
> Agreed.
> 
> > And visually, it's difficult to quickly count a repeated character to 12.
> > 
> > Perhaps:
> > 
> > static inline u64 lower_48_bits(u64 val)
> > {
> > 	return val & GENMASK_ULL(47, 0);
> > }
> 
> For anyone who has a minimum knowledge of C and hardware your version
> is an obfuscated clusterfuck, while the version Keith wrote is
> trivial to read.

Don't think so.  I've dealt with hardware and have more than once
seen defects introduced by firmware developers that can't count.

be quick, which one is it:

	0xfffffffffffULL
or
	0xffffffffffffULL
or
	0xfffffffffffffULL
or
	0xffffffffffffffULL




^ permalink raw reply	[flat|nested] 47+ messages in thread

* RE: [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64
  2022-02-22 16:31 ` [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64 Keith Busch
@ 2022-02-22 17:02   ` David Laight
  2022-02-22 17:14     ` Keith Busch
  0 siblings, 1 reply; 47+ messages in thread
From: David Laight @ 2022-02-22 17:02 UTC (permalink / raw)
  To: 'Keith Busch',
	linux-nvme, linux-block, linux-crypto, x86, linux-kernel
  Cc: axboe, hch, martin.petersen, colyli

From: Keith Busch
> Sent: 22 February 2022 16:32
> 
> The crc64 table lookup method is inefficient, using a significant number
> of CPU cycles in the block stack per IO. If available on x86, use a
> PCLMULQDQ implementation to accelerate the calculation.
> 
> The assembly from this patch was mostly generated by gcc from a C
> program using library functions provided by x86 intrinsics, and measures
> ~20x faster than the table lookup.

I think I'd like to see the C code and compiler options used to
generate the assembler as comments in the committed source file.
Either that or reasonable comments in the assembler.

It is also quite a lot of code.
What is the break-even length for 'cold cache' including the FPU saves?

...
> +.section	.rodata
> +.align 32
> +.type	shuffleMasks, @object
> +.size	shuffleMasks, 32
> +shuffleMasks:
> +	.string	""
> +	.ascii	"\001\002\003\004\005\006\007\b\t\n\013\f\r\016\017\217\216\215"
> +	.ascii	"\214\213\212\211\210\207\206\205\204\203\202\201\200"

That has to be the worst way to define 32 bytes.

> +.section	.rodata.cst16,"aM",@progbits,16
> +.align 16
> +.LC0:
> +	.quad	-1523270018343381984
> +	.quad	2443614144669557164
> +	.align 16
> +.LC1:
> +	.quad	2876949357237608311
> +	.quad	3808117099328934763

Not sure what those are, but I bet there are better ways to
define/describe them.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* RE: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:50     ` Christoph Hellwig
  2022-02-22 16:56       ` Keith Busch
  2022-02-22 16:58       ` Joe Perches
@ 2022-02-22 17:09       ` David Laight
  2022-02-22 17:14       ` Chaitanya Kulkarni
  3 siblings, 0 replies; 47+ messages in thread
From: David Laight @ 2022-02-22 17:09 UTC (permalink / raw)
  To: 'Christoph Hellwig', Joe Perches
  Cc: Keith Busch, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Bart Van Assche

From: Christoph Hellwig
> Sent: 22 February 2022 16:51
> 
> On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > Recent data integrity field enhancements allow 48-bit reference tags.
> > > Introduce a helper macro since this will be a repeated operation.
> > []
> > > diff --git a/include/linux/kernel.h b/include/linux/kernel.h
> > []
> > > @@ -63,6 +63,12 @@
> > >  }					\
> > >  )
> > >
> > > +/**
> > > + * lower_48_bits - return bits 0-47 of a number
> > > + * @n: the number we're accessing
> > > + */
> > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> >
> > why not make this a static inline function?
> 
> Agreed.
> 
> > And visually, it's difficult to quickly count a repeated character to 12.
> >
> > Perhaps:
> >
> > static inline u64 lower_48_bits(u64 val)
> > {
> > 	return val & GENMASK_ULL(47, 0);
> > }
> 
> For anyone who has a minimum knowledge of C and hardware your version
> is an obfuscated clusterfuck, while the version Keith wrote is trivial
> to read.

I'd use the explicit: val & ((1ull << 48) - 1)
I think it is even fewer characters.
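
Spelled out as the inline function being discussed:

static inline u64 lower_48_bits(u64 val)
{
	return val & ((1ull << 48) - 1);
}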

	David.

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64
  2022-02-22 17:02   ` David Laight
@ 2022-02-22 17:14     ` Keith Busch
  2022-02-22 20:06       ` Eric Biggers
  0 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 17:14 UTC (permalink / raw)
  To: David Laight
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 05:02:16PM +0000, David Laight wrote:
> From: Keith Busch
> > Sent: 22 February 2022 16:32
> > 
> > The crc64 table lookup method is inefficient, using a significant number
> > of CPU cycles in the block stack per IO. If available on x86, use a
> > PCLMULQDQ implementation to accelerate the calculation.
> > 
> > The assembly from this patch was mostly generated by gcc from a C
> > program using library functions provided by x86 intrinsics, and measures
> > ~20x faster than the table lookup.
> 
> I think I'd like to see the C code and compiler options used to
> generate the assembler as comments in the committed source file.
> Either that or reasonable comments in the assembler.

The C code, compiled as "gcc -O3 -msse4 -mpclmul -S", was adapted from
this found on the internet:

  https://github.com/rawrunprotected/crc/blob/master/crc64.c

I just ported it to linux, changed the poly parameters and removed the
unnecessary stuff. 
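
The guts of it is just the standard CLMUL folding loop; in intrinsics
form one iteration of the hot loop (the .L7 block in the assembly)
looks roughly like the sketch below, with the fold constants elided:

#include <immintrin.h>

/*
 * Fold 16 bytes of input into the running remainder: xor the data in,
 * then carry-less multiply both 64-bit halves by the precomputed
 * (x^k mod P) constants and combine. This corresponds to the
 * pclmulqdq $0x00 / pclmulqdq $0x11 / pxor sequence in the generated
 * assembly.
 */
static __m128i fold_step(__m128i acc, __m128i data, __m128i fold_consts)
{
	acc = _mm_xor_si128(acc, data);
	return _mm_xor_si128(
		_mm_clmulepi64_si128(acc, fold_consts, 0x00),
		_mm_clmulepi64_si128(acc, fold_consts, 0x11));
}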
 
I'm okay with dropping this patch from the series for now since I don't
think I'm qualified to write it. :) I just needed something to test the
crypto module registration.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:50     ` Christoph Hellwig
                         ` (2 preceding siblings ...)
  2022-02-22 17:09       ` David Laight
@ 2022-02-22 17:14       ` Chaitanya Kulkarni
  3 siblings, 0 replies; 47+ messages in thread
From: Chaitanya Kulkarni @ 2022-02-22 17:14 UTC (permalink / raw)
  To: Christoph Hellwig, Joe Perches
  Cc: Keith Busch, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Bart Van Assche

On 2/22/22 08:50, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
>> On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
>>> Recent data integrity field enhancements allow 48-bit reference tags.
>>> Introduce a helper macro since this will be a repeated operation.
>> []
>>> diff --git a/include/linux/kernel.h b/include/linux/kernel.h
>> []
>>> @@ -63,6 +63,12 @@
>>>   }					\
>>>   )
>>>   
>>> +/**
>>> + * lower_48_bits - return bits 0-47 of a number
>>> + * @n: the number we're accessing
>>> + */
>>> +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
>>
>> why not make this a static inline function?
> 
> Agreed.
> 


All the bit mask macros in the same file need to be converted into
static inlines to have the right data type; however, that needs to be
done once this series is in, since the objective of this series is
clearly different from a cleanup of the include/linux/kernel.h bit
macros.

-ck



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 16:56       ` Keith Busch
@ 2022-02-22 18:43         ` Joe Perches
  2022-02-22 20:09           ` David Laight
  0 siblings, 1 reply; 47+ messages in thread
From: Joe Perches @ 2022-02-22 18:43 UTC (permalink / raw)
  To: Keith Busch, Christoph Hellwig
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	martin.petersen, colyli, Bart Van Assche

On Tue, 2022-02-22 at 08:56 -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> > On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > > +/**
> > > > + * lower_48_bits - return bits 0-47 of a number
> > > > + * @n: the number we're accessing
> > > > + */
> > > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > > 
> > > why not make this a static inline function?
> > 
> > Agreed.
> 
> Sure, that sounds good to me. I only did it this way to match the
> existing local convention, but I personally prefer the inline function
> too. 

The existing convention is used there to allow the compiler to
avoid warnings and unnecessary conversions of a u32 to a u64 when
shifting by 32 or more bits.
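
For reference, the convention being matched is the likes of:

#define lower_32_bits(n) ((u32)((n) & 0xffffffff))
#define upper_32_bits(n) ((u32)(((n) >> 16) >> 16))

where the double shift sidesteps undefined behaviour when n is only
32 bits wide.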

If it's possible to be used with an architecture dependent typedef
like dma_addr_t, then perhaps it's reasonable to do something like:

#define lower_48_bits(val)					\
({								\
	typeof(val) high = lower_16_bits(upper_32_bits(val));	\
	typeof(val) low = lower_32_bits(val);			\
								\
	(high << 16 << 16) | low;				\
})

and have the compiler have the return value be an appropriate type.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 16:31 ` [PATCHv3 06/10] crypto: add rocksoft 64b crc framework Keith Busch
@ 2022-02-22 19:50   ` Eric Biggers
  2022-02-22 19:54     ` Eric Biggers
  2022-02-22 20:09     ` Keith Busch
  2022-02-22 19:56   ` Eric Biggers
  1 sibling, 2 replies; 47+ messages in thread
From: Eric Biggers @ 2022-02-22 19:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> +config CRYPTO_CRC64_ROCKSOFT
> +	tristate "Rocksoft Model CRC64 algorithm"
> +	depends on CRC64
> +	select CRYPTO_HASH
> +	help
> +	  Rocksoft Model CRC64 computation is being cast as a crypto
> +	  transform. This allows for faster crc64 transforms to be used
> +	  if they are available.

The first sentence of this help text doesn't make sense.

> diff --git a/crypto/crc64_rocksoft_generic.c b/crypto/crc64_rocksoft_generic.c
> new file mode 100644
> index 000000000000..55bad1939614
> --- /dev/null
> +++ b/crypto/crc64_rocksoft_generic.c
> @@ -0,0 +1,104 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * Cryptographic API.

The "Cryptographic API" line doesn't provide any helpful information.

> +static int chksum_final(struct shash_desc *desc, u8 *out)
> +{
> +	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);
> +
> +	*(u64 *)out = ctx->crc;
> +	return 0;
> +}
> +
> +static int __chksum_finup(u64 crc, const u8 *data, unsigned int len, u8 *out)
> +{
> +	*(u64 *)out = crc64_rocksoft_generic(crc, data, len);
> +	return 0;
> +}

These 64-bit writes violate alignment rules and will give the wrong result on
big endian CPUs.  They need to use put_unaligned_le64().
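
For instance, a minimal sketch of the fix for the final write:

static int chksum_final(struct shash_desc *desc, u8 *out)
{
	struct chksum_desc_ctx *ctx = shash_desc_ctx(desc);

	put_unaligned_le64(ctx->crc, out);
	return 0;
}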

> +static int __init crc64_rocksoft_x86_mod_init(void)
> +{
> +	return crypto_register_shash(&alg);
> +}
> +
> +static void __exit crc64_rocksoft_x86_mod_fini(void)
> +{
> +	crypto_unregister_shash(&alg);
> +}

This has nothing to do with x86.

> +config CRC64_ROCKSOFT
> +	tristate "CRC calculation for the Rocksoft^TM model CRC64"

I'm not sure what the rules for trademarks are, but kernel source code usually
doesn't have the trademark symbol/abbreviation scattered everywhere.

> +	select CRYPTO
> +	select CRYPTO_CRC64_ROCKSOFT
> +	help
> +	  This option is only needed if a module that's not in the
> +	  kernel tree needs to calculate CRC checks for use with the
> +	  rocksoft model parameters.

Out-of-tree modules can't be the reason to have a kconfig option.  What is the
real reason?

> +u64 crc64_rocksoft(const unsigned char *buffer, size_t len)
> +{
> +	return crc64_rocksoft_update(~0ULL, buffer, len);
> +}
> +EXPORT_SYMBOL(crc64_rocksoft);

Isn't this missing the bitwise inversion at the end?
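
That is, should it not be something like:

u64 crc64_rocksoft(const unsigned char *buffer, size_t len)
{
	return ~crc64_rocksoft_update(~0ULL, buffer, len);
}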

> +MODULE_AUTHOR("Keith Busch <kbusch@kernel.org>");
> +MODULE_DESCRIPTION("Rocksoft model CRC64 calculation (library API)");
> +MODULE_LICENSE("GPL");
> +MODULE_SOFTDEP("pre: crc64");

Shouldn't the MODULE_SOFTDEP be on crc64-rocksoft?

- Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 19:50   ` Eric Biggers
@ 2022-02-22 19:54     ` Eric Biggers
  2022-02-22 20:09     ` Keith Busch
  1 sibling, 0 replies; 47+ messages in thread
From: Eric Biggers @ 2022-02-22 19:54 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 11:50:44AM -0800, Eric Biggers wrote:
> > +config CRC64_ROCKSOFT
> > +	tristate "CRC calculation for the Rocksoft^TM model CRC64"
> 
> I'm not sure what the rules for trademarks are, but kernel source code usually
> doesn't have the trademark symbol/abbreviation scattered everywhere.
> 
> > +	select CRYPTO
> > +	select CRYPTO_CRC64_ROCKSOFT
> > +	help
> > +	  This option is only needed if a module that's not in the
> > +	  kernel tree needs to calculate CRC checks for use with the
> > +	  rocksoft model parameters.
> 
> Out-of-tree modules can't be the reason to have a kconfig option.  What is the
> real reason?

Also this option can be enabled without the CONFIG_CRC64 it depends on, which is
broken.

- Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 16:31 ` [PATCHv3 06/10] crypto: add rocksoft 64b crc framework Keith Busch
  2022-02-22 19:50   ` Eric Biggers
@ 2022-02-22 19:56   ` Eric Biggers
  1 sibling, 0 replies; 47+ messages in thread
From: Eric Biggers @ 2022-02-22 19:56 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> Hardware specific features may be able to calculate a crc64, so provide
> a framework for drivers to register their implementation. If nothing is
> registered, fallback to the generic table lookup implementation. The
> implementation is modeled after the crct10dif equivalent.
> 
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  crypto/Kconfig                  |   9 +++
>  crypto/Makefile                 |   1 +
>  crypto/crc64_rocksoft_generic.c | 104 +++++++++++++++++++++++++
>  include/linux/crc64.h           |   5 ++
>  lib/Kconfig                     |   9 +++
>  lib/Makefile                    |   1 +
>  lib/crc64-rocksoft.c            | 129 ++++++++++++++++++++++++++++++++
>  7 files changed, 258 insertions(+)
>  create mode 100644 crypto/crc64_rocksoft_generic.c
>  create mode 100644 lib/crc64-rocksoft.c

I tried testing this, but I can't because it is missing a self-test:

[    0.736340] alg: No test for crc64-rocksoft (crc64-rocksoft-generic)
[    5.440398] alg: No test for crc64-rocksoft (crc64-rocksoft-pclmul)

All algorithms registered with the crypto API need to have a self-test
(in crypto/testmgr.c).
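
For this algorithm that would be an entry modeled on the existing
crct10dif one, roughly like the following (the test vector template
name here is made up):

	{
		.alg = "crc64-rocksoft",
		.test = alg_test_hash,
		.fips_allowed = 1,
		.suite = {
			.hash = __VECS(crc64_rocksoft_tv_template)
		}
	}

plus the corresponding hash_testvec array in crypto/testmgr.h.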

- Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64
  2022-02-22 17:14     ` Keith Busch
@ 2022-02-22 20:06       ` Eric Biggers
  2022-02-22 20:51         ` Keith Busch
  0 siblings, 1 reply; 47+ messages in thread
From: Eric Biggers @ 2022-02-22 20:06 UTC (permalink / raw)
  To: Keith Busch
  Cc: David Laight, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 09:14:05AM -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 05:02:16PM +0000, David Laight wrote:
> > From: Keith Busch
> > > Sent: 22 February 2022 16:32
> > > 
> > > The crc64 table lookup method is inefficient, using a significant number
> > > of CPU cycles in the block stack per IO. If available on x86, use a
> > > PCLMULQDQ implementation to accelerate the calculation.
> > > 
> > > The assembly from this patch was mostly generated by gcc from a C
> > > program using library functions provided by x86 intrinsics, and measures
> > > ~20x faster than the table lookup.
> > 
> > I think I'd like to see the C code and compiler options used to
> > generate the assembler as comments in the committed source file.
> > Either that or reasonable comments in the assembler.
> 
> The C code, compiled as "gcc -O3 -msse4 -mpclmul -S", was adapted from
> this found on the internet:
> 
>   https://github.com/rawrunprotected/crc/blob/master/crc64.c
> 
> I just ported it to linux, changed the poly parameters and removed the
> unnecessary stuff. 
>  
> I'm okay with dropping this patch from the series for now since I don't
> think I'm qualified to write it. :) I just needed something to test the
> crytpo module registration.

Is the license of that code compatible with the kernel's license?

In any case, adding uncommented generated assembly isn't acceptable.  The most
common convention for Linux kernel crypto is to use hand-written assembly that
is properly commented.

There is some precedent for using compiler intrinsics instead, e.g.
crypto/aegis128-neon-inner.c.  (I'm not sure why they aren't used more often.)

There are also some files where a Perl script generates the assembly code.
(This is a bit ugly IMO, but it's what the author of much of OpenSSL's crypto
assembly code does, and it was desired to reuse that code.)

Anyway, those are the available options.  Checking in some uncommented generated
assembly isn't one of them.

- Eric

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 19:50   ` Eric Biggers
  2022-02-22 19:54     ` Eric Biggers
@ 2022-02-22 20:09     ` Keith Busch
  2022-02-25 16:11       ` Christoph Hellwig
  1 sibling, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 20:09 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 11:50:42AM -0800, Eric Biggers wrote:
> On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> > +config CRYPTO_CRC64_ROCKSOFT
> > +	tristate "Rocksoft Model CRC64 algorithm"
> > +	depends on CRC64
> > +	select CRYPTO_HASH
> > +	help
> > +	  Rocksoft Model CRC64 computation is being cast as a crypto
> > +	  transform. This allows for faster crc64 transforms to be used
> > +	  if they are available.
> 
> The first sentence of this help text doesn't make sense.

Much of this patch is a copy of the equivalent T10DIF option, which
says the same as above. I meant to revisit these help texts because
they don't make sense to me either. I'll be sure to do that for the next
version.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* RE: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 18:43         ` Joe Perches
@ 2022-02-22 20:09           ` David Laight
  2022-02-22 20:31             ` Joe Perches
  0 siblings, 1 reply; 47+ messages in thread
From: David Laight @ 2022-02-22 20:09 UTC (permalink / raw)
  To: 'Joe Perches', Keith Busch, Christoph Hellwig
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	martin.petersen, colyli, Bart Van Assche

From: Joe Perches
> Sent: 22 February 2022 18:43
> 
> On Tue, 2022-02-22 at 08:56 -0800, Keith Busch wrote:
> > On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> > > On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > > > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > > > +/**
> > > > > + * lower_48_bits - return bits 0-47 of a number
> > > > > + * @n: the number we're accessing
> > > > > + */
> > > > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > > >
> > > > why not make this a static inline function?
> > >
> > > Agreed.
> >
> > Sure, that sounds good to me. I only did it this way to match the
> > existing local convention, but I personally prefer the inline function
> > too.
> 
> The existing convention is used there to allow the compiler to
> avoid warnings and unnecessary conversions of a u32 to a u64 when
> shifting by 32 or more bits.
> 
> If it's possible to be used with an architecture dependent typedef
> like dma_addr_t, then perhaps it's reasonable to do something like:
> 
> #define lower_48_bits(val)					\
> ({								\
> 	typeof(val) high = lower_16_bits(upper_32_bits(val));	\
> 	typeof(val) low = lower_32_bits(val);			\
> 								\
> 	(high << 16 << 16) | low;				\
> })
> 
> and have the compiler have the return value be an appropriate type.

The compiler could make a real pigs breakfast of that.

For lower_48_bits() an integer promotion to u64 does no harm.
But for some other cases you get in a right mess when values
that should be unsigned get sign extended.

Although I think:
	(val) & (((typeof(val))1 << 48) - 1)
avoids any promotion if anyone tries lower_48_bits(int_var).
(It is even likely to be a compile error.)

Oh, did you look for GENMASK([^,]*,[ 0]*) ?
I'd only use something like GENMASK() for bit ranges.
Even then it is often easier to just write the value in hex.

I think the only time I've written anything like that recently
(last 30 years) was for some hardware registers when the documentation
used 'bit 1' for the most significant bit.

It's rather like I just know that (x & (x - 1)) checks for 1 bit being set.
I have to lookup is_power_of_2() to see what it does.
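
(For the record, the helper in include/linux/log2.h is just:

static inline bool is_power_of_2(unsigned long n)
{
	return (n != 0 && ((n & (n - 1)) == 0));
}

i.e. the same idiom with the zero case handled.)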

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 20:09           ` David Laight
@ 2022-02-22 20:31             ` Joe Perches
  2022-02-22 21:12               ` Keith Busch
  0 siblings, 1 reply; 47+ messages in thread
From: Joe Perches @ 2022-02-22 20:31 UTC (permalink / raw)
  To: David Laight, Keith Busch, Christoph Hellwig
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	martin.petersen, colyli, Bart Van Assche

On Tue, 2022-02-22 at 20:09 +0000, David Laight wrote:
> From: Joe Perches
> > Sent: 22 February 2022 18:43
> > 
> > On Tue, 2022-02-22 at 08:56 -0800, Keith Busch wrote:
> > > On Tue, Feb 22, 2022 at 05:50:45PM +0100, Christoph Hellwig wrote:
> > > > On Tue, Feb 22, 2022 at 08:45:53AM -0800, Joe Perches wrote:
> > > > > On Tue, 2022-02-22 at 08:31 -0800, Keith Busch wrote:
> > > > > > +/ *
> > > > > > + * lower_48_bits - return bits 0-47 of a number
> > > > > > + * @n: the number we're accessing
> > > > > > + */
> > > > > > +#define lower_48_bits(n) ((u64)((n) & 0xffffffffffffull))
> > > > > 
> > > > > why not make this a static inline function?
> > > > 
> > > > Agreed.
> > > 
> > > Sure, that sounds good to me. I only did it this way to match the
> > > existing local convention, but I personally prefer the inline function
> > > too.
> > 
> > The existing convention is used there to allow the compiler to
> > avoid warnings and unnecessary conversions of a u32 to a u64 when
> > shifting by 32 or more bits.
> > 
> > If it's possible to be used with an architecture dependent typedef
> > like dma_addr_t, then perhaps it's reasonable to do something like:
> > 
> > #define lower_48_bits(val)					\
> > ({								\
> > 	typeof(val) high = lower_16_bits(upper_32_bits(val));	\
> > 	typeof(val) low = lower_32_bits(val);			\
> > 								\
> > 	(high << 16 << 16) | low;				\
> > })
> > 
> > and have the compiler have the return value be an appropriate type.
> 
> The compiler could make a real pigs breakfast of that.

Both gcc and clang optimize it just fine.

btw: to return the same type the last line should be:

	(typeof(val))((high << 16 << 16) | low);

otherwise the return is sizeof(int) if typeof(val) is not u64

> Oh, did you look for GENMASK([^,]*,[ 0]*) ?

No, why?  I did look for 47, 0 though.

But it's pretty common really.

$ git grep -P 'GENMASK(?:_ULL)?\s*\(\s*\d+\s*,\s*0\s*\)' | wc -l
6233

> I'd only use something like GENMASK() for bit ranges.
> Even then it is often easier to just write the value in hex.

Mostly it's the count of the repeated f that's difficult to
quickly verify visually.

> I think the only time I've written anything like that recently
> (last 30 years) was for some hardware registers when the documentation
> used 'bit 1' for the most significant bit.

Luckily the hardware I've had to deal with never did that.
Not even the least significant bit too.



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64
  2022-02-22 20:06       ` Eric Biggers
@ 2022-02-22 20:51         ` Keith Busch
  0 siblings, 0 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-22 20:51 UTC (permalink / raw)
  To: Eric Biggers
  Cc: David Laight, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 12:06:39PM -0800, Eric Biggers wrote:
> Is the license of that code compatible with the kernel's license?

It's released into "public domain", so I assume we can leverage it into
GPL licenced code. I don't have similar past experiences with this
scenario, so please correct me if I'm mistaken.
 
> In any case, adding uncommented generated assembly isn't acceptable.  The most
> common convention for Linux kernel crypto is to use hand-written assembly that
> is properly commented.
>
> There is some precedent for using compiler intrinsics instead, e.g.
> crypto/aegis128-neon-inner.c.  (I'm not sure why they aren't used more often.)
>
> There are also some files where a Perl script generates the assembly code.
> (This is a bit ugly IMO, but it's what the author of much of OpenSSL's crypto
> assembly code does, and it was desired to reuse that code.)
> 
> Anyway, those are the available options.  Checking in some uncommented generated
> assembly isn't one of them.

Fair enough. I'll find help from someone to author an appropriate form
to replace this patch.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 20:31             ` Joe Perches
@ 2022-02-22 21:12               ` Keith Busch
  2022-02-22 21:17                 ` Joe Perches
  0 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-22 21:12 UTC (permalink / raw)
  To: Joe Perches
  Cc: David Laight, Christoph Hellwig, linux-nvme, linux-block,
	linux-crypto, x86, linux-kernel, axboe, martin.petersen, colyli,
	Bart Van Assche

On Tue, Feb 22, 2022 at 12:31:30PM -0800, Joe Perches wrote:
> > I'd only use something like GENMASK() for bit ranges.
> > Even then it is often easier to just write the value in hex.
> 
> Mostly it's the count of the repeated f that's difficult to
> quickly verify visually.

I admit I made this counting mistake in earlier versions of this series.

I think the earlier suggested "(1ull << 48) - 1" style in an inline
function seems good, and it doesn't appear to cause compiler concerns.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro
  2022-02-22 21:12               ` Keith Busch
@ 2022-02-22 21:17                 ` Joe Perches
  0 siblings, 0 replies; 47+ messages in thread
From: Joe Perches @ 2022-02-22 21:17 UTC (permalink / raw)
  To: Keith Busch
  Cc: David Laight, Christoph Hellwig, linux-nvme, linux-block,
	linux-crypto, x86, linux-kernel, axboe, martin.petersen, colyli,
	Bart Van Assche

On Tue, 2022-02-22 at 13:12 -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 12:31:30PM -0800, Joe Perches wrote:
> > > I'd only use something like GENMASK() for bit ranges.
> > > Even then it is often easier to just write the value in hex.
> > 
> > Mostly it's the count of the repeated f that's difficult to
> > quickly verify visually.
> 
> I admit I made this counting mistake in earlier versions of this series.

It's simply hard for humans to do...

https://en.wikipedia.org/wiki/Subitizing



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 01/10] block: support pi with extended metadata
  2022-02-22 16:31 ` [PATCHv3 01/10] block: support pi with extended metadata Keith Busch
@ 2022-02-25 16:01   ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:01 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke

Looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 02/10] nvme: allow integrity on extended metadata formats
  2022-02-22 16:31 ` [PATCHv3 02/10] nvme: allow integrity on extended metadata formats Keith Busch
@ 2022-02-25 16:02   ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:02 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke

On Tue, Feb 22, 2022 at 08:31:36AM -0800, Keith Busch wrote:
> The block integrity subsystem knows how to construct protection
> information buffers with metadata beyond the protection information
> fields. Remove the driver restriction.
> 
> Note, this can only work if the PI field appears first in the metadata,
> as the integrity subsystem doesn't calculate guard tags on preceding
> metadata.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors
  2022-02-22 16:31 ` [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors Keith Busch
  2022-02-22 16:52   ` Chaitanya Kulkarni
@ 2022-02-25 16:03   ` Christoph Hellwig
  2022-02-25 17:53     ` Joe Perches
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:03 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke, Arnd Bergmann

On Tue, Feb 22, 2022 at 08:31:37AM -0800, Keith Busch wrote:
> The NVMe protocol extended the data integrity fields with unaligned
> 48-bit reference tags. Provide some helper accessors in
> preparation for these.
> 
> Reviewed-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> Acked-by: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Keith Busch <kbusch@kernel.org>

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 05/10] lib: add rocksoft model crc64
  2022-02-22 16:31 ` [PATCHv3 05/10] lib: add rocksoft model crc64 Keith Busch
@ 2022-02-25 16:04   ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:04 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke

Looks good,

Reviewed-by: Christoph Hellwig <hch@lst.de>

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 07/10] lib: add crc64 tests
  2022-02-22 16:31 ` [PATCHv3 07/10] lib: add crc64 tests Keith Busch
  2022-02-22 16:50   ` Chaitanya Kulkarni
@ 2022-02-25 16:05   ` Christoph Hellwig
  2022-02-25 16:12     ` Keith Busch
  1 sibling, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:05 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 08:31:41AM -0800, Keith Busch wrote:
> Provide a module to test the rocksoft crc64 calculations with well known
> inputs and expected values. Check the generic table implementation and
> whatever module is registered from the crypto library, and compare their
> speeds.

The code looks good, but isn't the trend to use the kunit framework
for these kinds of tests these days?
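
Something like the following minimal sketch, say (the expected value
is the published check value for these CRC parameters, unverified):

#include <kunit/test.h>
#include <linux/crc64.h>

static void crc64_rocksoft_check(struct kunit *test)
{
	KUNIT_EXPECT_EQ(test, crc64_rocksoft((const u8 *)"123456789", 9),
			0xae8b14860a799888ULL);
}

static struct kunit_case crc64_cases[] = {
	KUNIT_CASE(crc64_rocksoft_check),
	{}
};

static struct kunit_suite crc64_suite = {
	.name = "crc64-rocksoft",
	.test_cases = crc64_cases,
};
kunit_test_suite(crc64_suite);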

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 06/10] crypto: add rocksoft 64b crc framework
  2022-02-22 20:09     ` Keith Busch
@ 2022-02-25 16:11       ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:11 UTC (permalink / raw)
  To: Keith Busch
  Cc: Eric Biggers, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, hch, martin.petersen, colyli

On Tue, Feb 22, 2022 at 12:09:07PM -0800, Keith Busch wrote:
> On Tue, Feb 22, 2022 at 11:50:42AM -0800, Eric Biggers wrote:
> > On Tue, Feb 22, 2022 at 08:31:40AM -0800, Keith Busch wrote:
> > > +config CRYPTO_CRC64_ROCKSOFT
> > > +	tristate "Rocksoft Model CRC64 algorithm"
> > > +	depends on CRC64
> > > +	select CRYPTO_HASH
> > > +	help
> > > +	  Rocksoft Model CRC64 computation is being cast as a crypto
> > > +	  transform. This allows for faster crc64 transforms to be used
> > > +	  if they are available.
> > 
> > The first sentence of this help text doesn't make sense.
> 
> Much of this patch is a copy of the equivalent T10DIF option, which
> says the same as above. I meant to revisit these help texts because
> they don't make sense to me either. I'll be sure to do that for the next
> version.

It might make sense to make CRC_T10DIF an invisible option as well
and drop the help text entirely.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 07/10] lib: add crc64 tests
  2022-02-25 16:05   ` Christoph Hellwig
@ 2022-02-25 16:12     ` Keith Busch
  2022-02-25 16:19       ` Christoph Hellwig
  0 siblings, 1 reply; 47+ messages in thread
From: Keith Busch @ 2022-02-25 16:12 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	martin.petersen, colyli

On Fri, Feb 25, 2022 at 05:05:09PM +0100, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:31:41AM -0800, Keith Busch wrote:
> > Provide a module to test the rocksoft crc64 calculations with well known
> > inputs and exepected values. Check the generic table implementation and
> > whatever module is registered from the crypto library, and compare their
> > speeds.
> 
> The code looks good, but isn't the trend to use the kunit framework
> for these kinds of tests these days?

I don't have experience with kunit, but I'll look into that.

I am already changing the way this gets tested. Eric recommended adding
to the crypto "testmgr", and I've done that on my private tree. That
test framework exercises a lot more than this this patch, and it did
reveal a problem with how I've implemented the initial XOR when the
buffer is split, so I have some minor updates coming soon.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 08/10] block: add pi for nvme enhanced integrity
  2022-02-22 16:31 ` [PATCHv3 08/10] block: add pi for nvme enhanced integrity Keith Busch
@ 2022-02-25 16:14   ` Christoph Hellwig
  2022-03-02  3:15     ` Martin K. Petersen
  0 siblings, 1 reply; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke

On Tue, Feb 22, 2022 at 08:31:42AM -0800, Keith Busch wrote:
> +static __be64 nvme_pi_crc64(void *data, unsigned int len)
> +{
> +	return cpu_to_be64(crc64_rocksoft(data, len));
> +}
> +
> +static blk_status_t nvme_crc64_generate(struct blk_integrity_iter *iter,
> +					enum t10_dif_type type)

Shouldn't the naming be something more like ext_pi_*?  For one thing
I kinda hate having the nvme prefix here in block layer code, but also
nvme supports the normal 8 byte PI tuples, so this is a bit confusing.

> +{
> +	unsigned int i;
> +
> +	for (i = 0 ; i < iter->data_size ; i += iter->interval) {
> +		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
> +
> +		pi->guard_tag = nvme_pi_crc64(iter->data_buf, iter->interval);
> +		pi->app_tag = 0;
> +
> +		if (type == T10_PI_TYPE1_PROTECTION)
> +			put_unaligned_be48(iter->seed, pi->ref_tag);
> +		else
> +			put_unaligned_be48(0ULL, pi->ref_tag);
> +
> +		iter->data_buf += iter->interval;
> +		iter->prot_buf += iter->tuple_size;
> +		iter->seed++;
> +	}
> +
> +	return BLK_STS_OK;
> +}
> +
> +static bool nvme_crc64_ref_escape(u8 *ref_tag)
> +{
> +	static u8 ref_escape[6] = { 0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
> +
> +	return memcmp(ref_tag, ref_escape, sizeof(ref_escape)) == 0;
> +}
> +
> +static blk_status_t nvme_crc64_verify(struct blk_integrity_iter *iter,
> +				      enum t10_dif_type type)
> +{
> +	unsigned int i;
> +
> +	for (i = 0; i < iter->data_size; i += iter->interval) {
> +		struct nvme_crc64_pi_tuple *pi = iter->prot_buf;
> +		u64 ref, seed;
> +		__be64 csum;
> +
> +		if (type == T10_PI_TYPE1_PROTECTION) {
> +			if (pi->app_tag == T10_PI_APP_ESCAPE)
> +				goto next;
> +
> +			ref = get_unaligned_be48(pi->ref_tag);
> +			seed = lower_48_bits(iter->seed);
> +			if (ref != seed) {
> +				pr_err("%s: ref tag error at location %llu (rcvd %llu)\n",
> +					iter->disk_name, seed, ref);
> +				return BLK_STS_PROTECTION;
> +			}
> +		} else if (type == T10_PI_TYPE3_PROTECTION) {
> +			if (pi->app_tag == T10_PI_APP_ESCAPE &&
> +			    nvme_crc64_ref_escape(pi->ref_tag))
> +				goto next;
> +		}
> +
> +		csum = nvme_pi_crc64(iter->data_buf, iter->interval);
> +		if (pi->guard_tag != csum) {
> +			pr_err("%s: guard tag error at sector %llu " \
> +			       "(rcvd %016llx, want %016llx)\n",
> +				iter->disk_name, (unsigned long long)iter->seed,
> +				be64_to_cpu(pi->guard_tag), be64_to_cpu(csum));
> +			return BLK_STS_PROTECTION;
> +		}
> +
> +next:
> +		iter->data_buf += iter->interval;
> +		iter->prot_buf += iter->tuple_size;
> +		iter->seed++;
> +	}
> +
> +	return BLK_STS_OK;
> +}
> +
> +static blk_status_t nvme_pi_type1_verify_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_verify(iter, T10_PI_TYPE1_PROTECTION);
> +}
> +
> +static blk_status_t nvme_pi_type1_generate_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_generate(iter, T10_PI_TYPE1_PROTECTION);
> +}
> +
> +static void nvme_pi_type1_prepare(struct request *rq)
> +{
> +	const int tuple_sz = rq->q->integrity.tuple_size;
> +	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
> +	struct bio *bio;
> +
> +	__rq_for_each_bio(bio, rq) {
> +		struct bio_integrity_payload *bip = bio_integrity(bio);
> +		u64 virt = lower_48_bits(bip_get_seed(bip));
> +		struct bio_vec iv;
> +		struct bvec_iter iter;
> +
> +		/* Already remapped? */
> +		if (bip->bip_flags & BIP_MAPPED_INTEGRITY)
> +			break;
> +
> +		bip_for_each_vec(iv, bip, iter) {
> +			unsigned int j;
> +			void *p;
> +
> +			p = bvec_kmap_local(&iv);
> +			for (j = 0; j < iv.bv_len; j += tuple_sz) {
> +				struct nvme_crc64_pi_tuple *pi = p;
> +				u64 ref = get_unaligned_be48(pi->ref_tag);
> +
> +				if (ref == virt)
> +					put_unaligned_be48(ref_tag, pi->ref_tag);
> +				virt++;
> +				ref_tag++;
> +				p += tuple_sz;
> +			}
> +			kunmap_local(p);
> +		}
> +
> +		bip->bip_flags |= BIP_MAPPED_INTEGRITY;
> +	}
> +}
> +
> +static void nvme_pi_type1_complete(struct request *rq, unsigned int nr_bytes)
> +{
> +	unsigned intervals = nr_bytes >> rq->q->integrity.interval_exp;
> +	const int tuple_sz = rq->q->integrity.tuple_size;
> +	u64 ref_tag = nvme_pi_extended_ref_tag(rq);
> +	struct bio *bio;
> +
> +	__rq_for_each_bio(bio, rq) {
> +		struct bio_integrity_payload *bip = bio_integrity(bio);
> +		u64 virt = lower_48_bits(bip_get_seed(bip));
> +		struct bio_vec iv;
> +		struct bvec_iter iter;
> +
> +		bip_for_each_vec(iv, bip, iter) {
> +			unsigned int j;
> +			void *p;
> +
> +			p = bvec_kmap_local(&iv);
> +			for (j = 0; j < iv.bv_len && intervals; j += tuple_sz) {
> +				struct nvme_crc64_pi_tuple *pi = p;
> +				u64 ref = get_unaligned_be48(pi->ref_tag);
> +
> +				if (ref == ref_tag)
> +					put_unaligned_be48(virt, pi->ref_tag);
> +				virt++;
> +				ref_tag++;
> +				intervals--;
> +				p += tuple_sz;
> +			}
> +			kunmap_local(p);
> +		}
> +	}
> +}
> +
> +static blk_status_t nvme_pi_type3_verify_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_verify(iter, T10_PI_TYPE3_PROTECTION);
> +}
> +
> +static blk_status_t nvme_pi_type3_generate_crc(struct blk_integrity_iter *iter)
> +{
> +	return nvme_crc64_generate(iter, T10_PI_TYPE3_PROTECTION);
> +}
> +
> +const struct blk_integrity_profile nvme_pi_type1_crc64 = {
> +	.name			= "NVME-DIF-TYPE1-CRC64",
> +	.generate_fn		= nvme_pi_type1_generate_crc,
> +	.verify_fn		= nvme_pi_type1_verify_crc,
> +	.prepare_fn		= nvme_pi_type1_prepare,
> +	.complete_fn		= nvme_pi_type1_complete,
> +};
> +EXPORT_SYMBOL(nvme_pi_type1_crc64);

All this should probably be EXPORT_SYMBOL_GPL (and I have a series
pending that would remove the exports of the profiles entirely,
but that will need some major rework after your series goes in).

The actual code looks fine to me.
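
For orientation, the exported profile above is consumed through
blk_integrity_register(). A minimal sketch of the registration, where
example_register_pi() is a hypothetical helper and the 4096-byte
protection interval is assumed:

  static void example_register_pi(struct gendisk *disk)
  {
          struct blk_integrity bi = {
                  .profile      = &nvme_pi_type1_crc64,
                  .tuple_size   = sizeof(struct nvme_crc64_pi_tuple),
                  .tag_size     = 0,
                  .interval_exp = 12,   /* assumed: 4096-byte intervals */
          };

          blk_integrity_register(disk, &bi);
  }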

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 09/10] nvme: add support for enhanced metadata
  2022-02-22 16:31 ` [PATCHv3 09/10] nvme: add support for enhanced metadata Keith Busch
@ 2022-02-25 16:17   ` Christoph Hellwig
  2022-03-02  3:18   ` Martin K. Petersen
  1 sibling, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:17 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke, Klaus Jensen

On Tue, Feb 22, 2022 at 08:31:43AM -0800, Keith Busch wrote:
> @@ -918,6 +918,30 @@ static blk_status_t nvme_setup_discard(struct nvme_ns *ns, struct request *req,
>  	return BLK_STS_OK;
>  }
>  
> +static inline void nvme_set_ref_tag(struct nvme_ns *ns, struct nvme_command *cmnd,

Overly long line. But I don't think this should be marked inline anyway.

Otherwise this looks good:

Reviewed-by: Christoph Hellwig <hch@lst.de>

I suspect eventually we'll need to lift the new Identify call into
the common revalidation once more fields get added there.
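
For reference, the quoted helper's job is to program the 48-bit
reference tag into the NVMe command, which is split across two command
dwords. A sketch of the shape, reusing nvme_pi_extended_ref_tag() from
patch 08 and assuming the cdw3 field name in the rw command layout:

  static void nvme_set_ref_tag(struct nvme_ns *ns,
                               struct nvme_command *cmnd,
                               struct request *req)
  {
          u64 ref48 = nvme_pi_extended_ref_tag(req);

          /* the real helper presumably switches on ns's guard type */
          /* low 32 bits go in the classic reftag dword... */
          cmnd->rw.reftag = cpu_to_le32(lower_32_bits(ref48));
          /* ...and the upper 16 bits of the 48-bit tag in CDW3 */
          cmnd->rw.cdw3 = cpu_to_le32(upper_32_bits(ref48));
  }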

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 07/10] lib: add crc64 tests
  2022-02-25 16:12     ` Keith Busch
@ 2022-02-25 16:19       ` Christoph Hellwig
  0 siblings, 0 replies; 47+ messages in thread
From: Christoph Hellwig @ 2022-02-25 16:19 UTC (permalink / raw)
  To: Keith Busch
  Cc: Christoph Hellwig, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli

On Fri, Feb 25, 2022 at 08:12:59AM -0800, Keith Busch wrote:
> I don't have experience with kunit, but I'll look into that.
> 
> I am already changing the way this gets tested. Eric recommended adding
> to the crypto "testmgr", and I've done that on my private tree. That
> test framework exercises a lot more than this patch does, and it did
> reveal a problem with how I've implemented the initial XOR when the
> buffer is split, so I have some minor updates coming soon.

I guess if we exercise the algorithm through that we don't really need
another low-level test anyway, right?

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors
  2022-02-25 16:03   ` Christoph Hellwig
@ 2022-02-25 17:53     ` Joe Perches
  2022-02-25 17:59       ` Keith Busch
  0 siblings, 1 reply; 47+ messages in thread
From: Joe Perches @ 2022-02-25 17:53 UTC (permalink / raw)
  To: Christoph Hellwig, Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	martin.petersen, colyli, Hannes Reinecke, Arnd Bergmann

On Fri, 2022-02-25 at 17:03 +0100, Christoph Hellwig wrote:
> On Tue, Feb 22, 2022 at 08:31:37AM -0800, Keith Busch wrote:
> > The NVMe protocol extended the data integrity fields with unaligned
> > 48-bit reference tags. Provide some helper accessors in
> > preparation for these.
> > 
> > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > Signed-off-by: Keith Busch <kbusch@kernel.org>
> 
> Looks good,
> 
> Reviewed-by: Christoph Hellwig <hch@lst.de>

Perhaps for completeness this should also add the le48 variants
like the 24 bit accessors above this.
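
A sketch of what those variants could look like, mirroring the
byte-reversed pattern of the be48 helpers (purely illustrative):

  static inline u64 get_unaligned_le48(const u8 *p)
  {
          return (u64)p[5] << 40 | (u64)p[4] << 32 | (u64)p[3] << 24 |
                 p[2] << 16 | p[1] << 8 | p[0];
  }

  static inline void put_unaligned_le48(const u64 val, u8 *p)
  {
          *p++ = val;
          *p++ = val >> 8;
          *p++ = val >> 16;
          *p++ = val >> 24;
          *p++ = val >> 32;
          *p++ = val >> 40;
  }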


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors
  2022-02-25 17:53     ` Joe Perches
@ 2022-02-25 17:59       ` Keith Busch
  0 siblings, 0 replies; 47+ messages in thread
From: Keith Busch @ 2022-02-25 17:59 UTC (permalink / raw)
  To: Joe Perches
  Cc: Christoph Hellwig, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Hannes Reinecke,
	Arnd Bergmann

On Fri, Feb 25, 2022 at 09:53:11AM -0800, Joe Perches wrote:
> On Fri, 2022-02-25 at 17:03 +0100, Christoph Hellwig wrote:
> > On Tue, Feb 22, 2022 at 08:31:37AM -0800, Keith Busch wrote:
> > > The NVMe protocol extended the data integrity fields with unaligned
> > > 48-bit reference tags. Provide some helper accessors in
> > > preparation for these.
> > > 
> > > Reviewed-by: Hannes Reinecke <hare@suse.de>
> > > Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
> > > Acked-by: Arnd Bergmann <arnd@arndb.de>
> > > Signed-off-by: Keith Busch <kbusch@kernel.org>
> > 
> > Looks good,
> > 
> > Reviewed-by: Christoph Hellwig <hch@lst.de>
> 
> Perhaps for completeness this should also add the le48 variants
> like the 24 bit accessors above this.

I don't know of a user for le48 at this time, and kernel APIs without
users often get culled. If you think it's useful, though, I can
certainly include it.

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 08/10] block: add pi for nvme enhanced integrity
  2022-02-25 16:14   ` Christoph Hellwig
@ 2022-03-02  3:15     ` Martin K. Petersen
  0 siblings, 0 replies; 47+ messages in thread
From: Martin K. Petersen @ 2022-03-02  3:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, linux-block, linux-crypto, x86,
	linux-kernel, axboe, martin.petersen, colyli, Hannes Reinecke


Christoph,

>> +static blk_status_t nvme_crc64_generate(struct blk_integrity_iter *iter,
>> +					enum t10_dif_type type)
>
> Shouldn't the naming be something more like ext_pi_*?  For one thing
> I kinda hate having the nvme prefix here in block layer code, but also
> nvme supports the normal 8 byte PI tuples, so this is a bit confusing.

The rationale behind the original t10 prefix was that the format was
defined by the T10 organization. At the time a T13 format was also on
the table. So from that perspective, using nvme_ here is correct. I do
like ext_ better, though.

I don't particularly appreciate the way the new formats were defined in
NVMe. I would have preferred new types instead of this "just like type N
except for all these differences" approach. But that stems from NVMe
completely missing how DIX removed all format-type knowledge from the
controller/device and instead placed the burden on the driver to tell
the device what to check and how to check it.

In any case: Naming is hard, the code looks fine to me.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCHv3 09/10] nvme: add support for enhanced metadata
  2022-02-22 16:31 ` [PATCHv3 09/10] nvme: add support for enhanced metadata Keith Busch
  2022-02-25 16:17   ` Christoph Hellwig
@ 2022-03-02  3:18   ` Martin K. Petersen
  1 sibling, 0 replies; 47+ messages in thread
From: Martin K. Petersen @ 2022-03-02  3:18 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, linux-block, linux-crypto, x86, linux-kernel, axboe,
	hch, martin.petersen, colyli, Hannes Reinecke, Klaus Jensen


Keith,

> NVM Express TP 4068, now ratified, defines new protection information
> formats.  Implement support for the CRC64 guard tags.
>
> Since the block layer doesn't support variable length reference tags,
> driver support for the Storage Tag space is not supported at this
> time.

Looks fine.

Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>

-- 
Martin K. Petersen	Oracle Linux Engineering

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2022-03-02  3:19 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-22 16:31 [PATCHv3 00/10] 64-bit data integrity field support Keith Busch
2022-02-22 16:31 ` [PATCHv3 01/10] block: support pi with extended metadata Keith Busch
2022-02-25 16:01   ` Christoph Hellwig
2022-02-22 16:31 ` [PATCHv3 02/10] nvme: allow integrity on extended metadata formats Keith Busch
2022-02-25 16:02   ` Christoph Hellwig
2022-02-22 16:31 ` [PATCHv3 03/10] asm-generic: introduce be48 unaligned accessors Keith Busch
2022-02-22 16:52   ` Chaitanya Kulkarni
2022-02-25 16:03   ` Christoph Hellwig
2022-02-25 17:53     ` Joe Perches
2022-02-25 17:59       ` Keith Busch
2022-02-22 16:31 ` [PATCHv3 04/10] linux/kernel: introduce lower_48_bits macro Keith Busch
2022-02-22 16:45   ` Joe Perches
2022-02-22 16:50     ` Christoph Hellwig
2022-02-22 16:56       ` Keith Busch
2022-02-22 18:43         ` Joe Perches
2022-02-22 20:09           ` David Laight
2022-02-22 20:31             ` Joe Perches
2022-02-22 21:12               ` Keith Busch
2022-02-22 21:17                 ` Joe Perches
2022-02-22 16:58       ` Joe Perches
2022-02-22 17:09       ` David Laight
2022-02-22 17:14       ` Chaitanya Kulkarni
2022-02-22 16:48   ` Chaitanya Kulkarni
2022-02-22 16:31 ` [PATCHv3 05/10] lib: add rocksoft model crc64 Keith Busch
2022-02-25 16:04   ` Christoph Hellwig
2022-02-22 16:31 ` [PATCHv3 06/10] crypto: add rocksoft 64b crc framework Keith Busch
2022-02-22 19:50   ` Eric Biggers
2022-02-22 19:54     ` Eric Biggers
2022-02-22 20:09     ` Keith Busch
2022-02-25 16:11       ` Christoph Hellwig
2022-02-22 19:56   ` Eric Biggers
2022-02-22 16:31 ` [PATCHv3 07/10] lib: add crc64 tests Keith Busch
2022-02-22 16:50   ` Chaitanya Kulkarni
2022-02-25 16:05   ` Christoph Hellwig
2022-02-25 16:12     ` Keith Busch
2022-02-25 16:19       ` Christoph Hellwig
2022-02-22 16:31 ` [PATCHv3 08/10] block: add pi for nvme enhanced integrity Keith Busch
2022-02-25 16:14   ` Christoph Hellwig
2022-03-02  3:15     ` Martin K. Petersen
2022-02-22 16:31 ` [PATCHv3 09/10] nvme: add support for enhanced metadata Keith Busch
2022-02-25 16:17   ` Christoph Hellwig
2022-03-02  3:18   ` Martin K. Petersen
2022-02-22 16:31 ` [PATCHv3 10/10] x86/crypto: add pclmul acceleration for crc64 Keith Busch
2022-02-22 17:02   ` David Laight
2022-02-22 17:14     ` Keith Busch
2022-02-22 20:06       ` Eric Biggers
2022-02-22 20:51         ` Keith Busch

This is a public inbox; see mirroring instructions
for how to clone and mirror all data and code used for this inbox,
as well as URLs for NNTP newsgroup(s).