linux-btrfs.vger.kernel.org archive mirror
* [PATCH v2 0/4] Support xxhash64 checksums
@ 2019-08-22 11:40 Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 1/4] btrfs: turn checksum type define into a enum Johannes Thumshirn
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 11:40 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

Now that Nikolay's XXHASH64 support for the Crypto API has landed and BTRFS is
prepared for an easy addition of new checksums, this patchset implements
XXHASH64 as a second, fast but not cryptographically secure checksum hash.

For changes since v1, please see the individual patches.

David Sterba (1):
  btrfs: sysfs: export supported checksums

Johannes Thumshirn (3):
  btrfs: turn checksum type define into a enum
  btrfs: create structure to encode checksum type and length
  btrfs: use xxhash64 for checksumming

 fs/btrfs/Kconfig                |  1 +
 fs/btrfs/ctree.h                | 17 ++++++++++++-----
 fs/btrfs/disk-io.c              |  1 +
 fs/btrfs/super.c                |  1 +
 fs/btrfs/sysfs.c                | 29 +++++++++++++++++++++++++++++
 include/uapi/linux/btrfs_tree.h |  5 ++++-
 6 files changed, 48 insertions(+), 6 deletions(-)

-- 
2.16.4



* [PATCH v2 1/4] btrfs: turn checksum type define into a enum
  2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
@ 2019-08-22 11:40 ` Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 2/4] btrfs: create structure to encode checksum type and length Johannes Thumshirn
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 11:40 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

Turn the checksum type definition into an enum. This makes it easier to
add new checksums later.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 include/uapi/linux/btrfs_tree.h | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index 71246c1941aa..b65c7ee75bc7 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -300,7 +300,9 @@
 #define BTRFS_CSUM_SIZE 32
 
 /* csum types */
-#define BTRFS_CSUM_TYPE_CRC32	0
+enum btrfs_csum_type {
+	BTRFS_CSUM_TYPE_CRC32	= 0,
+};
 
 /*
  * flags definitions for directory entry item type
-- 
2.16.4



* [PATCH v2 2/4] btrfs: create structure to encode checksum type and length
  2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 1/4] btrfs: turn checksum type define into a enum Johannes Thumshirn
@ 2019-08-22 11:40 ` Johannes Thumshirn
  2019-08-22 12:11   ` Johannes Thumshirn
  2019-08-22 13:22   ` [PATCH v2.1] " Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 3/4] btrfs: use xxhash64 for checksumming Johannes Thumshirn
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 11:40 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

Create a structure to encode the type and length for the known on-disk
checksums. Also add a table and a convenience macro for adding the
checksum types to the table.

This makes it easier to add new checksums later.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

---
Changes to v1:
- Remove initializer macro (David)
---
 fs/btrfs/ctree.h | 16 +++++++++++-----
 1 file changed, 11 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b161224b5a0b..327ca7e95549 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -82,9 +82,15 @@ struct btrfs_ref;
  */
 #define BTRFS_LINK_MAX 65535U
 
-/* four bytes for CRC32 */
-static const int btrfs_csum_sizes[] = { 4 };
-static const char *btrfs_csum_names[] = { "crc32c" };
+#define BTRFS_CHECKSUM_TYPE(_type, _size, _name) \
+	[_type] = { .size = _size, .name = _name }
+
+static const struct btrfs_csums {
+	u16		size;
+	const char	*name;
+} btrfs_csums[] = {
+	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
+};
 
 #define BTRFS_EMPTY_DIR_SIZE 0
 
@@ -2207,13 +2213,13 @@ static inline int btrfs_super_csum_size(const struct btrfs_super_block *s)
 	/*
 	 * csum type is validated at mount time
 	 */
-	return btrfs_csum_sizes[t];
+	return btrfs_csums[t].size;
 }
 
 static inline const char *btrfs_super_csum_name(u16 csum_type)
 {
 	/* csum type is validated at mount time */
-	return btrfs_csum_names[csum_type];
+	return btrfs_csums[csum_type].name;
 }
 
 /*
-- 
2.16.4



* [PATCH v2 3/4] btrfs: use xxhash64 for checksumming
  2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 1/4] btrfs: turn checksum type define into a enum Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 2/4] btrfs: create structure to encode checksum type and length Johannes Thumshirn
@ 2019-08-22 11:40 ` Johannes Thumshirn
  2019-08-22 11:40 ` [PATCH v2 4/4] btrfs: sysfs: export supported checksums Johannes Thumshirn
  2019-08-22 12:28 ` [PATCH v2 0/4] Support xxhash64 checksums Holger Hoffstätte
  4 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 11:40 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
---
 fs/btrfs/Kconfig                | 1 +
 fs/btrfs/ctree.h                | 1 +
 fs/btrfs/disk-io.c              | 1 +
 fs/btrfs/super.c                | 1 +
 include/uapi/linux/btrfs_tree.h | 1 +
 5 files changed, 5 insertions(+)

diff --git a/fs/btrfs/Kconfig b/fs/btrfs/Kconfig
index 38651fae7f21..6d5a01c57da3 100644
--- a/fs/btrfs/Kconfig
+++ b/fs/btrfs/Kconfig
@@ -5,6 +5,7 @@ config BTRFS_FS
 	select CRYPTO
 	select CRYPTO_CRC32C
 	select LIBCRC32C
+	select CRYPTO_XXHASH
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
 	select LZO_COMPRESS
diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index 327ca7e95549..10fa3a6fe8bf 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -90,6 +90,7 @@ static const struct btrfs_csums {
 	const char	*name;
 } btrfs_csums[] = {
 	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
+	[BTRFS_CSUM_TYPE_XXHASH] = { .size = 8, .name = "xxhash64" },
 };
 
 #define BTRFS_EMPTY_DIR_SIZE 0
diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 99dfd889b9f7..ac039a4d23ff 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -352,6 +352,7 @@ static bool btrfs_supported_super_csum(u16 csum_type)
 {
 	switch (csum_type) {
 	case BTRFS_CSUM_TYPE_CRC32:
+	case BTRFS_CSUM_TYPE_XXHASH:
 		return true;
 	default:
 		return false;
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..60116d0410e5 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2456,3 +2456,4 @@ module_exit(exit_btrfs_fs)
 
 MODULE_LICENSE("GPL");
 MODULE_SOFTDEP("pre: crc32c");
+MODULE_SOFTDEP("pre: xxhash64");
diff --git a/include/uapi/linux/btrfs_tree.h b/include/uapi/linux/btrfs_tree.h
index b65c7ee75bc7..ba2f125a3a1c 100644
--- a/include/uapi/linux/btrfs_tree.h
+++ b/include/uapi/linux/btrfs_tree.h
@@ -302,6 +302,7 @@
 /* csum types */
 enum btrfs_csum_type {
 	BTRFS_CSUM_TYPE_CRC32	= 0,
+	BTRFS_CSUM_TYPE_XXHASH	= 1,
 };
 
 /*
-- 
2.16.4
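With patches 1 to 3 applied, the checksum type is an enum value that directly indexes a table of sizes and names, and the superblock check accepts only the listed types. The following is a minimal userspace sketch of that arrangement so it can be compiled and tested outside the kernel. The enum values and table contents follow the diffs; the ARRAY_SIZE definition, the static_assert, csum_type_supported(), csum_size() and csum_type_from_name() are hypothetical helpers for illustration only (the kernel relies on mount-time validation rather than a per-lookup check).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Values as in include/uapi/linux/btrfs_tree.h after patch 3. */
enum btrfs_csum_type {
	BTRFS_CSUM_TYPE_CRC32	= 0,
	BTRFS_CSUM_TYPE_XXHASH	= 1,
};

/* Same shape as the ctree.h table: indexed directly by the enum value. */
static const struct btrfs_csums {
	uint16_t	size;	/* checksum size in bytes */
	const char	*name;	/* human-readable algorithm name */
} btrfs_csums[] = {
	[BTRFS_CSUM_TYPE_CRC32]  = { .size = 4, .name = "crc32c" },
	[BTRFS_CSUM_TYPE_XXHASH] = { .size = 8, .name = "xxhash64" },
};

/* Userspace stand-in for the kernel macro of the same name. */
#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Compile-time check that the table has an entry for the highest type. */
static_assert(ARRAY_SIZE(btrfs_csums) == BTRFS_CSUM_TYPE_XXHASH + 1,
	      "btrfs_csums[] must cover every btrfs_csum_type");

/* Hypothetical helper mirroring btrfs_supported_super_csum() in disk-io.c. */
static bool csum_type_supported(uint16_t csum_type)
{
	switch (csum_type) {
	case BTRFS_CSUM_TYPE_CRC32:
	case BTRFS_CSUM_TYPE_XXHASH:
		return true;
	default:
		return false;
	}
}

/* Hypothetical size lookup; returns 0 for an unsupported type. */
static uint16_t csum_size(uint16_t csum_type)
{
	return csum_type_supported(csum_type) ? btrfs_csums[csum_type].size : 0;
}

/* Hypothetical reverse lookup by name; returns -1 if the name is unknown. */
static int csum_type_from_name(const char *name)
{
	for (size_t i = 0; i < ARRAY_SIZE(btrfs_csums); i++)
		if (strcmp(btrfs_csums[i].name, name) == 0)
			return (int)i;
	return -1;
}
```

For example, csum_size(BTRFS_CSUM_TYPE_XXHASH) returns 8, while an unknown type such as 2 returns 0.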



* [PATCH v2 4/4] btrfs: sysfs: export supported checksums
  2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
                   ` (2 preceding siblings ...)
  2019-08-22 11:40 ` [PATCH v2 3/4] btrfs: use xxhash64 for checksumming Johannes Thumshirn
@ 2019-08-22 11:40 ` Johannes Thumshirn
  2019-08-22 12:28 ` [PATCH v2 0/4] Support xxhash64 checksums Holger Hoffstätte
  4 siblings, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 11:40 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

From: David Sterba <dsterba@suse.com>

Export supported checksum algorithms via sysfs.

Signed-off-by: David Sterba <dsterba@suse.com>
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

---
Changes to v1:
- Removed btrfs_checksums_store() function (Nik)
- Renamed sysfs file to supported_checksums
---
 fs/btrfs/sysfs.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/fs/btrfs/sysfs.c b/fs/btrfs/sysfs.c
index f6d3c80f2e28..1cd351d2be03 100644
--- a/fs/btrfs/sysfs.c
+++ b/fs/btrfs/sysfs.c
@@ -246,6 +246,24 @@ static umode_t btrfs_feature_visible(struct kobject *kobj,
 	return mode;
 }
 
+static ssize_t btrfs_supported_checksums_show(struct kobject *kobj,
+					      struct kobj_attribute *a,
+					      char *buf)
+{
+	ssize_t ret = 0;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(btrfs_csums); i++) {
+		ret += snprintf(buf + ret, PAGE_SIZE - ret, "%s%s",
+				(i == 0 ? "" : ", "),
+				btrfs_csums[i].name);
+
+	}
+
+	ret += snprintf(buf + ret, PAGE_SIZE - ret, "\n");
+	return ret;
+}
+
 BTRFS_FEAT_ATTR_INCOMPAT(mixed_backref, MIXED_BACKREF);
 BTRFS_FEAT_ATTR_INCOMPAT(default_subvol, DEFAULT_SUBVOL);
 BTRFS_FEAT_ATTR_INCOMPAT(mixed_groups, MIXED_GROUPS);
@@ -259,6 +277,14 @@ BTRFS_FEAT_ATTR_INCOMPAT(no_holes, NO_HOLES);
 BTRFS_FEAT_ATTR_INCOMPAT(metadata_uuid, METADATA_UUID);
 BTRFS_FEAT_ATTR_COMPAT_RO(free_space_tree, FREE_SPACE_TREE);
 
+static struct btrfs_feature_attr btrfs_attr_features_checksums_name = {
+	.kobj_attr = __INIT_KOBJ_ATTR(supported_checksums, S_IRUGO,
+				      btrfs_supported_checksums_show,
+				      NULL),
+	.feature_set	= FEAT_INCOMPAT,
+	.feature_bit	= 0,
+};
+
 static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(mixed_backref),
 	BTRFS_FEAT_ATTR_PTR(default_subvol),
@@ -272,6 +298,9 @@ static struct attribute *btrfs_supported_feature_attrs[] = {
 	BTRFS_FEAT_ATTR_PTR(no_holes),
 	BTRFS_FEAT_ATTR_PTR(metadata_uuid),
 	BTRFS_FEAT_ATTR_PTR(free_space_tree),
+
+	&btrfs_attr_features_checksums_name.kobj_attr.attr,
+
 	NULL
 };
 
-- 
2.16.4
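The show() callback appends one algorithm name at a time into a single page-sized buffer. Below is a minimal userspace sketch of that pattern, assuming a 4096-byte page; SKETCH_PAGE_SIZE, sketch_scnprintf() and the two-entry csum_names[] are hypothetical stand-ins, not kernel code. Each append is bounded by the space remaining (size - ret), and the helper, like the kernel's scnprintf(), returns only what it actually stored, so an over-long list is truncated instead of overrunning the buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for PAGE_SIZE, the size of a sysfs show() buffer. */
#define SKETCH_PAGE_SIZE 4096

static const char *const csum_names[] = { "crc32c", "xxhash64" };

/*
 * Userspace stand-in for scnprintf(): returns the number of characters
 * actually stored in buf (excluding the NUL), never more than size - 1,
 * so a running offset cannot walk past the end of the buffer.
 */
static int sketch_scnprintf(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(size > 0);
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);
	va_end(args);
	if (n < 0)
		return 0;
	return (size_t)n < size ? n : (int)(size - 1);
}

/* Builds "crc32c, xxhash64\n"; each append only gets the space left. */
static int supported_checksums_show(char *buf, size_t size)
{
	int ret = 0;

	for (size_t i = 0; i < sizeof(csum_names) / sizeof(csum_names[0]); i++)
		ret += sketch_scnprintf(buf + ret, size - (size_t)ret, "%s%s",
					i == 0 ? "" : ", ", csum_names[i]);

	ret += sketch_scnprintf(buf + ret, size - (size_t)ret, "\n");
	return ret;
}
```

With a full page the buffer holds "crc32c, xxhash64\n" (17 bytes); with an 8-byte buffer it holds the truncated "crc32c," and the function returns 7.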



* Re: [PATCH v2 2/4] btrfs: create structure to encode checksum type and length
  2019-08-22 11:40 ` [PATCH v2 2/4] btrfs: create structure to encode checksum type and length Johannes Thumshirn
@ 2019-08-22 12:11   ` Johannes Thumshirn
  2019-08-22 13:22   ` [PATCH v2.1] " Johannes Thumshirn
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 12:11 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist

On Thu, Aug 22, 2019 at 01:40:27PM +0200, Johannes Thumshirn wrote:
> Create a structure to encode the type and length for the known on-disk
> checksums. Also add a table and a convenience macro for adding the
> checksum types to the table.
> 
> This makes it easier to add new checksums later.
> 
> Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
> 
> ---
> Changes to v1:
> - Remove initializer macro (David)

> +#define BTRFS_CHECKSUM_TYPE(_type, _size, _name) \
> +	[_type] = { .size = _size, .name = _name }
> +

And obviously this should be gone as well *facepalm*

-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850


* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
                   ` (3 preceding siblings ...)
  2019-08-22 11:40 ` [PATCH v2 4/4] btrfs: sysfs: export supported checksums Johannes Thumshirn
@ 2019-08-22 12:28 ` Holger Hoffstätte
  2019-08-22 12:54   ` Johannes Thumshirn
  2019-08-22 15:40   ` Peter Becker
  4 siblings, 2 replies; 15+ messages in thread
From: Holger Hoffstätte @ 2019-08-22 12:28 UTC
  To: Linux BTRFS Mailinglist

On 8/22/19 1:40 PM, Johannes Thumshirn wrote:
> Now that Nikolay's XXHASH64 support for the Crypto API has landed and BTRFS is
> prepared for an easy addition of new checksums, this patchset implements
> XXHASH64 as a second, fast but not cryptographically secure checksum hash.

Question from the cheap seats.. :)

I know that crc32c-intel uses native SSE 4.2 instructions, but so far I have
been unable to find benchmarks or explanations of why adding xxhash64 benefits
btrfs. All benchmarks seem to be against crc32c in *software*, not the
SSE4.2-enabled version (or I can't read). I mean, it's great that xxhash64 is
really fast for a software implementation, but how does btrfs benefit from this
compared to using crc32c-intel?

Verifying that plugging in other hash impls works (e.g. as preparation for
stronger impls) has value, but it's probably not something most
users care about.

Maybe there are obscure downsides to crc32c-intel, like instruction latency
(definitely a problem for AVX512) or cache pollution?

Just curious.

thanks,
Holger


* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-22 12:28 ` [PATCH v2 0/4] Support xxhash64 checksums Holger Hoffstätte
@ 2019-08-22 12:54   ` Johannes Thumshirn
  2019-08-22 15:40   ` Peter Becker
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 12:54 UTC
  To: Holger Hoffstätte; +Cc: Linux BTRFS Mailinglist

On Thu, Aug 22, 2019 at 02:28:53PM +0200, Holger Hoffstätte wrote:
> On 8/22/19 1:40 PM, Johannes Thumshirn wrote:
> > Now that Nikolay's XXHASH64 support for the Crypto API has landed and BTRFS is
> > prepared for an easy addition of new checksums, this patchset implements
> > XXHASH64 as a second, fast but not cryptographically secure checksum hash.
> 
> Question from the cheap seats.. :)
> 
> I know that crc32c-intel uses native SSE 4.2 instructions, but so far I have
> been unable to find benchmarks or explanations of why adding xxhash64 benefits
> btrfs. All benchmarks seem to be against crc32c in *software*, not the
> SSE4.2-enabled version (or I can't read). I mean, it's great that xxhash64 is
> really fast for a software implementation, but how does btrfs benefit from this
> compared to using crc32c-intel?
> 
> Verifying that plugging in other hash impls works (e.g. as preparation for
> stronger impls) has value, but it's probably not something most
> users care about.
> 
> Maybe there are obscure downsides to crc32c-intel, like instruction latency
> (definitely a problem for AVX512) or cache pollution?
> 
> Just curious.

It's not so much about the performance of xxhash64 vs crc32c. xxhash64 has a
lower collision probability than crc32c, which, for instance, makes it a good
candidate for de-duplication.

HTH,
	Johannes
-- 
Johannes Thumshirn                            SUSE Labs Filesystems
jthumshirn@suse.de                                +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
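To put a rough number on the collision-probability difference, here is a birthday-bound estimate. It assumes each checksum behaves like an ideal, uniformly random b-bit value (CRC32C is not a random function, so this is only an idealization) and, purely as an example, a 1 TiB data set split into 4 KiB blocks.

```latex
E[\text{colliding pairs}] \approx \binom{n}{2}\, 2^{-b} \approx \frac{n^{2}}{2^{b+1}},
\qquad n = \frac{2^{40}\ \text{bytes}}{2^{12}\ \text{bytes/block}} = 2^{28}

b = 32:\quad \frac{2^{56}}{2^{33}} = 2^{23} \approx 8.4 \times 10^{6}

b = 64:\quad \frac{2^{56}}{2^{65}} = 2^{-9} \approx 2.0 \times 10^{-3}
```

Under these assumptions a 32-bit checksum would flag millions of unrelated block pairs as dedupe candidates, while a 64-bit one would flag almost none.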


* [PATCH v2.1] btrfs: create structure to encode checksum type and length
  2019-08-22 11:40 ` [PATCH v2 2/4] btrfs: create structure to encode checksum type and length Johannes Thumshirn
  2019-08-22 12:11   ` Johannes Thumshirn
@ 2019-08-22 13:22   ` Johannes Thumshirn
  1 sibling, 0 replies; 15+ messages in thread
From: Johannes Thumshirn @ 2019-08-22 13:22 UTC
  To: David Sterba; +Cc: Linux BTRFS Mailinglist, Johannes Thumshirn

Create a structure to encode the type and length for the known on-disk
checksums. Also add a table of the known checksum types, indexed by the
checksum type.

This makes it easier to add new checksums later.

Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>

---
Changes to v2:
- Really remove initializer macro *doh*

Changes to v1:
- Remove initializer macro (David)
---
 fs/btrfs/ctree.h | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/ctree.h b/fs/btrfs/ctree.h
index b161224b5a0b..139354d02dfa 100644
--- a/fs/btrfs/ctree.h
+++ b/fs/btrfs/ctree.h
@@ -82,9 +82,12 @@ struct btrfs_ref;
  */
 #define BTRFS_LINK_MAX 65535U
 
-/* four bytes for CRC32 */
-static const int btrfs_csum_sizes[] = { 4 };
-static const char *btrfs_csum_names[] = { "crc32c" };
+static const struct btrfs_csums {
+	u16		size;
+	const char	*name;
+} btrfs_csums[] = {
+	[BTRFS_CSUM_TYPE_CRC32] = { .size = 4, .name = "crc32c" },
+};
 
 #define BTRFS_EMPTY_DIR_SIZE 0
 
@@ -2207,13 +2210,13 @@ static inline int btrfs_super_csum_size(const struct btrfs_super_block *s)
 	/*
 	 * csum type is validated at mount time
 	 */
-	return btrfs_csum_sizes[t];
+	return btrfs_csums[t].size;
 }
 
 static inline const char *btrfs_super_csum_name(u16 csum_type)
 {
 	/* csum type is validated at mount time */
-	return btrfs_csum_names[csum_type];
+	return btrfs_csums[csum_type].name;
 }
 
 /*
-- 
2.16.4



* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-22 12:28 ` [PATCH v2 0/4] Support xxhash64 checksums Holger Hoffstätte
  2019-08-22 12:54   ` Johannes Thumshirn
@ 2019-08-22 15:40   ` Peter Becker
  2019-08-23  9:38     ` Paul Jones
  1 sibling, 1 reply; 15+ messages in thread
From: Peter Becker @ 2019-08-22 15:40 UTC
  To: Holger Hoffstätte; +Cc: Linux BTRFS Mailinglist

On Thu, 22 Aug 2019 at 16:41, Holger Hoffstätte
<holger@applied-asynchrony.com> wrote:
> but how does btrfs benefit from this compared to using crc32c-intel?

As far as I know, hardware crc32c is up to ~3x faster than xxhash, but
xxHash was created with a different design goal.
If you are using a CPU without hardware crc32 support, xxHash gives you
maximum portability and speed. Look at ARM, MIPS, POWER, etc., or old
Intel CPUs like the Core 2 Duo.
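The gap between hardware and software CRC is easiest to see by looking at what a CRC costs without hardware help. Here is a minimal bit-at-a-time CRC-32C sketch (reflected Castagnoli polynomial, initial value and final XOR of 0xFFFFFFFF, the common parameterization whose check value for "123456789" is 0xE3069283). It is for illustration only: it is not the kernel's implementation, and practical software versions use lookup tables, so the benchmark figures in this thread should not be read as measurements of this loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Castagnoli polynomial 0x1EDC6F41 in bit-reflected form. */
#define CRC32C_POLY_REFLECTED 0x82F63B78u

/*
 * Bit-at-a-time CRC-32C: eight shift/XOR steps per input byte. This is the
 * slowest way to compute it; table-driven code and the SSE4.2 crc32
 * instruction produce the same value with far fewer operations.
 */
static uint32_t crc32c_bitwise(const void *data, size_t len)
{
	const uint8_t *p = data;
	uint32_t crc = 0xFFFFFFFFu;		/* standard initial value */

	assert(data != NULL || len == 0);
	while (len--) {
		crc ^= *p++;
		for (int bit = 0; bit < 8; bit++)
			/* Shift right; XOR in the polynomial if the dropped bit was 1. */
			crc = (crc >> 1) ^ (CRC32C_POLY_REFLECTED & (0u - (crc & 1u)));
	}
	return crc ^ 0xFFFFFFFFu;		/* standard final XOR */
}
```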


* RE: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-22 15:40   ` Peter Becker
@ 2019-08-23  9:38     ` Paul Jones
  2019-08-23  9:43       ` Paul Jones
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Jones @ 2019-08-23  9:38 UTC
  To: Peter Becker, Holger Hoffstätte; +Cc: Linux BTRFS Mailinglist

> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> owner@vger.kernel.org> On Behalf Of Peter Becker
> Sent: Friday, 23 August 2019 1:40 AM
> To: Holger Hoffstätte <holger@applied-asynchrony.com>
> Cc: Linux BTRFS Mailinglist <linux-btrfs@vger.kernel.org>
> Subject: Re: [PATCH v2 0/4] Support xxhash64 checksums
> 
> On Thu, 22 Aug 2019 at 16:41, Holger Hoffstätte
> <holger@applied-asynchrony.com> wrote:
> > but how does btrfs benefit from this compared to using crc32c-intel?
> 
> As far as I know, hardware crc32c is up to ~3x faster than xxhash, but
> xxHash was created with a different design goal.
> If you are using a CPU without hardware crc32 support, xxHash gives you
> maximum portability and speed. Look at ARM, MIPS, POWER, etc., or old
> Intel CPUs like the Core 2 Duo.

I've got a modified version of smhasher (https://github.com/PeeJay/smhasher) that tests the speed and quality of various hash functions.

Crc32 Software    -   379.91 MiB/sec
Crc32 Hardware    -  7338.60 MiB/sec
XXhash64 Software - 12094.40 MiB/sec

Testing done on a 1st Gen Ryzen. Impressive numbers from XXhash64.


Paul.


* RE: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-23  9:38     ` Paul Jones
@ 2019-08-23  9:43       ` Paul Jones
  2019-08-23 17:08         ` Adam Borowski
  0 siblings, 1 reply; 15+ messages in thread
From: Paul Jones @ 2019-08-23  9:43 UTC
  To: Paul Jones, Peter Becker, Holger Hoffstätte; +Cc: Linux BTRFS Mailinglist



> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> owner@vger.kernel.org> On Behalf Of Paul Jones
> Sent: Friday, 23 August 2019 7:39 PM
> To: Peter Becker <floyd.net@gmail.com>; Holger Hoffstätte
> <holger@applied-asynchrony.com>
> Cc: Linux BTRFS Mailinglist <linux-btrfs@vger.kernel.org>
> Subject: RE: [PATCH v2 0/4] Support xxhash64 checksums
> 
> > -----Original Message-----
> > From: linux-btrfs-owner@vger.kernel.org <linux-btrfs-
> > owner@vger.kernel.org> On Behalf Of Peter Becker
> > Sent: Friday, 23 August 2019 1:40 AM
> > To: Holger Hoffstätte <holger@applied-asynchrony.com>
> > Cc: Linux BTRFS Mailinglist <linux-btrfs@vger.kernel.org>
> > Subject: Re: [PATCH v2 0/4] Support xxhash64 checksums
> >
> > On Thu, 22 Aug 2019 at 16:41, Holger Hoffstätte
> > <holger@applied-asynchrony.com> wrote:
> > > but how does btrfs benefit from this compared to using crc32c-intel?
> >
> > As far as I know, hardware crc32c is up to ~3x faster than xxhash, but
> > xxHash was created with a different design goal.
> > If you are using a CPU without hardware crc32 support, xxHash gives you
> > maximum portability and speed. Look at ARM, MIPS, POWER, etc., or old
> > Intel CPUs like the Core 2 Duo.
> 
> I've got a modified version of smhasher
> (https://github.com/PeeJay/smhasher) that tests the speed and quality
> of various hash functions.

I forgot to include xxhash32:

Crc32 Software    -   379.91 MiB/sec
Crc32 Hardware    -  7338.60 MiB/sec
XXhash64 Software - 12094.40 MiB/sec
XXhash32 Software -  6060.11 MiB/sec

Testing done on a 1st Gen Ryzen. Impressive numbers from XXhash64.

Paul.
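These figures show xxhash64 at roughly twice the throughput of xxhash32. One plausible reason is visible in the algorithm's structure: XXH64 consumes a 32-byte stripe per loop iteration across four independent 64-bit accumulators, while XXH32 consumes 16 bytes per iteration with 32-bit lanes. The following is a compact, unoptimized C sketch of XXH64 written from the public xxHash algorithm description; the function and constant names are this sketch's own, and it is not the kernel's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The five 64-bit primes from the xxHash algorithm description. */
static const uint64_t XXH_P1 = 0x9E3779B185EBCA87ULL;
static const uint64_t XXH_P2 = 0xC2B2AE3D27D4EB4FULL;
static const uint64_t XXH_P3 = 0x165667B19E3779F9ULL;
static const uint64_t XXH_P4 = 0x85EBCA77C2B2AE63ULL;
static const uint64_t XXH_P5 = 0x27D4EB2F165667C5ULL;

static uint64_t rotl64(uint64_t x, int r)
{
	return (x << r) | (x >> (64 - r));
}

/* Little-endian loads done byte by byte, so the result is host-independent. */
static uint64_t read_le64(const uint8_t *p)
{
	uint64_t v = 0;

	for (int i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

static uint32_t read_le32(const uint8_t *p)
{
	return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
	       (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Folds one 8-byte lane into an accumulator. */
static uint64_t xxh64_round(uint64_t acc, uint64_t lane)
{
	acc += lane * XXH_P2;
	acc = rotl64(acc, 31);
	return acc * XXH_P1;
}

static uint64_t xxh64_merge(uint64_t h, uint64_t acc)
{
	h ^= xxh64_round(0, acc);
	return h * XXH_P1 + XXH_P4;
}

static uint64_t xxh64(const void *input, size_t len, uint64_t seed)
{
	const uint8_t *p = input;
	const uint8_t *const end = p + len;
	uint64_t h;

	assert(input != NULL);
	if (len >= 32) {
		/* Four independent accumulators, one per 8-byte lane of a stripe. */
		uint64_t v1 = seed + XXH_P1 + XXH_P2, v2 = seed + XXH_P2;
		uint64_t v3 = seed, v4 = seed - XXH_P1;

		do {
			v1 = xxh64_round(v1, read_le64(p));
			v2 = xxh64_round(v2, read_le64(p + 8));
			v3 = xxh64_round(v3, read_le64(p + 16));
			v4 = xxh64_round(v4, read_le64(p + 24));
			p += 32;
		} while (end - p >= 32);

		h = rotl64(v1, 1) + rotl64(v2, 7) + rotl64(v3, 12) + rotl64(v4, 18);
		h = xxh64_merge(h, v1);
		h = xxh64_merge(h, v2);
		h = xxh64_merge(h, v3);
		h = xxh64_merge(h, v4);
	} else {
		h = seed + XXH_P5;
	}
	h += (uint64_t)len;

	/* Tail: whole 8-byte words, then at most one 4-byte word, then bytes. */
	for (; end - p >= 8; p += 8) {
		h ^= xxh64_round(0, read_le64(p));
		h = rotl64(h, 27) * XXH_P1 + XXH_P4;
	}
	if (end - p >= 4) {
		h ^= (uint64_t)read_le32(p) * XXH_P1;
		h = rotl64(h, 23) * XXH_P2 + XXH_P3;
		p += 4;
	}
	for (; p < end; p++) {
		h ^= (uint64_t)*p * XXH_P5;
		h = rotl64(h, 11) * XXH_P1;
	}

	/* Final avalanche: spreads the remaining bits across the whole result. */
	h ^= h >> 33;
	h *= XXH_P2;
	h ^= h >> 29;
	h *= XXH_P3;
	h ^= h >> 32;
	return h;
}
```

With seed 0, xxh64("", 0, 0) should return 0xEF46DB3751D8E999.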


* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-23  9:43       ` Paul Jones
@ 2019-08-23 17:08         ` Adam Borowski
  2019-08-26 12:27           ` Austin S. Hemmelgarn
  0 siblings, 1 reply; 15+ messages in thread
From: Adam Borowski @ 2019-08-23 17:08 UTC
  To: Paul Jones; +Cc: Peter Becker, Holger Hoffstätte, Linux BTRFS Mailinglist

On Fri, Aug 23, 2019 at 09:43:22AM +0000, Paul Jones wrote:
> > > On Thu, 22 Aug 2019 at 16:41, Holger Hoffstätte
> > > <holger@applied-asynchrony.com> wrote:
> > > > but how does btrfs benefit from this compared to using crc32c-intel?
> > >
> > > As far as I know, hardware crc32c is up to ~3x faster than xxhash, but
> > > xxHash was created with a different design goal.
> > > If you are using a CPU without hardware crc32 support, xxHash gives you
> > > maximum portability and speed. Look at ARM, MIPS, POWER, etc., or old
> > > Intel CPUs like the Core 2 Duo.
> > 
> > I've got a modified version of smhasher
> > (https://github.com/PeeJay/smhasher) that tests the speed and quality
> > of various hash functions.
> 
> I forgot to include xxhash32:
> 
> Crc32 Software    -   379.91 MiB/sec
> Crc32 Hardware    -  7338.60 MiB/sec
> XXhash64 Software - 12094.40 MiB/sec
> XXhash32 Software -  6060.11 MiB/sec
> 
> Testing done on a 1st Gen Ryzen. Impressive numbers from XXhash64.

The newest, biggest Threadripper (2990WX; no 3000-series version released yet):
crc32      -   492.75 MiB/sec
crc32hw    -  9447.37 MiB/sec
crc64      -  1959.51 MiB/sec
xxhash32   -  7479.29 MiB/sec
xxhash64   - 14911.58 MiB/sec

An old Skylake (i7-6700):
crc32      -   359.32 MiB/sec
crc32hw    - 21119.68 MiB/sec
crc64      -  1656.34 MiB/sec
xxhash32   -  5989.87 MiB/sec
xxhash64   - 11949.41 MiB/sec

Cascade Lake (0000%@):
crc32hw 1.92× as fast as xxhash64.

So you want crc32hw on Intel, xxhash64 on AMD.

crc32 also lets you go back to old kernels. The improved collision
resistance of xxhash64 is not a reason either: if you intend to dedupe, you
want a crypto hash so you don't need to verify.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋  The root of a real enemy is an imaginary friend.
⠈⠳⣄⠀⠀⠀⠀


* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-23 17:08         ` Adam Borowski
@ 2019-08-26 12:27           ` Austin S. Hemmelgarn
  2019-08-27  0:33             ` Adam Borowski
  0 siblings, 1 reply; 15+ messages in thread
From: Austin S. Hemmelgarn @ 2019-08-26 12:27 UTC
  To: Adam Borowski, Paul Jones
  Cc: Peter Becker, Holger Hoffstätte, Linux BTRFS Mailinglist

On 2019-08-23 13:08, Adam Borowski wrote:
> the improved collision
> resistance of xxhash64 is not a reason either: if you intend to dedupe, you
> want a crypto hash so you don't need to verify.

The improved collision resistance is a roughly 10 orders of magnitude 
reduction in the chance of a collision.  That may not matter for most, 
but it's a significant improvement for anybody operating at large enough 
scale that media errors are commonplace.

Also, you would still need to verify even if you're using whatever the 
fanciest new collision resistant cryptographic hash is, because the 
number of possible input values is still more than _nine thousand_ 
orders of magnitude larger than the total number of output values even 
if we use a 512-bit cryptographic hash.
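Both figures in this message can be checked with a short calculation. The second depends on the block size, which the message does not state; a 4 KiB block (32768 bits) is assumed here as an example.

```latex
\frac{2^{64}}{2^{32}} = 2^{32} \approx 4.3 \times 10^{9}
\quad\Rightarrow\quad \approx 9.6 \text{ orders of magnitude}

\frac{\#\text{inputs}}{\#\text{outputs}} = \frac{2^{32768}}{2^{512}} = 2^{32256}
= 10^{32256 \log_{10} 2} \approx 10^{9710}
```

So, assuming 4 KiB blocks, the gap between possible inputs and possible 512-bit outputs is a little over 9700 orders of magnitude.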


* Re: [PATCH v2 0/4] Support xxhash64 checksums
  2019-08-26 12:27           ` Austin S. Hemmelgarn
@ 2019-08-27  0:33             ` Adam Borowski
  0 siblings, 0 replies; 15+ messages in thread
From: Adam Borowski @ 2019-08-27  0:33 UTC
  To: Austin S. Hemmelgarn
  Cc: Paul Jones, Peter Becker, Holger Hoffstätte,
	Linux BTRFS Mailinglist

On Mon, Aug 26, 2019 at 08:27:15AM -0400, Austin S. Hemmelgarn wrote:
> On 2019-08-23 13:08, Adam Borowski wrote:
> > the improved collision
> > resistance of xxhash64 is not a reason either: if you intend to dedupe, you
> > want a crypto hash so you don't need to verify.
> 
> The improved collision resistance is a roughly 10 orders of magnitude
> reduction in the chance of a collision.  That may not matter for most, but
> it's a significant improvement for anybody operating at large enough scale
> that media errors are commonplace.

Hash size doesn't matter for media errors.  You don't have billions of
mismatches: the first one is a cause for alarm, so a 1-in-4294967296 chance
of failing to notice it hardly ever matters (even though it _can_ happen in
real life, unlike the collisions discussed below).

I can think of a bigger hash useful in three cases:
* recovering from a split-brain RAID
* recovering from one disk of a RAID having had a large piece scribbled upon
* finding candidates for deduplication (but see below why not 64-bit)

> Also, you would still need to verify even if you're using whatever the
> fanciest new collision resistant cryptographic hash is, because the number
> of possible input values is still more than _nine thousand_ orders of
> magnitude larger than the total number of output values even if we use a
> 512-bit cryptographic hash.

You're underestimating how rare crypto-strength hash collisions are.

There are two scenarios: unintentional, and malicious.

Let's go with unintentional first: the age of the Universe is 2^58.5
seconds.  The fastest disk (non-pmem) is NVMe-connected Optane, at 240000
IOPS.  That's 2^17.8.  With a 256-bit hash, the mass of machines needed for
a single expected collision within the age of the Universe exceeds the mass of
the observable Universe itself.

So, malicious.  We demand a non-broken hash, which in crypto speak means
there's no known attack better than brute force.  An iterative approach is
right out; the best space-time tradeoff is a birthday attack, which requires
storage on the order of the square root of the number of combinations (i.e.,
half the hash length in bits).  It's drastically better: at current best
storage densities, you'd need only the mass of the Earth.

Please let me know when you'll build that Earth-sized computer, so I can
migrate from weak SHA256 to e.g. BLAKE2b.

On the other hand, computers and memories get hit by cosmic rays, thermal
noise, and so on at a non-negligible rate.  Any theoretical chance of a hash
collision is dwarfed by flaws of the technology we have.  Or, e.g., by the
chance that you'll get hit by multiple lightning strikes the next time you
leave your house.

Thus: no, you don't need to recheck after SHA256.


Meow!
-- 
⢀⣴⠾⠻⢶⣦⠀
⣾⠁⢠⠒⠀⣿⡁
⢿⡄⠘⠷⠚⠋  The root of a real enemy is an imaginary friend.
⠈⠳⣄⠀⠀⠀⠀
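The unintentional-collision estimate above can be written out using the message's own figures (2^{58.5} s, 2^{17.8} IOPS per drive), treating SHA256 as an ideal random function. The message does not say whether it counts matches against one fixed block or between any two blocks, so both readings are sketched; the second (birthday) reading is the one that applies when any two blocks of a deduplicated store might collide.

```latex
\text{hashes per drive} \approx 2^{58.5} \cdot 2^{17.8} = 2^{76.3}

\text{one fixed block:}\quad \frac{2^{256}}{2^{76.3}} = 2^{179.7} \approx 1.2 \times 10^{54}\ \text{drives}

\text{any pair (birthday, } \approx 2^{128} \text{ hashes):}\quad \frac{2^{128}}{2^{76.3}} = 2^{51.7} \approx 3.7 \times 10^{15}\ \text{drives}
```

Both counts assume every drive runs flat out for the age of the Universe, which is the point of the argument; the birthday count is nonetheless many orders of magnitude smaller than the fixed-block one.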


end of thread

Thread overview: 15+ messages
2019-08-22 11:40 [PATCH v2 0/4] Support xxhash64 checksums Johannes Thumshirn
2019-08-22 11:40 ` [PATCH v2 1/4] btrfs: turn checksum type define into a enum Johannes Thumshirn
2019-08-22 11:40 ` [PATCH v2 2/4] btrfs: create structure to encode checksum type and length Johannes Thumshirn
2019-08-22 12:11   ` Johannes Thumshirn
2019-08-22 13:22   ` [PATCH v2.1] " Johannes Thumshirn
2019-08-22 11:40 ` [PATCH v2 3/4] btrfs: use xxhash64 for checksumming Johannes Thumshirn
2019-08-22 11:40 ` [PATCH v2 4/4] btrfs: sysfs: export supported checksums Johannes Thumshirn
2019-08-22 12:28 ` [PATCH v2 0/4] Support xxhash64 checksums Holger Hoffstätte
2019-08-22 12:54   ` Johannes Thumshirn
2019-08-22 15:40   ` Peter Becker
2019-08-23  9:38     ` Paul Jones
2019-08-23  9:43       ` Paul Jones
2019-08-23 17:08         ` Adam Borowski
2019-08-26 12:27           ` Austin S. Hemmelgarn
2019-08-27  0:33             ` Adam Borowski
