All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter
@ 2023-08-29 12:41 Jingbo Xu
  2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
  To: hsiangkao, linux-erofs

changes since v5:
- patch 1: update distribution license
- patch 3: rebase to the latest -dev branch

changes since v4:
- patch 3: the feature is refactored and implemented in a more cohesive
  way, that is, the hashbit is calculated in erofs_export_xattr_ibody().
  For xattr_item in the long xattr name prefix style, reconstruct the
  full xattr name first, then derive the corresponding predefined short
  name prefix and finally calculate the hashbit.

changes since v3:
- patch 3: "-Exattr-name-filter" option rather than "--xattr-filter"
  option is newly introduced to enable this feature (Gao Xiang)

changes since v2:
- patch 2: introduce xattr_filter_reserved in on-disk superblock;
  remove EROFS_XATTR_FILTER_MASK
- patch 3: xattr_filter_reserved is always initialized to 0 by default


changes since RFC:
- the number of hash functions is 1, and now it's implemented as:
    xxh32(name, strlen(name), EROFS_XATTR_FILTER_SEED + index),
  where the constant magic number EROFS_XATTR_FILTER_SEED [*] is used to
  give a better spread for the mapping. (Alexander Larsson)
- fix the value of EROFS_FEATURE_COMPAT_XATTR_FILTER; rename
  EROFS_XATTR_BLOOM_* to EROFS_XATTR_FILTER_* (Gao Xiang)


RFC: https://lore.kernel.org/all/20230621083939.128961-1-jefflexu@linux.alibaba.com/
v2: https://lore.kernel.org/all/20230705071017.104130-1-jefflexu@linux.alibaba.com/
v3: https://lore.kernel.org/all/20230712121331.99671-1-jefflexu@linux.alibaba.com/
v4: https://lore.kernel.org/all/20230714025330.42950-1-jefflexu@linux.alibaba.com/
v5: https://lore.kernel.org/all/20230722042414.126630-1-jefflexu@linux.alibaba.com/

The xattr bloom filter feature is used to boost the negative xattr
lookup.

Refer to the kernel patch set [*] for more details.

[*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd73a4395d47


Jingbo Xu (3):
  erofs-utils: add xxh32 library
  erofs-utils: update on-disk format for xattr name filter
  erofs-utils: mkfs: enable xattr name filter

 include/erofs/config.h   |   1 +
 include/erofs/internal.h |   1 +
 include/erofs/xxhash.h   |  72 ++++++++++++++++++++++
 include/erofs_fs.h       |  12 +++-
 lib/Makefile.am          |   3 +-
 lib/xattr.c              |  63 ++++++++++++++++++++
 lib/xxhash.c             | 125 +++++++++++++++++++++++++++++++++++++++
 mkfs/main.c              |   7 +++
 8 files changed, 281 insertions(+), 3 deletions(-)
 create mode 100644 include/erofs/xxhash.h
 create mode 100644 lib/xxhash.c

-- 
2.19.1.6.gb485710b


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH v6 1/3] erofs-utils: add xxh32 library
  2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
  2023-08-29 12:44   ` Gao Xiang
  2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
  2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
  2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
  To: hsiangkao, linux-erofs

Add xxh32 library which could be used by following xattr bloom filter
feature.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 include/erofs/xxhash.h |  72 ++++++++++++++++++++++++
 lib/Makefile.am        |   3 +-
 lib/xxhash.c           | 125 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 199 insertions(+), 1 deletion(-)
 create mode 100644 include/erofs/xxhash.h
 create mode 100644 lib/xxhash.c

diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
new file mode 100644
index 0000000..5459bd8
--- /dev/null
+++ b/include/erofs/xxhash.h
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
+/*
+ * The xxhash is copied from the linux kernel at:
+ *	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
+ *
+ * The original copyright is:
+ *
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: https://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+#ifndef __EROFS_XXHASH_H
+#define __EROFS_XXHASH_H
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include "defs.h"
+
+/**
+ * xxh32() - calculate the 32-bit hash of the input with a given seed.
+ *
+ * @input:  The data to hash.
+ * @length: The length of the data to hash.
+ * @seed:   The seed can be used to alter the result predictably.
+ *
+ * Return:  The 32-bit hash of the data.
+ */
+uint32_t xxh32(const void *input, size_t length, uint32_t seed);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 7a5dc03..3e09516 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
       $(top_srcdir)/include/erofs/xattr.h \
       $(top_srcdir)/include/erofs/compress_hints.h \
       $(top_srcdir)/include/erofs/fragments.h \
+      $(top_srcdir)/include/erofs/xxhash.h \
       $(top_srcdir)/lib/liberofs_private.h
 
 noinst_HEADERS += compressor.h
@@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c inode.c xattr.c exclude.c \
 		      namei.c data.c compress.c compressor.c zmap.c decompress.c \
 		      compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
 		      fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c tar.c \
-		      block_list.c
+		      block_list.c xxhash.c
 
 liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
 if ENABLE_LZ4
diff --git a/lib/xxhash.c b/lib/xxhash.c
new file mode 100644
index 0000000..3d77fe3
--- /dev/null
+++ b/lib/xxhash.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
+/*
+ * The xxhash is copied from the linux kernel at:
+ *	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
+ *
+ * The original copyright is:
+ *
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ *   * Redistributions of source code must retain the above copyright
+ *     notice, this list of conditions and the following disclaimer.
+ *   * Redistributions in binary form must reproduce the above
+ *     copyright notice, this list of conditions and the following disclaimer
+ *     in the documentation and/or other materials provided with the
+ *     distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: https://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+#include "erofs/xxhash.h"
+
+/*-*************************************
+ * Macros
+ **************************************/
+#define xxh_rotl32(x, r) ((x << r) | (x >> (32 - r)))
+
+/*-*************************************
+ * Constants
+ **************************************/
+static const uint32_t PRIME32_1 = 2654435761U;
+static const uint32_t PRIME32_2 = 2246822519U;
+static const uint32_t PRIME32_3 = 3266489917U;
+static const uint32_t PRIME32_4 =  668265263U;
+static const uint32_t PRIME32_5 =  374761393U;
+
+/*-***************************
+ * Simple Hash Functions
+ ****************************/
+static uint32_t xxh32_round(uint32_t seed, const uint32_t input)
+{
+	seed += input * PRIME32_2;
+	seed = xxh_rotl32(seed, 13);
+	seed *= PRIME32_1;
+	return seed;
+}
+
+uint32_t xxh32(const void *input, const size_t len, const uint32_t seed)
+{
+	const uint8_t *p = (const uint8_t *)input;
+	const uint8_t *b_end = p + len;
+	uint32_t h32;
+
+	if (len >= 16) {
+		const uint8_t *const limit = b_end - 16;
+		uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
+		uint32_t v2 = seed + PRIME32_2;
+		uint32_t v3 = seed + 0;
+		uint32_t v4 = seed - PRIME32_1;
+
+		do {
+			v1 = xxh32_round(v1, get_unaligned_le32(p));
+			p += 4;
+			v2 = xxh32_round(v2, get_unaligned_le32(p));
+			p += 4;
+			v3 = xxh32_round(v3, get_unaligned_le32(p));
+			p += 4;
+			v4 = xxh32_round(v4, get_unaligned_le32(p));
+			p += 4;
+		} while (p <= limit);
+
+		h32 = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) +
+			xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18);
+	} else {
+		h32 = seed + PRIME32_5;
+	}
+
+	h32 += (uint32_t)len;
+
+	while (p + 4 <= b_end) {
+		h32 += get_unaligned_le32(p) * PRIME32_3;
+		h32 = xxh_rotl32(h32, 17) * PRIME32_4;
+		p += 4;
+	}
+
+	while (p < b_end) {
+		h32 += (*p) * PRIME32_5;
+		h32 = xxh_rotl32(h32, 11) * PRIME32_1;
+		p++;
+	}
+
+	h32 ^= h32 >> 15;
+	h32 *= PRIME32_2;
+	h32 ^= h32 >> 13;
+	h32 *= PRIME32_3;
+	h32 ^= h32 >> 16;
+
+	return h32;
+}
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
  2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
  2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
  2023-08-29 12:49   ` Gao Xiang
  2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
  2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
  To: hsiangkao, linux-erofs

The xattr name bloom filter feature is going to be introduced to speed
up the negative xattr lookup, e.g. system.posix_acl_[access|default]
lookup when running "ls -lR" workload.

There are some commonly used extended attributes (n) and the total
number of these is approximately 30.

	trusted.overlay.opaque
	trusted.overlay.redirect
	trusted.overlay.origin
	trusted.overlay.impure
	trusted.overlay.nlink
	trusted.overlay.upper
	trusted.overlay.metacopy
	trusted.overlay.protattr
	user.overlay.opaque
	user.overlay.redirect
	user.overlay.origin
	user.overlay.impure
	user.overlay.nlink
	user.overlay.upper
	user.overlay.metacopy
	user.overlay.protattr
	security.evm
	security.ima
	security.selinux
	security.SMACK64
	security.SMACK64IPIN
	security.SMACK64IPOUT
	security.SMACK64EXEC
	security.SMACK64TRANSMUTE
	security.SMACK64MMAP
	security.apparmor
	security.capability
	system.posix_acl_access
	system.posix_acl_default
	user.mime_type

Given the number of bits of the bloom filter (m) is 32, the optimal
value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).

The single hash function is implemented as:

	xxh32(name, strlen(name), EROFS_XATTR_FILTER_SEED + index)

where `index` represents the index of corresponding predefined short name
prefix, while `name` represents the name string after stripping the above
predefined name prefix.

The constant magic number EROFS_XATTR_FILTER_SEED, i.e. 0x25BBE08F, is
used to give a better spread when mapping these 30 extended attributes
into 32-bit bloom filter as:

	bit  0: security.ima
	bit  1:
	bit  2: trusted.overlay.nlink
	bit  3:
	bit  4: user.overlay.nlink
	bit  5: trusted.overlay.upper
	bit  6: user.overlay.origin
	bit  7: trusted.overlay.protattr
	bit  8: security.apparmor
	bit  9: user.overlay.protattr
	bit 10: user.overlay.opaque
	bit 11: security.selinux
	bit 12: security.SMACK64TRANSMUTE
	bit 13: security.SMACK64
	bit 14: security.SMACK64MMAP
	bit 15: user.overlay.impure
	bit 16: security.SMACK64IPIN
	bit 17: trusted.overlay.redirect
	bit 18: trusted.overlay.origin
	bit 19: security.SMACK64IPOUT
	bit 20: trusted.overlay.opaque
	bit 21: system.posix_acl_default
	bit 22:
	bit 23: user.mime_type
	bit 24: trusted.overlay.impure
	bit 25: security.SMACK64EXEC
	bit 26: user.overlay.redirect
	bit 27: user.overlay.upper
	bit 28: security.evm
	bit 29: security.capability
	bit 30: system.posix_acl_access
	bit 31: trusted.overlay.metacopy, user.overlay.metacopy

h_name_filter is introduced to the on-disk per-inode xattr header to
place the corresponding xattr name filter, where bit value 1 indicates
non-existence for compatibility.

This feature is indicated by EROFS_FEATURE_COMPAT_XATTR_FILTER
compatible feature bit.

Reserve one byte in on-disk superblock as the on-disk format for xattr
name filter may change in the future.  With this flag we don't need
bothering these compatible bits again at that time.

Suggested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 include/erofs_fs.h | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/include/erofs_fs.h b/include/erofs_fs.h
index 850438a..2a111a4 100644
--- a/include/erofs_fs.h
+++ b/include/erofs_fs.h
@@ -14,6 +14,7 @@
 
 #define EROFS_FEATURE_COMPAT_SB_CHKSUM          0x00000001
 #define EROFS_FEATURE_COMPAT_MTIME              0x00000002
+#define EROFS_FEATURE_COMPAT_XATTR_FILTER	0x00000004
 
 /*
  * Any bits that aren't in EROFS_ALL_FEATURE_INCOMPAT should
@@ -82,7 +83,8 @@ struct erofs_super_block {
 	__u8 xattr_prefix_count;	/* # of long xattr name prefixes */
 	__le32 xattr_prefix_start;	/* start of long xattr prefixes */
 	__le64 packed_nid;	/* nid of the special packed inode */
-	__u8 reserved2[24];
+	__u8 xattr_filter_reserved; /* reserved for xattr name filter */
+	__u8 reserved2[23];
 };
 
 /*
@@ -201,7 +203,7 @@ struct erofs_inode_extended {
  * for read-only fs, no need to introduce h_refcount
  */
 struct erofs_xattr_ibody_header {
-	__le32 h_reserved;
+	__le32 h_name_filter;		/* bit value 1 indicates not-present */
 	__u8   h_shared_count;
 	__u8   h_reserved2[7];
 	__le32 h_shared_xattrs[0];      /* shared xattr id array */
@@ -222,6 +224,12 @@ struct erofs_xattr_ibody_header {
 #define EROFS_XATTR_LONG_PREFIX		0x80
 #define EROFS_XATTR_LONG_PREFIX_MASK	0x7f
 
+#define EROFS_XATTR_NAME_LEN_MAX	UCHAR_MAX
+
+#define EROFS_XATTR_FILTER_BITS		32
+#define EROFS_XATTR_FILTER_DEFAULT	UINT32_MAX
+#define EROFS_XATTR_FILTER_SEED		0x25BBE08F
+
 /* xattr entry (for both inline & shared xattrs) */
 struct erofs_xattr_entry {
 	__u8   e_name_len;      /* length of name */
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
  2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
  2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
  2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
  2023-08-29 12:53   ` Gao Xiang
  2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
  To: hsiangkao, linux-erofs

Introduce "-Exattr-name-filter" option to enable the xattr name bloom
filter feature.

Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
 include/erofs/config.h   |  1 +
 include/erofs/internal.h |  1 +
 lib/xattr.c              | 63 ++++++++++++++++++++++++++++++++++++++++
 mkfs/main.c              |  7 +++++
 4 files changed, 72 insertions(+)

diff --git a/include/erofs/config.h b/include/erofs/config.h
index 8f52d2c..c51f0cd 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -53,6 +53,7 @@ struct erofs_configure {
 	bool c_ignore_mtime;
 	bool c_showprogress;
 	bool c_extra_ea_name_prefixes;
+	bool c_xattr_name_filter;
 
 #ifdef HAVE_LIBSELINUX
 	struct selabel_handle *sehnd;
diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 3e73eef..382024a 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
 EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
 EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
 EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
+EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
 
 #define EROFS_I_EA_INITED	(1 << 0)
 #define EROFS_I_Z_INITED	(1 << 1)
diff --git a/lib/xattr.c b/lib/xattr.c
index 46a301a..325241d 100644
--- a/lib/xattr.c
+++ b/lib/xattr.c
@@ -18,6 +18,7 @@
 #include "erofs/cache.h"
 #include "erofs/io.h"
 #include "erofs/fragments.h"
+#include "erofs/xxhash.h"
 #include "liberofs_private.h"
 
 #define EA_HASHTABLE_BITS 16
@@ -783,6 +784,63 @@ out:
 	return ret;
 }
 
+
+static int erofs_xattr_filter_hashbit(struct xattr_item *item)
+{
+	u8 prefix = item->prefix;
+	const char *key = item->kvbuf;
+	unsigned int len = item->len[0];
+	char *name = NULL;
+	uint32_t hashbit;
+
+	if (prefix & EROFS_XATTR_LONG_PREFIX) {
+		struct ea_type_node *tnode;
+		u16 prefix_len;
+		int ret;
+
+		list_for_each_entry(tnode, &ea_name_prefixes, list) {
+			if (tnode->index == item->prefix) {
+				ret = asprintf(&name, "%s%.*s",
+					       tnode->type.prefix, len, key);
+				if (ret < 0)
+					return -ENOMEM;
+				break;
+			}
+		}
+		if (!name)
+			return -ENOENT;
+
+		if (!match_base_prefix(name, &prefix, &prefix_len)) {
+			free(name);
+			return -ENOENT;
+		}
+		key = name + prefix_len;
+		len = strlen(key);
+	}
+
+	hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
+		  (EROFS_XATTR_FILTER_BITS - 1);
+	if (name)
+		free(name);
+	return hashbit;
+}
+
+static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
+{
+	struct inode_xattr_node *node, *n;
+	u32 name_filter;
+	int hashbit;
+
+	name_filter = 0;
+	list_for_each_entry_safe(node, n, ixattrs, list) {
+		hashbit = erofs_xattr_filter_hashbit(node->item);
+		if (hashbit < 0)
+			return 0;
+		name_filter |= (1UL << hashbit);
+	}
+	return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
+}
+
 char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
 {
 	struct inode_xattr_node *node, *n;
@@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
 	header = (struct erofs_xattr_ibody_header *)buf;
 	header->h_shared_count = 0;
 
+	if (cfg.c_xattr_name_filter) {
+		u32 name_filter = erofs_xattr_filter_map(ixattrs);
+		header->h_name_filter = cpu_to_le32(name_filter);
+	}
+
 	p = sizeof(struct erofs_xattr_ibody_header);
 	list_for_each_entry_safe(node, n, ixattrs, list) {
 		struct xattr_item *const item = node->item;
diff --git a/mkfs/main.c b/mkfs/main.c
index c03a7a8..fad80b1 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -245,6 +245,13 @@ handle_fragment:
 				return -EINVAL;
 			cfg.c_dedupe = true;
 		}
+
+		if (MATCH_EXTENTED_OPT("xattr-name-filter", token, keylen)) {
+			if (vallen)
+				return -EINVAL;
+			cfg.c_xattr_name_filter = true;
+			erofs_sb_set_xattr_filter(&sbi);
+		}
 	}
 	return 0;
 }
-- 
2.19.1.6.gb485710b


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/3] erofs-utils: add xxh32 library
  2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
@ 2023-08-29 12:44   ` Gao Xiang
  2023-08-29 14:14     ` Jingbo Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:44 UTC (permalink / raw)
  To: Jingbo Xu, linux-erofs



On 2023/8/29 20:41, Jingbo Xu wrote:
> Add xxh32 library which could be used by following xattr bloom filter
> feature.
> 
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
>   include/erofs/xxhash.h |  72 ++++++++++++++++++++++++
>   lib/Makefile.am        |   3 +-
>   lib/xxhash.c           | 125 +++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 199 insertions(+), 1 deletion(-)
>   create mode 100644 include/erofs/xxhash.h
>   create mode 100644 lib/xxhash.c
> 
> diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
> new file mode 100644
> index 0000000..5459bd8
> --- /dev/null
> +++ b/include/erofs/xxhash.h
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only

You should use:
/* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */

for each header, but I guess we could drop the copyright
below since it's just a declaration?

> +/*
> + * The xxhash is copied from the linux kernel at:
> + *	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
> + *
> + * The original copyright is:
> + *
> + * xxHash - Extremely Fast Hash algorithm
> + * Copyright (C) 2012-2016, Yann Collet.
> + *
> + * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + *
> + *   * Redistributions of source code must retain the above copyright
> + *     notice, this list of conditions and the following disclaimer.
> + *   * Redistributions in binary form must reproduce the above
> + *     copyright notice, this list of conditions and the following disclaimer
> + *     in the documentation and/or other materials provided with the
> + *     distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation. This program is dual-licensed; you may select
> + * either version 2 of the GNU General Public License ("GPL") or BSD license
> + * ("BSD").
> + *
> + * You can contact the author at:
> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
> + */
> +
> +#ifndef __EROFS_XXHASH_H
> +#define __EROFS_XXHASH_H
> +
> +#ifdef __cplusplus
> +extern "C"
> +{
> +#endif
> +
> +#include "defs.h"

#include <stdint.h> ?

> +
> +/**
> + * xxh32() - calculate the 32-bit hash of the input with a given seed.
> + *
> + * @input:  The data to hash.
> + * @length: The length of the data to hash.
> + * @seed:   The seed can be used to alter the result predictably.
> + *
> + * Return:  The 32-bit hash of the data.
> + */
> +uint32_t xxh32(const void *input, size_t length, uint32_t seed);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/Makefile.am b/lib/Makefile.am
> index 7a5dc03..3e09516 100644
> --- a/lib/Makefile.am
> +++ b/lib/Makefile.am
> @@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
>         $(top_srcdir)/include/erofs/xattr.h \
>         $(top_srcdir)/include/erofs/compress_hints.h \
>         $(top_srcdir)/include/erofs/fragments.h \
> +      $(top_srcdir)/include/erofs/xxhash.h \
>         $(top_srcdir)/lib/liberofs_private.h
>   
>   noinst_HEADERS += compressor.h
> @@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c inode.c xattr.c exclude.c \
>   		      namei.c data.c compress.c compressor.c zmap.c decompress.c \
>   		      compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
>   		      fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c tar.c \
> -		      block_list.c
> +		      block_list.c xxhash.c
>   
>   liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
>   if ENABLE_LZ4
> diff --git a/lib/xxhash.c b/lib/xxhash.c
> new file mode 100644
> index 0000000..3d77fe3
> --- /dev/null
> +++ b/lib/xxhash.c
> @@ -0,0 +1,125 @@
> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
> +/*
> + * The xxhash is copied from the linux kernel at:
> + *	https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
> + *
> + * The original copyright is:
> + *
> + * xxHash - Extremely Fast Hash algorithm
> + * Copyright (C) 2012-2016, Yann Collet.
> + *
> + * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + *
> + *   * Redistributions of source code must retain the above copyright
> + *     notice, this list of conditions and the following disclaimer.
> + *   * Redistributions in binary form must reproduce the above
> + *     copyright notice, this list of conditions and the following disclaimer
> + *     in the documentation and/or other materials provided with the
> + *     distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation. This program is dual-licensed; you may select
> + * either version 2 of the GNU General Public License ("GPL") or BSD license
> + * ("BSD").
> + *
> + * You can contact the author at:
> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
> + */
> 

#include "erofs/defs.h"

> +#include "erofs/xxhash.h"
> +

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
  2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
@ 2023-08-29 12:49   ` Gao Xiang
  2023-08-29 14:19     ` Jingbo Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:49 UTC (permalink / raw)
  To: Jingbo Xu, linux-erofs



On 2023/8/29 20:41, Jingbo Xu wrote:
> The xattr name bloom filter feature is going to be introduced to speed
> up the negative xattr lookup, e.g. system.posix_acl_[access|default]
> lookup when running "ls -lR" workload.


I think you could just say
"Let's keep in sync with kernel commit <12-char commit id>."

here.

...

> ---


...

> +#define EROFS_XATTR_NAME_LEN_MAX	UCHAR_MAX

where is EROFS_XATTR_NAME_LEN_MAX used?

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
  2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
@ 2023-08-29 12:53   ` Gao Xiang
  2023-08-29 14:20     ` Jingbo Xu
  0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:53 UTC (permalink / raw)
  To: Jingbo Xu, linux-erofs



On 2023/8/29 20:41, Jingbo Xu wrote:
> Introduce "-Exattr-name-filter" option to enable the xattr name bloom
> filter feature.
> 
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
>   include/erofs/config.h   |  1 +
>   include/erofs/internal.h |  1 +
>   lib/xattr.c              | 63 ++++++++++++++++++++++++++++++++++++++++
>   mkfs/main.c              |  7 +++++
>   4 files changed, 72 insertions(+)
> 
> diff --git a/include/erofs/config.h b/include/erofs/config.h
> index 8f52d2c..c51f0cd 100644
> --- a/include/erofs/config.h
> +++ b/include/erofs/config.h
> @@ -53,6 +53,7 @@ struct erofs_configure {
>   	bool c_ignore_mtime;
>   	bool c_showprogress;
>   	bool c_extra_ea_name_prefixes;
> +	bool c_xattr_name_filter;
>   
>   #ifdef HAVE_LIBSELINUX
>   	struct selabel_handle *sehnd;
> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
> index 3e73eef..382024a 100644
> --- a/include/erofs/internal.h
> +++ b/include/erofs/internal.h
> @@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
>   EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
>   EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
>   EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
> +EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>   
>   #define EROFS_I_EA_INITED	(1 << 0)
>   #define EROFS_I_Z_INITED	(1 << 1)
> diff --git a/lib/xattr.c b/lib/xattr.c
> index 46a301a..325241d 100644
> --- a/lib/xattr.c
> +++ b/lib/xattr.c
> @@ -18,6 +18,7 @@
>   #include "erofs/cache.h"
>   #include "erofs/io.h"
>   #include "erofs/fragments.h"
> +#include "erofs/xxhash.h"
>   #include "liberofs_private.h"
>   
>   #define EA_HASHTABLE_BITS 16
> @@ -783,6 +784,63 @@ out:
>   	return ret;
>   }
>   
> +
> +static int erofs_xattr_filter_hashbit(struct xattr_item *item)
> +{
> +	u8 prefix = item->prefix;
> +	const char *key = item->kvbuf;
> +	unsigned int len = item->len[0];
> +	char *name = NULL;
> +	uint32_t hashbit;
> +
> +	if (prefix & EROFS_XATTR_LONG_PREFIX) {
> +		struct ea_type_node *tnode;
> +		u16 prefix_len;
> +		int ret;
> +
> +		list_for_each_entry(tnode, &ea_name_prefixes, list) {
> +			if (tnode->index == item->prefix) {
> +				ret = asprintf(&name, "%s%.*s",
> +					       tnode->type.prefix, len, key);
> +				if (ret < 0)
> +					return -ENOMEM;
> +				break;
> +			}
> +		}
> +		if (!name)
> +			return -ENOENT;
> +
> +		if (!match_base_prefix(name, &prefix, &prefix_len)) {
> +			free(name);
> +			return -ENOENT;
> +		}
> +		key = name + prefix_len;
> +		len = strlen(key);
> +	}
> +
> +	hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
> +		  (EROFS_XATTR_FILTER_BITS - 1);
> +	if (name)
> +		free(name);
> +	return hashbit;
> +}
> +
> +static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
> +{
> +	struct inode_xattr_node *node, *n;
> +	u32 name_filter;
> +	int hashbit;
> +
> +	name_filter = 0;
> +	list_for_each_entry_safe(node, n, ixattrs, list) {
> +		hashbit = erofs_xattr_filter_hashbit(node->item);
> +		if (hashbit < 0)

I'd suggest to clear feature bit instead:

erofs_sb_clear_xattr_filter(&sbi);

> +			return 0;
> +		name_filter |= (1UL << hashbit);
> +	}
> +	return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
> +}
> +
>   char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
>   {
>   	struct inode_xattr_node *node, *n;
> @@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
>   	header = (struct erofs_xattr_ibody_header *)buf;
>   	header->h_shared_count = 0;
>   
> +	if (cfg.c_xattr_name_filter) {
> +		u32 name_filter = erofs_xattr_filter_map(ixattrs);

Leave a blank line here.

> +		header->h_name_filter = cpu_to_le32(name_filter);

Or
		header->h_name_filter =
			cpu_to_le32(erofs_xattr_filter_map(ixattrs));

Thanks,
Gao Xiang

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 1/3] erofs-utils: add xxh32 library
  2023-08-29 12:44   ` Gao Xiang
@ 2023-08-29 14:14     ` Jingbo Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:14 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs



On 8/29/23 8:44 PM, Gao Xiang wrote:
> 
> 
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> Add xxh32 library which could be used by following xattr bloom filter
>> feature.
>>
>> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>> ---
>>   include/erofs/xxhash.h |  72 ++++++++++++++++++++++++
>>   lib/Makefile.am        |   3 +-
>>   lib/xxhash.c           | 125 +++++++++++++++++++++++++++++++++++++++++
>>   3 files changed, 199 insertions(+), 1 deletion(-)
>>   create mode 100644 include/erofs/xxhash.h
>>   create mode 100644 lib/xxhash.c
>>
>> diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
>> new file mode 100644
>> index 0000000..5459bd8
>> --- /dev/null
>> +++ b/include/erofs/xxhash.h
>> @@ -0,0 +1,72 @@
>> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
> 
> You should use:
> /* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */
> 
> for each header, but I guess we could drop the copyright
> below since it's just a declaration?

Okay.

> 
>> +/*
>> + * The xxhash is copied from the linux kernel at:
>> + *   
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
>> + *
>> + * The original copyright is:
>> + *
>> + * xxHash - Extremely Fast Hash algorithm
>> + * Copyright (C) 2012-2016, Yann Collet.
>> + *
>> + * BSD 2-Clause License
>> (http://www.opensource.org/licenses/bsd-license.php)
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> are
>> + * met:
>> + *
>> + *   * Redistributions of source code must retain the above copyright
>> + *     notice, this list of conditions and the following disclaimer.
>> + *   * Redistributions in binary form must reproduce the above
>> + *     copyright notice, this list of conditions and the following
>> disclaimer
>> + *     in the documentation and/or other materials provided with the
>> + *     distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it under
>> + * the terms of the GNU General Public License version 2 as published
>> by the
>> + * Free Software Foundation. This program is dual-licensed; you may
>> select
>> + * either version 2 of the GNU General Public License ("GPL") or BSD
>> license
>> + * ("BSD").
>> + *
>> + * You can contact the author at:
>> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
>> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
>> + */
>> +
>> +#ifndef __EROFS_XXHASH_H
>> +#define __EROFS_XXHASH_H
>> +
>> +#ifdef __cplusplus
>> +extern "C"
>> +{
>> +#endif
>> +
>> +#include "defs.h"
> 
> #include <stdint.h> ?

Alright.

> 
>> +
>> +/**
>> + * xxh32() - calculate the 32-bit hash of the input with a given seed.
>> + *
>> + * @input:  The data to hash.
>> + * @length: The length of the data to hash.
>> + * @seed:   The seed can be used to alter the result predictably.
>> + *
>> + * Return:  The 32-bit hash of the data.
>> + */
>> +uint32_t xxh32(const void *input, size_t length, uint32_t seed);
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/Makefile.am b/lib/Makefile.am
>> index 7a5dc03..3e09516 100644
>> --- a/lib/Makefile.am
>> +++ b/lib/Makefile.am
>> @@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
>>         $(top_srcdir)/include/erofs/xattr.h \
>>         $(top_srcdir)/include/erofs/compress_hints.h \
>>         $(top_srcdir)/include/erofs/fragments.h \
>> +      $(top_srcdir)/include/erofs/xxhash.h \
>>         $(top_srcdir)/lib/liberofs_private.h
>>     noinst_HEADERS += compressor.h
>> @@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c
>> inode.c xattr.c exclude.c \
>>                 namei.c data.c compress.c compressor.c zmap.c
>> decompress.c \
>>                 compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
>>                 fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c
>> tar.c \
>> -              block_list.c
>> +              block_list.c xxhash.c
>>     liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
>>   if ENABLE_LZ4
>> diff --git a/lib/xxhash.c b/lib/xxhash.c
>> new file mode 100644
>> index 0000000..3d77fe3
>> --- /dev/null
>> +++ b/lib/xxhash.c
>> @@ -0,0 +1,125 @@
>> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
>> +/*
>> + * The xxhash is copied from the linux kernel at:
>> + *   
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
>> + *
>> + * The original copyright is:
>> + *
>> + * xxHash - Extremely Fast Hash algorithm
>> + * Copyright (C) 2012-2016, Yann Collet.
>> + *
>> + * BSD 2-Clause License
>> (http://www.opensource.org/licenses/bsd-license.php)
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> are
>> + * met:
>> + *
>> + *   * Redistributions of source code must retain the above copyright
>> + *     notice, this list of conditions and the following disclaimer.
>> + *   * Redistributions in binary form must reproduce the above
>> + *     copyright notice, this list of conditions and the following
>> disclaimer
>> + *     in the documentation and/or other materials provided with the
>> + *     distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it under
>> + * the terms of the GNU General Public License version 2 as published
>> by the
>> + * Free Software Foundation. This program is dual-licensed; you may
>> select
>> + * either version 2 of the GNU General Public License ("GPL") or BSD
>> license
>> + * ("BSD").
>> + *
>> + * You can contact the author at:
>> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
>> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
>> + */
>>
> 
> #include "erofs/defs.h"

Alright.

> 
>> +#include "erofs/xxhash.h"
>> +
> 
> Thanks,
> Gao Xiang

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
  2023-08-29 12:49   ` Gao Xiang
@ 2023-08-29 14:19     ` Jingbo Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:19 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs



On 8/29/23 8:49 PM, Gao Xiang wrote:
> 
> 
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> The xattr name bloom filter feature is going to be introduced to speed
>> up the negative xattr lookup, e.g. system.posix_acl_[access|default]
>> lookup when running "ls -lR" workload.
> 
> 
> I think you could just say
> "Let's keep in sync with kernel commit <12-char commit id>."
> 
> here.
> 
> ...
> 
>> ---
> 
> 
> ...
> 
>> +#define EROFS_XATTR_NAME_LEN_MAX    UCHAR_MAX
> 
> where is EROFS_XATTR_NAME_LEN_MAX used?

Indeed it's not used and I can't remember where it's used in the initial
implementation...

I will drop it later.

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
  2023-08-29 12:53   ` Gao Xiang
@ 2023-08-29 14:20     ` Jingbo Xu
  0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:20 UTC (permalink / raw)
  To: Gao Xiang, linux-erofs



On 8/29/23 8:53 PM, Gao Xiang wrote:
> 
> 
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> Introduce "-Exattr-name-filter" option to enable the xattr name bloom
>> filter feature.
>>
>> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>> ---
>>   include/erofs/config.h   |  1 +
>>   include/erofs/internal.h |  1 +
>>   lib/xattr.c              | 63 ++++++++++++++++++++++++++++++++++++++++
>>   mkfs/main.c              |  7 +++++
>>   4 files changed, 72 insertions(+)
>>
>> diff --git a/include/erofs/config.h b/include/erofs/config.h
>> index 8f52d2c..c51f0cd 100644
>> --- a/include/erofs/config.h
>> +++ b/include/erofs/config.h
>> @@ -53,6 +53,7 @@ struct erofs_configure {
>>       bool c_ignore_mtime;
>>       bool c_showprogress;
>>       bool c_extra_ea_name_prefixes;
>> +    bool c_xattr_name_filter;
>>     #ifdef HAVE_LIBSELINUX
>>       struct selabel_handle *sehnd;
>> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
>> index 3e73eef..382024a 100644
>> --- a/include/erofs/internal.h
>> +++ b/include/erofs/internal.h
>> @@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat,
>> INCOMPAT_FRAGMENTS)
>>   EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
>>   EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
>>   EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
>> +EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>>     #define EROFS_I_EA_INITED    (1 << 0)
>>   #define EROFS_I_Z_INITED    (1 << 1)
>> diff --git a/lib/xattr.c b/lib/xattr.c
>> index 46a301a..325241d 100644
>> --- a/lib/xattr.c
>> +++ b/lib/xattr.c
>> @@ -18,6 +18,7 @@
>>   #include "erofs/cache.h"
>>   #include "erofs/io.h"
>>   #include "erofs/fragments.h"
>> +#include "erofs/xxhash.h"
>>   #include "liberofs_private.h"
>>     #define EA_HASHTABLE_BITS 16
>> @@ -783,6 +784,63 @@ out:
>>       return ret;
>>   }
>>   +
>> +static int erofs_xattr_filter_hashbit(struct xattr_item *item)
>> +{
>> +    u8 prefix = item->prefix;
>> +    const char *key = item->kvbuf;
>> +    unsigned int len = item->len[0];
>> +    char *name = NULL;
>> +    uint32_t hashbit;
>> +
>> +    if (prefix & EROFS_XATTR_LONG_PREFIX) {
>> +        struct ea_type_node *tnode;
>> +        u16 prefix_len;
>> +        int ret;
>> +
>> +        list_for_each_entry(tnode, &ea_name_prefixes, list) {
>> +            if (tnode->index == item->prefix) {
>> +                ret = asprintf(&name, "%s%.*s",
>> +                           tnode->type.prefix, len, key);
>> +                if (ret < 0)
>> +                    return -ENOMEM;
>> +                break;
>> +            }
>> +        }
>> +        if (!name)
>> +            return -ENOENT;
>> +
>> +        if (!match_base_prefix(name, &prefix, &prefix_len)) {
>> +            free(name);
>> +            return -ENOENT;
>> +        }
>> +        key = name + prefix_len;
>> +        len = strlen(key);
>> +    }
>> +
>> +    hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
>> +          (EROFS_XATTR_FILTER_BITS - 1);
>> +    if (name)
>> +        free(name);
>> +    return hashbit;
>> +}
>> +
>> +static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
>> +{
>> +    struct inode_xattr_node *node, *n;
>> +    u32 name_filter;
>> +    int hashbit;
>> +
>> +    name_filter = 0;
>> +    list_for_each_entry_safe(node, n, ixattrs, list) {
>> +        hashbit = erofs_xattr_filter_hashbit(node->item);
>> +        if (hashbit < 0)
> 
> I'd suggest to clear feature bit instead:
> 
> erofs_sb_clear_xattr_filter(&sbi);

Good idea.


> 
>> +            return 0;
>> +        name_filter |= (1UL << hashbit);
>> +    }
>> +    return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
>> +}
>> +
>>   char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned
>> int size)
>>   {
>>       struct inode_xattr_node *node, *n;
>> @@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head
>> *ixattrs, unsigned int size)
>>       header = (struct erofs_xattr_ibody_header *)buf;
>>       header->h_shared_count = 0;
>>   +    if (cfg.c_xattr_name_filter) {
>> +        u32 name_filter = erofs_xattr_filter_map(ixattrs);
> 
> Leave a blank line here.
> 
>> +        header->h_name_filter = cpu_to_le32(name_filter);
> 
> Or
>         header->h_name_filter =
>             cpu_to_le32(erofs_xattr_filter_map(ixattrs));
> 

Okay.  I would prefer the latter.

-- 
Thanks,
Jingbo

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-08-29 14:20 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
2023-08-29 12:44   ` Gao Xiang
2023-08-29 14:14     ` Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
2023-08-29 12:49   ` Gao Xiang
2023-08-29 14:19     ` Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
2023-08-29 12:53   ` Gao Xiang
2023-08-29 14:20     ` Jingbo Xu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.