* [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter
@ 2023-08-29 12:41 Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
To: hsiangkao, linux-erofs
changes since v5:
- patch 1: update distribution license
- patch 3: rebase to the latest -dev branch
changes since v4:
- patch 3: the feature is refactored and implemented in a more cohesive
way, that is, the hashbit is calculated in erofs_export_xattr_ibody().
For xattr_item in the long xattr name prefix style, reconstruct the
full xattr name first, then derive the corresponding predefined short
name prefix and finally calculate the hashbit.
changes since v3:
- patch 3: "-Exattr-name-filter" option rather than "--xattr-filter"
option is newly introduced to enable this feature (Gao Xiang)
changes since v2:
- patch 2: introduce xattr_filter_reserved in on-disk superblock;
remove EROFS_XATTR_FILTER_MASK
- patch 3: xattr_filter_reserved is always initialized to 0 by default
changes since RFC:
- the number of hash functions is 1, and now it's implemented as:
xxh32(name, strlen(name), EROFS_XATTR_FILTER_SEED + index),
where the constant magic number EROFS_XATTR_FILTER_SEED [*] is used to
give a better spread for the mapping. (Alexander Larsson)
- fix the value of EROFS_FEATURE_COMPAT_XATTR_FILTER; rename
EROFS_XATTR_BLOOM_* to EROFS_XATTR_FILTER_* (Gao Xiang)
RFC: https://lore.kernel.org/all/20230621083939.128961-1-jefflexu@linux.alibaba.com/
v2: https://lore.kernel.org/all/20230705071017.104130-1-jefflexu@linux.alibaba.com/
v3: https://lore.kernel.org/all/20230712121331.99671-1-jefflexu@linux.alibaba.com/
v4: https://lore.kernel.org/all/20230714025330.42950-1-jefflexu@linux.alibaba.com/
v5: https://lore.kernel.org/all/20230722042414.126630-1-jefflexu@linux.alibaba.com/
The xattr bloom filter feature is used to boost the negative xattr
lookup.
Refer to the kernel patch set [*] for more details.
[*] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fd73a4395d47
Jingbo Xu (3):
erofs-utils: add xxh32 library
erofs-utils: update on-disk format for xattr name filter
erofs-utils: mkfs: enable xattr name filter
include/erofs/config.h | 1 +
include/erofs/internal.h | 1 +
include/erofs/xxhash.h | 72 ++++++++++++++++++++++
include/erofs_fs.h | 12 +++-
lib/Makefile.am | 3 +-
lib/xattr.c | 63 ++++++++++++++++++++
lib/xxhash.c | 125 +++++++++++++++++++++++++++++++++++++++
mkfs/main.c | 7 +++
8 files changed, 281 insertions(+), 3 deletions(-)
create mode 100644 include/erofs/xxhash.h
create mode 100644 lib/xxhash.c
--
2.19.1.6.gb485710b
^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH v6 1/3] erofs-utils: add xxh32 library
2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
2023-08-29 12:44 ` Gao Xiang
2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
To: hsiangkao, linux-erofs
Add xxh32 library which could be used by following xattr bloom filter
feature.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
include/erofs/xxhash.h | 72 ++++++++++++++++++++++++
lib/Makefile.am | 3 +-
lib/xxhash.c | 125 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 199 insertions(+), 1 deletion(-)
create mode 100644 include/erofs/xxhash.h
create mode 100644 lib/xxhash.c
diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
new file mode 100644
index 0000000..5459bd8
--- /dev/null
+++ b/include/erofs/xxhash.h
@@ -0,0 +1,72 @@
+// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
+/*
+ * The xxhash is copied from the linux kernel at:
+ * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
+ *
+ * The original copyright is:
+ *
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: https://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+#ifndef __EROFS_XXHASH_H
+#define __EROFS_XXHASH_H
+
+#ifdef __cplusplus
+extern "C"
+{
+#endif
+
+#include "defs.h"
+
+/**
+ * xxh32() - calculate the 32-bit hash of the input with a given seed.
+ *
+ * @input: The data to hash.
+ * @length: The length of the data to hash.
+ * @seed: The seed can be used to alter the result predictably.
+ *
+ * Return: The 32-bit hash of the data.
+ */
+uint32_t xxh32(const void *input, size_t length, uint32_t seed);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
diff --git a/lib/Makefile.am b/lib/Makefile.am
index 7a5dc03..3e09516 100644
--- a/lib/Makefile.am
+++ b/lib/Makefile.am
@@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
$(top_srcdir)/include/erofs/xattr.h \
$(top_srcdir)/include/erofs/compress_hints.h \
$(top_srcdir)/include/erofs/fragments.h \
+ $(top_srcdir)/include/erofs/xxhash.h \
$(top_srcdir)/lib/liberofs_private.h
noinst_HEADERS += compressor.h
@@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c inode.c xattr.c exclude.c \
namei.c data.c compress.c compressor.c zmap.c decompress.c \
compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c tar.c \
- block_list.c
+ block_list.c xxhash.c
liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
if ENABLE_LZ4
diff --git a/lib/xxhash.c b/lib/xxhash.c
new file mode 100644
index 0000000..3d77fe3
--- /dev/null
+++ b/lib/xxhash.c
@@ -0,0 +1,125 @@
+// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
+/*
+ * The xxhash is copied from the linux kernel at:
+ * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
+ *
+ * The original copyright is:
+ *
+ * xxHash - Extremely Fast Hash algorithm
+ * Copyright (C) 2012-2016, Yann Collet.
+ *
+ * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions are
+ * met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following disclaimer
+ * in the documentation and/or other materials provided with the
+ * distribution.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * This program is free software; you can redistribute it and/or modify it under
+ * the terms of the GNU General Public License version 2 as published by the
+ * Free Software Foundation. This program is dual-licensed; you may select
+ * either version 2 of the GNU General Public License ("GPL") or BSD license
+ * ("BSD").
+ *
+ * You can contact the author at:
+ * - xxHash homepage: https://cyan4973.github.io/xxHash/
+ * - xxHash source repository: https://github.com/Cyan4973/xxHash
+ */
+
+#include "erofs/xxhash.h"
+
+/*-*************************************
+ * Macros
+ **************************************/
+#define xxh_rotl32(x, r) ((x << r) | (x >> (32 - r)))
+
+/*-*************************************
+ * Constants
+ **************************************/
+static const uint32_t PRIME32_1 = 2654435761U;
+static const uint32_t PRIME32_2 = 2246822519U;
+static const uint32_t PRIME32_3 = 3266489917U;
+static const uint32_t PRIME32_4 = 668265263U;
+static const uint32_t PRIME32_5 = 374761393U;
+
+/*-***************************
+ * Simple Hash Functions
+ ****************************/
+static uint32_t xxh32_round(uint32_t seed, const uint32_t input)
+{
+ seed += input * PRIME32_2;
+ seed = xxh_rotl32(seed, 13);
+ seed *= PRIME32_1;
+ return seed;
+}
+
+uint32_t xxh32(const void *input, const size_t len, const uint32_t seed)
+{
+ const uint8_t *p = (const uint8_t *)input;
+ const uint8_t *b_end = p + len;
+ uint32_t h32;
+
+ if (len >= 16) {
+ const uint8_t *const limit = b_end - 16;
+ uint32_t v1 = seed + PRIME32_1 + PRIME32_2;
+ uint32_t v2 = seed + PRIME32_2;
+ uint32_t v3 = seed + 0;
+ uint32_t v4 = seed - PRIME32_1;
+
+ do {
+ v1 = xxh32_round(v1, get_unaligned_le32(p));
+ p += 4;
+ v2 = xxh32_round(v2, get_unaligned_le32(p));
+ p += 4;
+ v3 = xxh32_round(v3, get_unaligned_le32(p));
+ p += 4;
+ v4 = xxh32_round(v4, get_unaligned_le32(p));
+ p += 4;
+ } while (p <= limit);
+
+ h32 = xxh_rotl32(v1, 1) + xxh_rotl32(v2, 7) +
+ xxh_rotl32(v3, 12) + xxh_rotl32(v4, 18);
+ } else {
+ h32 = seed + PRIME32_5;
+ }
+
+ h32 += (uint32_t)len;
+
+ while (p + 4 <= b_end) {
+ h32 += get_unaligned_le32(p) * PRIME32_3;
+ h32 = xxh_rotl32(h32, 17) * PRIME32_4;
+ p += 4;
+ }
+
+ while (p < b_end) {
+ h32 += (*p) * PRIME32_5;
+ h32 = xxh_rotl32(h32, 11) * PRIME32_1;
+ p++;
+ }
+
+ h32 ^= h32 >> 15;
+ h32 *= PRIME32_2;
+ h32 ^= h32 >> 13;
+ h32 *= PRIME32_3;
+ h32 ^= h32 >> 16;
+
+ return h32;
+}
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
2023-08-29 12:49 ` Gao Xiang
2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
To: hsiangkao, linux-erofs
The xattr name bloom filter feature is going to be introduced to speed
up the negative xattr lookup, e.g. system.posix_acl_[access|default]
lookup when running "ls -lR" workload.
There are some commonly used extended attributes (n) and the total
number of these is approximately 30.
trusted.overlay.opaque
trusted.overlay.redirect
trusted.overlay.origin
trusted.overlay.impure
trusted.overlay.nlink
trusted.overlay.upper
trusted.overlay.metacopy
trusted.overlay.protattr
user.overlay.opaque
user.overlay.redirect
user.overlay.origin
user.overlay.impure
user.overlay.nlink
user.overlay.upper
user.overlay.metacopy
user.overlay.protattr
security.evm
security.ima
security.selinux
security.SMACK64
security.SMACK64IPIN
security.SMACK64IPOUT
security.SMACK64EXEC
security.SMACK64TRANSMUTE
security.SMACK64MMAP
security.apparmor
security.capability
system.posix_acl_access
system.posix_acl_default
user.mime_type
Given the number of bits of the bloom filter (m) is 32, the optimal
value for the number of the hash functions (k) is 1 (ln2 * m/n = 0.74).
The single hash function is implemented as:
xxh32(name, strlen(name), EROFS_XATTR_FILTER_SEED + index)
where `index` represents the index of corresponding predefined short name
prefix, while `name` represents the name string after stripping the above
predefined name prefix.
The constant magic number EROFS_XATTR_FILTER_SEED, i.e. 0x25BBE08F, is
used to give a better spread when mapping these 30 extended attributes
into 32-bit bloom filter as:
bit 0: security.ima
bit 1:
bit 2: trusted.overlay.nlink
bit 3:
bit 4: user.overlay.nlink
bit 5: trusted.overlay.upper
bit 6: user.overlay.origin
bit 7: trusted.overlay.protattr
bit 8: security.apparmor
bit 9: user.overlay.protattr
bit 10: user.overlay.opaque
bit 11: security.selinux
bit 12: security.SMACK64TRANSMUTE
bit 13: security.SMACK64
bit 14: security.SMACK64MMAP
bit 15: user.overlay.impure
bit 16: security.SMACK64IPIN
bit 17: trusted.overlay.redirect
bit 18: trusted.overlay.origin
bit 19: security.SMACK64IPOUT
bit 20: trusted.overlay.opaque
bit 21: system.posix_acl_default
bit 22:
bit 23: user.mime_type
bit 24: trusted.overlay.impure
bit 25: security.SMACK64EXEC
bit 26: user.overlay.redirect
bit 27: user.overlay.upper
bit 28: security.evm
bit 29: security.capability
bit 30: system.posix_acl_access
bit 31: trusted.overlay.metacopy, user.overlay.metacopy
h_name_filter is introduced to the on-disk per-inode xattr header to
place the corresponding xattr name filter, where bit value 1 indicates
non-existence for compatibility.
This feature is indicated by EROFS_FEATURE_COMPAT_XATTR_FILTER
compatible feature bit.
Reserve one byte in on-disk superblock as the on-disk format for xattr
name filter may change in the future. With this flag we don't need
bothering these compatible bits again at that time.
Suggested-by: Alexander Larsson <alexl@redhat.com>
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
include/erofs_fs.h | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/include/erofs_fs.h b/include/erofs_fs.h
index 850438a..2a111a4 100644
--- a/include/erofs_fs.h
+++ b/include/erofs_fs.h
@@ -14,6 +14,7 @@
#define EROFS_FEATURE_COMPAT_SB_CHKSUM 0x00000001
#define EROFS_FEATURE_COMPAT_MTIME 0x00000002
+#define EROFS_FEATURE_COMPAT_XATTR_FILTER 0x00000004
/*
* Any bits that aren't in EROFS_ALL_FEATURE_INCOMPAT should
@@ -82,7 +83,8 @@ struct erofs_super_block {
__u8 xattr_prefix_count; /* # of long xattr name prefixes */
__le32 xattr_prefix_start; /* start of long xattr prefixes */
__le64 packed_nid; /* nid of the special packed inode */
- __u8 reserved2[24];
+ __u8 xattr_filter_reserved; /* reserved for xattr name filter */
+ __u8 reserved2[23];
};
/*
@@ -201,7 +203,7 @@ struct erofs_inode_extended {
* for read-only fs, no need to introduce h_refcount
*/
struct erofs_xattr_ibody_header {
- __le32 h_reserved;
+ __le32 h_name_filter; /* bit value 1 indicates not-present */
__u8 h_shared_count;
__u8 h_reserved2[7];
__le32 h_shared_xattrs[0]; /* shared xattr id array */
@@ -222,6 +224,12 @@ struct erofs_xattr_ibody_header {
#define EROFS_XATTR_LONG_PREFIX 0x80
#define EROFS_XATTR_LONG_PREFIX_MASK 0x7f
+#define EROFS_XATTR_NAME_LEN_MAX UCHAR_MAX
+
+#define EROFS_XATTR_FILTER_BITS 32
+#define EROFS_XATTR_FILTER_DEFAULT UINT32_MAX
+#define EROFS_XATTR_FILTER_SEED 0x25BBE08F
+
/* xattr entry (for both inline & shared xattrs) */
struct erofs_xattr_entry {
__u8 e_name_len; /* length of name */
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 10+ messages in thread
* [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
@ 2023-08-29 12:41 ` Jingbo Xu
2023-08-29 12:53 ` Gao Xiang
2 siblings, 1 reply; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 12:41 UTC (permalink / raw)
To: hsiangkao, linux-erofs
Introduce "-Exattr-name-filter" option to enable the xattr name bloom
filter feature.
Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
---
include/erofs/config.h | 1 +
include/erofs/internal.h | 1 +
lib/xattr.c | 63 ++++++++++++++++++++++++++++++++++++++++
mkfs/main.c | 7 +++++
4 files changed, 72 insertions(+)
diff --git a/include/erofs/config.h b/include/erofs/config.h
index 8f52d2c..c51f0cd 100644
--- a/include/erofs/config.h
+++ b/include/erofs/config.h
@@ -53,6 +53,7 @@ struct erofs_configure {
bool c_ignore_mtime;
bool c_showprogress;
bool c_extra_ea_name_prefixes;
+ bool c_xattr_name_filter;
#ifdef HAVE_LIBSELINUX
struct selabel_handle *sehnd;
diff --git a/include/erofs/internal.h b/include/erofs/internal.h
index 3e73eef..382024a 100644
--- a/include/erofs/internal.h
+++ b/include/erofs/internal.h
@@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
+EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
#define EROFS_I_EA_INITED (1 << 0)
#define EROFS_I_Z_INITED (1 << 1)
diff --git a/lib/xattr.c b/lib/xattr.c
index 46a301a..325241d 100644
--- a/lib/xattr.c
+++ b/lib/xattr.c
@@ -18,6 +18,7 @@
#include "erofs/cache.h"
#include "erofs/io.h"
#include "erofs/fragments.h"
+#include "erofs/xxhash.h"
#include "liberofs_private.h"
#define EA_HASHTABLE_BITS 16
@@ -783,6 +784,63 @@ out:
return ret;
}
+
+static int erofs_xattr_filter_hashbit(struct xattr_item *item)
+{
+ u8 prefix = item->prefix;
+ const char *key = item->kvbuf;
+ unsigned int len = item->len[0];
+ char *name = NULL;
+ uint32_t hashbit;
+
+ if (prefix & EROFS_XATTR_LONG_PREFIX) {
+ struct ea_type_node *tnode;
+ u16 prefix_len;
+ int ret;
+
+ list_for_each_entry(tnode, &ea_name_prefixes, list) {
+ if (tnode->index == item->prefix) {
+ ret = asprintf(&name, "%s%.*s",
+ tnode->type.prefix, len, key);
+ if (ret < 0)
+ return -ENOMEM;
+ break;
+ }
+ }
+ if (!name)
+ return -ENOENT;
+
+ if (!match_base_prefix(name, &prefix, &prefix_len)) {
+ free(name);
+ return -ENOENT;
+ }
+ key = name + prefix_len;
+ len = strlen(key);
+ }
+
+ hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
+ (EROFS_XATTR_FILTER_BITS - 1);
+ if (name)
+ free(name);
+ return hashbit;
+}
+
+static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
+{
+ struct inode_xattr_node *node, *n;
+ u32 name_filter;
+ int hashbit;
+
+ name_filter = 0;
+ list_for_each_entry_safe(node, n, ixattrs, list) {
+ hashbit = erofs_xattr_filter_hashbit(node->item);
+ if (hashbit < 0)
+ return 0;
+ name_filter |= (1UL << hashbit);
+ }
+ return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
+}
+
char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
{
struct inode_xattr_node *node, *n;
@@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
header = (struct erofs_xattr_ibody_header *)buf;
header->h_shared_count = 0;
+ if (cfg.c_xattr_name_filter) {
+ u32 name_filter = erofs_xattr_filter_map(ixattrs);
+ header->h_name_filter = cpu_to_le32(name_filter);
+ }
+
p = sizeof(struct erofs_xattr_ibody_header);
list_for_each_entry_safe(node, n, ixattrs, list) {
struct xattr_item *const item = node->item;
diff --git a/mkfs/main.c b/mkfs/main.c
index c03a7a8..fad80b1 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -245,6 +245,13 @@ handle_fragment:
return -EINVAL;
cfg.c_dedupe = true;
}
+
+ if (MATCH_EXTENTED_OPT("xattr-name-filter", token, keylen)) {
+ if (vallen)
+ return -EINVAL;
+ cfg.c_xattr_name_filter = true;
+ erofs_sb_set_xattr_filter(&sbi);
+ }
}
return 0;
}
--
2.19.1.6.gb485710b
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH v6 1/3] erofs-utils: add xxh32 library
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
@ 2023-08-29 12:44 ` Gao Xiang
2023-08-29 14:14 ` Jingbo Xu
0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:44 UTC (permalink / raw)
To: Jingbo Xu, linux-erofs
On 2023/8/29 20:41, Jingbo Xu wrote:
> Add xxh32 library which could be used by following xattr bloom filter
> feature.
>
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
> include/erofs/xxhash.h | 72 ++++++++++++++++++++++++
> lib/Makefile.am | 3 +-
> lib/xxhash.c | 125 +++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 199 insertions(+), 1 deletion(-)
> create mode 100644 include/erofs/xxhash.h
> create mode 100644 lib/xxhash.c
>
> diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
> new file mode 100644
> index 0000000..5459bd8
> --- /dev/null
> +++ b/include/erofs/xxhash.h
> @@ -0,0 +1,72 @@
> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
You should use:
/* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */
for each header, but I guess we could drop the copyright
below since it's just a declaration?
> +/*
> + * The xxhash is copied from the linux kernel at:
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
> + *
> + * The original copyright is:
> + *
> + * xxHash - Extremely Fast Hash algorithm
> + * Copyright (C) 2012-2016, Yann Collet.
> + *
> + * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following disclaimer
> + * in the documentation and/or other materials provided with the
> + * distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation. This program is dual-licensed; you may select
> + * either version 2 of the GNU General Public License ("GPL") or BSD license
> + * ("BSD").
> + *
> + * You can contact the author at:
> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
> + */
> +
> +#ifndef __EROFS_XXHASH_H
> +#define __EROFS_XXHASH_H
> +
> +#ifdef __cplusplus
> +extern "C"
> +{
> +#endif
> +
> +#include "defs.h"
#include <stdint.h> ?
> +
> +/**
> + * xxh32() - calculate the 32-bit hash of the input with a given seed.
> + *
> + * @input: The data to hash.
> + * @length: The length of the data to hash.
> + * @seed: The seed can be used to alter the result predictably.
> + *
> + * Return: The 32-bit hash of the data.
> + */
> +uint32_t xxh32(const void *input, size_t length, uint32_t seed);
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/Makefile.am b/lib/Makefile.am
> index 7a5dc03..3e09516 100644
> --- a/lib/Makefile.am
> +++ b/lib/Makefile.am
> @@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
> $(top_srcdir)/include/erofs/xattr.h \
> $(top_srcdir)/include/erofs/compress_hints.h \
> $(top_srcdir)/include/erofs/fragments.h \
> + $(top_srcdir)/include/erofs/xxhash.h \
> $(top_srcdir)/lib/liberofs_private.h
>
> noinst_HEADERS += compressor.h
> @@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c inode.c xattr.c exclude.c \
> namei.c data.c compress.c compressor.c zmap.c decompress.c \
> compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
> fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c tar.c \
> - block_list.c
> + block_list.c xxhash.c
>
> liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
> if ENABLE_LZ4
> diff --git a/lib/xxhash.c b/lib/xxhash.c
> new file mode 100644
> index 0000000..3d77fe3
> --- /dev/null
> +++ b/lib/xxhash.c
> @@ -0,0 +1,125 @@
> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
> +/*
> + * The xxhash is copied from the linux kernel at:
> + * https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
> + *
> + * The original copyright is:
> + *
> + * xxHash - Extremely Fast Hash algorithm
> + * Copyright (C) 2012-2016, Yann Collet.
> + *
> + * BSD 2-Clause License (http://www.opensource.org/licenses/bsd-license.php)
> + *
> + * Redistribution and use in source and binary forms, with or without
> + * modification, are permitted provided that the following conditions are
> + * met:
> + *
> + * * Redistributions of source code must retain the above copyright
> + * notice, this list of conditions and the following disclaimer.
> + * * Redistributions in binary form must reproduce the above
> + * copyright notice, this list of conditions and the following disclaimer
> + * in the documentation and/or other materials provided with the
> + * distribution.
> + *
> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> + *
> + * This program is free software; you can redistribute it and/or modify it under
> + * the terms of the GNU General Public License version 2 as published by the
> + * Free Software Foundation. This program is dual-licensed; you may select
> + * either version 2 of the GNU General Public License ("GPL") or BSD license
> + * ("BSD").
> + *
> + * You can contact the author at:
> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
> + */
>
#include "erofs/defs.h"
> +#include "erofs/xxhash.h"
> +
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
@ 2023-08-29 12:49 ` Gao Xiang
2023-08-29 14:19 ` Jingbo Xu
0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:49 UTC (permalink / raw)
To: Jingbo Xu, linux-erofs
On 2023/8/29 20:41, Jingbo Xu wrote:
> The xattr name bloom filter feature is going to be introduced to speed
> up the negative xattr lookup, e.g. system.posix_acl_[access|default]
> lookup when running "ls -lR" workload.
I think you could just say
"Let's keep in sync with kernel commit <12-char commit id>."
here.
...
> ---
...
> +#define EROFS_XATTR_NAME_LEN_MAX UCHAR_MAX
where is EROFS_XATTR_NAME_LEN_MAX used?
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
@ 2023-08-29 12:53 ` Gao Xiang
2023-08-29 14:20 ` Jingbo Xu
0 siblings, 1 reply; 10+ messages in thread
From: Gao Xiang @ 2023-08-29 12:53 UTC (permalink / raw)
To: Jingbo Xu, linux-erofs
On 2023/8/29 20:41, Jingbo Xu wrote:
> Introduce "-Exattr-name-filter" option to enable the xattr name bloom
> filter feature.
>
> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
> ---
> include/erofs/config.h | 1 +
> include/erofs/internal.h | 1 +
> lib/xattr.c | 63 ++++++++++++++++++++++++++++++++++++++++
> mkfs/main.c | 7 +++++
> 4 files changed, 72 insertions(+)
>
> diff --git a/include/erofs/config.h b/include/erofs/config.h
> index 8f52d2c..c51f0cd 100644
> --- a/include/erofs/config.h
> +++ b/include/erofs/config.h
> @@ -53,6 +53,7 @@ struct erofs_configure {
> bool c_ignore_mtime;
> bool c_showprogress;
> bool c_extra_ea_name_prefixes;
> + bool c_xattr_name_filter;
>
> #ifdef HAVE_LIBSELINUX
> struct selabel_handle *sehnd;
> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
> index 3e73eef..382024a 100644
> --- a/include/erofs/internal.h
> +++ b/include/erofs/internal.h
> @@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat, INCOMPAT_FRAGMENTS)
> EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
> EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
> EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
> +EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>
> #define EROFS_I_EA_INITED (1 << 0)
> #define EROFS_I_Z_INITED (1 << 1)
> diff --git a/lib/xattr.c b/lib/xattr.c
> index 46a301a..325241d 100644
> --- a/lib/xattr.c
> +++ b/lib/xattr.c
> @@ -18,6 +18,7 @@
> #include "erofs/cache.h"
> #include "erofs/io.h"
> #include "erofs/fragments.h"
> +#include "erofs/xxhash.h"
> #include "liberofs_private.h"
>
> #define EA_HASHTABLE_BITS 16
> @@ -783,6 +784,63 @@ out:
> return ret;
> }
>
> +
> +static int erofs_xattr_filter_hashbit(struct xattr_item *item)
> +{
> + u8 prefix = item->prefix;
> + const char *key = item->kvbuf;
> + unsigned int len = item->len[0];
> + char *name = NULL;
> + uint32_t hashbit;
> +
> + if (prefix & EROFS_XATTR_LONG_PREFIX) {
> + struct ea_type_node *tnode;
> + u16 prefix_len;
> + int ret;
> +
> + list_for_each_entry(tnode, &ea_name_prefixes, list) {
> + if (tnode->index == item->prefix) {
> + ret = asprintf(&name, "%s%.*s",
> + tnode->type.prefix, len, key);
> + if (ret < 0)
> + return -ENOMEM;
> + break;
> + }
> + }
> + if (!name)
> + return -ENOENT;
> +
> + if (!match_base_prefix(name, &prefix, &prefix_len)) {
> + free(name);
> + return -ENOENT;
> + }
> + key = name + prefix_len;
> + len = strlen(key);
> + }
> +
> + hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
> + (EROFS_XATTR_FILTER_BITS - 1);
> + if (name)
> + free(name);
> + return hashbit;
> +}
> +
> +static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
> +{
> + struct inode_xattr_node *node, *n;
> + u32 name_filter;
> + int hashbit;
> +
> + name_filter = 0;
> + list_for_each_entry_safe(node, n, ixattrs, list) {
> + hashbit = erofs_xattr_filter_hashbit(node->item);
> + if (hashbit < 0)
I'd suggest to clear feature bit instead:
erofs_sb_clear_xattr_filter(&sbi);
> + return 0;
> + name_filter |= (1UL << hashbit);
> + }
> + return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
> +}
> +
> char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
> {
> struct inode_xattr_node *node, *n;
> @@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned int size)
> header = (struct erofs_xattr_ibody_header *)buf;
> header->h_shared_count = 0;
>
> + if (cfg.c_xattr_name_filter) {
> + u32 name_filter = erofs_xattr_filter_map(ixattrs);
Leave a blank line here.
> + header->h_name_filter = cpu_to_le32(name_filter);
Or
header->h_name_filter =
cpu_to_le32(erofs_xattr_filter_map(ixattrs));
Thanks,
Gao Xiang
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 1/3] erofs-utils: add xxh32 library
2023-08-29 12:44 ` Gao Xiang
@ 2023-08-29 14:14 ` Jingbo Xu
0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:14 UTC (permalink / raw)
To: Gao Xiang, linux-erofs
On 8/29/23 8:44 PM, Gao Xiang wrote:
>
>
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> Add xxh32 library which could be used by following xattr bloom filter
>> feature.
>>
>> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>> ---
>> include/erofs/xxhash.h | 72 ++++++++++++++++++++++++
>> lib/Makefile.am | 3 +-
>> lib/xxhash.c | 125 +++++++++++++++++++++++++++++++++++++++++
>> 3 files changed, 199 insertions(+), 1 deletion(-)
>> create mode 100644 include/erofs/xxhash.h
>> create mode 100644 lib/xxhash.c
>>
>> diff --git a/include/erofs/xxhash.h b/include/erofs/xxhash.h
>> new file mode 100644
>> index 0000000..5459bd8
>> --- /dev/null
>> +++ b/include/erofs/xxhash.h
>> @@ -0,0 +1,72 @@
>> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
>
> You should use:
> /* SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only */
>
> for each header, but I guess we could drop the copyright
> below since it's just a declaration?
Okay.
>
>> +/*
>> + * The xxhash is copied from the linux kernel at:
>> + *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
>> + *
>> + * The original copyright is:
>> + *
>> + * xxHash - Extremely Fast Hash algorithm
>> + * Copyright (C) 2012-2016, Yann Collet.
>> + *
>> + * BSD 2-Clause License
>> (http://www.opensource.org/licenses/bsd-license.php)
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> are
>> + * met:
>> + *
>> + * * Redistributions of source code must retain the above copyright
>> + * notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above
>> + * copyright notice, this list of conditions and the following
>> disclaimer
>> + * in the documentation and/or other materials provided with the
>> + * distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it under
>> + * the terms of the GNU General Public License version 2 as published
>> by the
>> + * Free Software Foundation. This program is dual-licensed; you may
>> select
>> + * either version 2 of the GNU General Public License ("GPL") or BSD
>> license
>> + * ("BSD").
>> + *
>> + * You can contact the author at:
>> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
>> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
>> + */
>> +
>> +#ifndef __EROFS_XXHASH_H
>> +#define __EROFS_XXHASH_H
>> +
>> +#ifdef __cplusplus
>> +extern "C"
>> +{
>> +#endif
>> +
>> +#include "defs.h"
>
> #include <stdint.h> ?
Alright.
>
>> +
>> +/**
>> + * xxh32() - calculate the 32-bit hash of the input with a given seed.
>> + *
>> + * @input: The data to hash.
>> + * @length: The length of the data to hash.
>> + * @seed: The seed can be used to alter the result predictably.
>> + *
>> + * Return: The 32-bit hash of the data.
>> + */
>> +uint32_t xxh32(const void *input, size_t length, uint32_t seed);
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif
>> diff --git a/lib/Makefile.am b/lib/Makefile.am
>> index 7a5dc03..3e09516 100644
>> --- a/lib/Makefile.am
>> +++ b/lib/Makefile.am
>> @@ -24,6 +24,7 @@ noinst_HEADERS = $(top_srcdir)/include/erofs_fs.h \
>> $(top_srcdir)/include/erofs/xattr.h \
>> $(top_srcdir)/include/erofs/compress_hints.h \
>> $(top_srcdir)/include/erofs/fragments.h \
>> + $(top_srcdir)/include/erofs/xxhash.h \
>> $(top_srcdir)/lib/liberofs_private.h
>> noinst_HEADERS += compressor.h
>> @@ -31,7 +32,7 @@ liberofs_la_SOURCES = config.c io.c cache.c super.c
>> inode.c xattr.c exclude.c \
>> namei.c data.c compress.c compressor.c zmap.c
>> decompress.c \
>> compress_hints.c hashmap.c sha256.c blobchunk.c dir.c \
>> fragments.c rb_tree.c dedupe.c uuid_unparse.c uuid.c
>> tar.c \
>> - block_list.c
>> + block_list.c xxhash.c
>> liberofs_la_CFLAGS = -Wall ${libuuid_CFLAGS} -I$(top_srcdir)/include
>> if ENABLE_LZ4
>> diff --git a/lib/xxhash.c b/lib/xxhash.c
>> new file mode 100644
>> index 0000000..3d77fe3
>> --- /dev/null
>> +++ b/lib/xxhash.c
>> @@ -0,0 +1,125 @@
>> +// SPDX-License-Identifier: BSD-2-Clause OR GPL-2.0-only
>> +/*
>> + * The xxhash is copied from the linux kernel at:
>> + *
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/lib/xxhash.c
>> + *
>> + * The original copyright is:
>> + *
>> + * xxHash - Extremely Fast Hash algorithm
>> + * Copyright (C) 2012-2016, Yann Collet.
>> + *
>> + * BSD 2-Clause License
>> (http://www.opensource.org/licenses/bsd-license.php)
>> + *
>> + * Redistribution and use in source and binary forms, with or without
>> + * modification, are permitted provided that the following conditions
>> are
>> + * met:
>> + *
>> + * * Redistributions of source code must retain the above copyright
>> + * notice, this list of conditions and the following disclaimer.
>> + * * Redistributions in binary form must reproduce the above
>> + * copyright notice, this list of conditions and the following
>> disclaimer
>> + * in the documentation and/or other materials provided with the
>> + * distribution.
>> + *
>> + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
>> + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
>> + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
>> + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
>> + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
>> + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
>> + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
>> + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
>> + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
>> + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
>> + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> modify it under
>> + * the terms of the GNU General Public License version 2 as published
>> by the
>> + * Free Software Foundation. This program is dual-licensed; you may
>> select
>> + * either version 2 of the GNU General Public License ("GPL") or BSD
>> license
>> + * ("BSD").
>> + *
>> + * You can contact the author at:
>> + * - xxHash homepage: https://cyan4973.github.io/xxHash/
>> + * - xxHash source repository: https://github.com/Cyan4973/xxHash
>> + */
>>
>
> #include "erofs/defs.h"
Alright.
>
>> +#include "erofs/xxhash.h"
>> +
>
> Thanks,
> Gao Xiang
--
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter
2023-08-29 12:49 ` Gao Xiang
@ 2023-08-29 14:19 ` Jingbo Xu
0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:19 UTC (permalink / raw)
To: Gao Xiang, linux-erofs
On 8/29/23 8:49 PM, Gao Xiang wrote:
>
>
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> The xattr name bloom filter feature is going to be introduced to speed
>> up the negative xattr lookup, e.g. system.posix_acl_[access|default]
>> lookup when running "ls -lR" workload.
>
>
> I think you could just say
> "Let's keep in sync with kernel commit <12-char commit id>."
>
> here.
>
> ...
>
>> ---
>
>
> ...
>
>> +#define EROFS_XATTR_NAME_LEN_MAX UCHAR_MAX
>
> where is EROFS_XATTR_NAME_LEN_MAX used?
Indeed it's not used and I can't remember where it's used in the initial
implementation...
I will drop it later.
--
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH v6 3/3] erofs-utils: mkfs: enable xattr name filter
2023-08-29 12:53 ` Gao Xiang
@ 2023-08-29 14:20 ` Jingbo Xu
0 siblings, 0 replies; 10+ messages in thread
From: Jingbo Xu @ 2023-08-29 14:20 UTC (permalink / raw)
To: Gao Xiang, linux-erofs
On 8/29/23 8:53 PM, Gao Xiang wrote:
>
>
> On 2023/8/29 20:41, Jingbo Xu wrote:
>> Introduce "-Exattr-name-filter" option to enable the xattr name bloom
>> filter feature.
>>
>> Signed-off-by: Jingbo Xu <jefflexu@linux.alibaba.com>
>> ---
>> include/erofs/config.h | 1 +
>> include/erofs/internal.h | 1 +
>> lib/xattr.c | 63 ++++++++++++++++++++++++++++++++++++++++
>> mkfs/main.c | 7 +++++
>> 4 files changed, 72 insertions(+)
>>
>> diff --git a/include/erofs/config.h b/include/erofs/config.h
>> index 8f52d2c..c51f0cd 100644
>> --- a/include/erofs/config.h
>> +++ b/include/erofs/config.h
>> @@ -53,6 +53,7 @@ struct erofs_configure {
>> bool c_ignore_mtime;
>> bool c_showprogress;
>> bool c_extra_ea_name_prefixes;
>> + bool c_xattr_name_filter;
>> #ifdef HAVE_LIBSELINUX
>> struct selabel_handle *sehnd;
>> diff --git a/include/erofs/internal.h b/include/erofs/internal.h
>> index 3e73eef..382024a 100644
>> --- a/include/erofs/internal.h
>> +++ b/include/erofs/internal.h
>> @@ -139,6 +139,7 @@ EROFS_FEATURE_FUNCS(fragments, incompat,
>> INCOMPAT_FRAGMENTS)
>> EROFS_FEATURE_FUNCS(dedupe, incompat, INCOMPAT_DEDUPE)
>> EROFS_FEATURE_FUNCS(xattr_prefixes, incompat, INCOMPAT_XATTR_PREFIXES)
>> EROFS_FEATURE_FUNCS(sb_chksum, compat, COMPAT_SB_CHKSUM)
>> +EROFS_FEATURE_FUNCS(xattr_filter, compat, COMPAT_XATTR_FILTER)
>> #define EROFS_I_EA_INITED (1 << 0)
>> #define EROFS_I_Z_INITED (1 << 1)
>> diff --git a/lib/xattr.c b/lib/xattr.c
>> index 46a301a..325241d 100644
>> --- a/lib/xattr.c
>> +++ b/lib/xattr.c
>> @@ -18,6 +18,7 @@
>> #include "erofs/cache.h"
>> #include "erofs/io.h"
>> #include "erofs/fragments.h"
>> +#include "erofs/xxhash.h"
>> #include "liberofs_private.h"
>> #define EA_HASHTABLE_BITS 16
>> @@ -783,6 +784,63 @@ out:
>> return ret;
>> }
>> +
>> +static int erofs_xattr_filter_hashbit(struct xattr_item *item)
>> +{
>> + u8 prefix = item->prefix;
>> + const char *key = item->kvbuf;
>> + unsigned int len = item->len[0];
>> + char *name = NULL;
>> + uint32_t hashbit;
>> +
>> + if (prefix & EROFS_XATTR_LONG_PREFIX) {
>> + struct ea_type_node *tnode;
>> + u16 prefix_len;
>> + int ret;
>> +
>> + list_for_each_entry(tnode, &ea_name_prefixes, list) {
>> + if (tnode->index == item->prefix) {
>> + ret = asprintf(&name, "%s%.*s",
>> + tnode->type.prefix, len, key);
>> + if (ret < 0)
>> + return -ENOMEM;
>> + break;
>> + }
>> + }
>> + if (!name)
>> + return -ENOENT;
>> +
>> + if (!match_base_prefix(name, &prefix, &prefix_len)) {
>> + free(name);
>> + return -ENOENT;
>> + }
>> + key = name + prefix_len;
>> + len = strlen(key);
>> + }
>> +
>> + hashbit = xxh32(key, len, EROFS_XATTR_FILTER_SEED + prefix) &
>> + (EROFS_XATTR_FILTER_BITS - 1);
>> + if (name)
>> + free(name);
>> + return hashbit;
>> +}
>> +
>> +static u32 erofs_xattr_filter_map(struct list_head *ixattrs)
>> +{
>> + struct inode_xattr_node *node, *n;
>> + u32 name_filter;
>> + int hashbit;
>> +
>> + name_filter = 0;
>> + list_for_each_entry_safe(node, n, ixattrs, list) {
>> + hashbit = erofs_xattr_filter_hashbit(node->item);
>> + if (hashbit < 0)
>
> I'd suggest to clear feature bit instead:
>
> erofs_sb_clear_xattr_filter(&sbi);
Good idea.
>
>> + return 0;
>> + name_filter |= (1UL << hashbit);
>> + }
>> + return EROFS_XATTR_FILTER_DEFAULT & ~name_filter;
>> +}
>> +
>> char *erofs_export_xattr_ibody(struct list_head *ixattrs, unsigned
>> int size)
>> {
>> struct inode_xattr_node *node, *n;
>> @@ -797,6 +855,11 @@ char *erofs_export_xattr_ibody(struct list_head
>> *ixattrs, unsigned int size)
>> header = (struct erofs_xattr_ibody_header *)buf;
>> header->h_shared_count = 0;
>> + if (cfg.c_xattr_name_filter) {
>> + u32 name_filter = erofs_xattr_filter_map(ixattrs);
>
> Leave a blank line here.
>
>> + header->h_name_filter = cpu_to_le32(name_filter);
>
> Or
> header->h_name_filter =
> cpu_to_le32(erofs_xattr_filter_map(ixattrs));
>
Okay. I would prefer the latter.
--
Thanks,
Jingbo
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-08-29 14:20 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-29 12:41 [PATCH v6 0/3] erofs-utils: introduce xattr name bloom filter Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 1/3] erofs-utils: add xxh32 library Jingbo Xu
2023-08-29 12:44 ` Gao Xiang
2023-08-29 14:14 ` Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 2/3] erofs-utils: update on-disk format for xattr name filter Jingbo Xu
2023-08-29 12:49 ` Gao Xiang
2023-08-29 14:19 ` Jingbo Xu
2023-08-29 12:41 ` [PATCH v6 3/3] erofs-utils: mkfs: enable " Jingbo Xu
2023-08-29 12:53 ` Gao Xiang
2023-08-29 14:20 ` Jingbo Xu
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.