From: Peng Zhang <zhangpeng.00@bytedance.com>
To: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org,
brauner@kernel.org, surenb@google.com,
michael.christie@oracle.com, mathieu.desnoyers@efficios.com,
peterz@infradead.org, npiggin@gmail.com, avagin@gmail.com,
linux-mm@kvack.org, linux-doc@vger.kernel.org,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}()
Date: Mon, 31 Jul 2023 17:52:59 +0800 [thread overview]
Message-ID: <36f2d5d1-00ff-d8a7-ed40-15eb5d929224@bytedance.com> (raw)
In-Reply-To: <20230726145825.2fufoujgc5faiszq@revolver>
在 2023/7/26 22:58, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]:
>> Introduce ma_nonleaf_data_end{_nocheck}() to get the data end of
>> non-leaf nodes without knowing the maximum value of nodes, so that any
>> ascending can be avoided even if the maximum value of nodes is not known.
>>
>> The principle is that we introduce MAPLE_ENODE to mark an ENODE, which
>> cannot be used by metadata, so we can distinguish whether it is ENODE or
>> metadata.
>>
>> The nocheck version is to avoid lockdep complaining in some scenarios
>> where no locks are held.
>>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
>> ---
>> lib/maple_tree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++--
>> 1 file changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index a3d602cfd030..98e4fdf6f4b9 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -310,12 +310,19 @@ static inline void mte_set_node_dead(struct maple_enode *mn)
>> #define MAPLE_ENODE_TYPE_SHIFT 0x03
>> /* Bit 2 means a NULL somewhere below */
>> #define MAPLE_ENODE_NULL 0x04
>> +/* Bit 7 means this is an ENODE, instead of metadata */
>> +#define MAPLE_ENODE 0x80
>
> We were saving this bit for more node types. I don't want to use this
> bit for this reason since you could have done BFS to duplicate the tree
> using the existing way to find the node end.
We have reserved 4 bits for the node type. I don't think there will be
more than 16 node types going forward.
Even the DFS that has been implemented can use the existing way to get
the data end. I didn't use it because when walking up the tree, we don't
know the maximum value of the node, and the continuous upward walk will
introduce more overhead, which is what mas_ascend() does. Doing BFS
cannot avoid this problem also.
The reason I don't do BFS is that it has more overhead than DFS. If you
think of a tree as a graph, doing DFS will only walk each edge twice.
What if it is BFS? Since we can't use queues, we can only emulate BFS,
which additionally does something like mas_next_node() does, which
introduces more overhead than DFS. Considering only the layer of leaf
nodes, it needs to walk each edge twice. So the overhead of doing BFS is
more than DFS.
>
> Bits are highly valuable and this is the only remaining bit. I had
> thought about using this in Feb 2021 to see if there was metadata or
> not, but figured a way around it (using the max trick) and thus saved
> this bit for potential expansion of node types.
I thought of another way to get the maximum value of a node without
doing any extra upward walk. When doing DFS, we can use a stack to save
the maximum value of ancestor nodes. The stack size can be set to
MAPLE_HEIGHT_MAX. In this way, this bit can be reserved, and there is no
need to do a loop like mas_ascend() in order to get the maximum value.
>
>> +
>> +static inline bool slot_is_mte(unsigned long slot)
>> +{
>> + return slot & MAPLE_ENODE;
>> +}
>>
>> static inline struct maple_enode *mt_mk_node(const struct maple_node *node,
>> enum maple_type type)
>> {
>> - return (void *)((unsigned long)node |
>> - (type << MAPLE_ENODE_TYPE_SHIFT) | MAPLE_ENODE_NULL);
>> + return (void *)((unsigned long)node | (type << MAPLE_ENODE_TYPE_SHIFT) |
>> + MAPLE_ENODE_NULL | MAPLE_ENODE);
>> }
>>
>> static inline void *mte_mk_root(const struct maple_enode *node)
>> @@ -1411,6 +1418,65 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
>> return NULL;
>> }
>>
>> +/*
>> + * ma_nonleaf_data_end() - Find the end of the data in a non-leaf node.
>> + * @mt: The maple tree
>> + * @node: The maple node
>> + * @type: The maple node type
>> + *
>> + * Uses metadata to find the end of the data when possible without knowing the
>> + * node maximum.
>> + *
>> + * Return: The zero indexed last slot with child.
>> + */
>> +static inline unsigned char ma_nonleaf_data_end(struct maple_tree *mt,
>> + struct maple_node *node,
>> + enum maple_type type)
>> +{
>> + void __rcu **slots;
>> + unsigned long slot;
>> +
>> + slots = ma_slots(node, type);
>> + slot = (unsigned long)mt_slot(mt, slots, mt_pivots[type]);
>> + if (unlikely(slot_is_mte(slot)))
>> + return mt_pivots[type];
>> +
>> + return ma_meta_end(node, type);
>> +}
>> +
>> +/*
>> + * ma_nonleaf_data_end_nocheck() - Find the end of the data in a non-leaf node.
>> + * @node: The maple node
>> + * @type: The maple node type
>> + *
>> + * Uses metadata to find the end of the data when possible without knowing the
>> + * node maximum. This is the version of ma_nonleaf_data_end() that does not
>> + * check for lock held. This particular version is designed to avoid lockdep
>> + * complaining in some scenarios.
>> + *
>> + * Return: The zero indexed last slot with child.
>> + */
>> +static inline unsigned char ma_nonleaf_data_end_nocheck(struct maple_node *node,
>> + enum maple_type type)
>> +{
>> + void __rcu **slots;
>> + unsigned long slot;
>> +
>> + slots = ma_slots(node, type);
>> + slot = (unsigned long)rcu_dereference_raw(slots[mt_pivots[type]]);
>> + if (unlikely(slot_is_mte(slot)))
>> + return mt_pivots[type];
>> +
>> + return ma_meta_end(node, type);
>> +}
>> +
>> +/* See ma_nonleaf_data_end() */
>> +static inline unsigned char mte_nonleaf_data_end(struct maple_tree *mt,
>> + struct maple_enode *enode)
>> +{
>> + return ma_nonleaf_data_end(mt, mte_to_node(enode), mte_node_type(enode));
>> +}
>> +
>> /*
>> * ma_data_end() - Find the end of the data in a node.
>> * @node: The maple node
>> --
>> 2.20.1
>>
>>
next prev parent reply other threads:[~2023-07-31 9:54 UTC|newest]
Thread overview: 42+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-07-26 8:09 [PATCH 00/11] Introduce mt_dup() to improve the performance of fork() Peng Zhang
2023-07-26 8:09 ` [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}() Peng Zhang
2023-07-26 14:58 ` Liam R. Howlett
2023-07-31 9:52 ` Peng Zhang [this message]
2023-07-31 16:08 ` Liam R. Howlett
2023-07-26 8:09 ` [PATCH 02/11] maple_tree: Validate MAPLE_ENODE and ma_nonleaf_data_end() Peng Zhang
2023-07-26 8:09 ` [PATCH 03/11] maple_tree: Add some helper functions Peng Zhang
2023-07-26 15:02 ` Liam R. Howlett
2023-07-26 15:08 ` Matthew Wilcox
2023-07-31 11:45 ` Peng Zhang
2023-08-11 17:28 ` Liam R. Howlett
2023-07-31 11:40 ` Peng Zhang
2023-07-26 8:09 ` [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup() Peng Zhang
2023-07-26 16:03 ` Liam R. Howlett
2023-07-31 12:24 ` Peng Zhang
2023-07-31 16:27 ` Liam R. Howlett
2023-08-16 13:41 ` Peng Zhang
2023-08-16 18:30 ` Liam R. Howlett
2023-08-18 11:53 ` Peng Zhang
2023-08-18 16:13 ` Liam R. Howlett
2023-07-26 8:09 ` [PATCH 05/11] maple_tree: Add test for mt_dup() Peng Zhang
2023-07-26 16:06 ` Liam R. Howlett
2023-07-31 12:32 ` Peng Zhang
2023-07-31 16:41 ` Liam R. Howlett
2023-07-26 8:09 ` [PATCH 06/11] maple_tree: Introduce mas_replace_entry() to directly replace an entry Peng Zhang
2023-07-26 16:08 ` Liam R. Howlett
2023-07-31 12:39 ` Peng Zhang
2023-07-31 16:48 ` Liam R. Howlett
2023-08-16 13:11 ` Peng Zhang
2023-08-16 17:40 ` Liam R. Howlett
2023-08-18 9:39 ` Peng Zhang
2023-08-18 16:15 ` Liam R. Howlett
2023-07-26 8:09 ` [PATCH 07/11] maple_tree: Update the documentation of maple tree Peng Zhang
2023-07-26 8:09 ` [PATCH 08/11] maple_tree: Skip other tests when BENCH is enabled Peng Zhang
2023-07-26 8:09 ` [PATCH 09/11] maple_tree: Update check_forking() and bench_forking() Peng Zhang
2023-07-26 8:09 ` [PATCH 10/11] MAINTAINERS: Add co-maintainer for maple tree Peng Zhang
2023-07-26 16:39 ` Liam R. Howlett
2023-07-31 12:55 ` Peng Zhang
2023-07-31 20:55 ` Liam R. Howlett
2023-07-26 8:09 ` [PATCH 11/11] fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Peng Zhang
2023-07-26 17:06 ` Liam R. Howlett
2023-07-31 12:59 ` Peng Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=36f2d5d1-00ff-d8a7-ed40-15eb5d929224@bytedance.com \
--to=zhangpeng.00@bytedance.com \
--cc=Liam.Howlett@Oracle.com \
--cc=akpm@linux-foundation.org \
--cc=avagin@gmail.com \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=linux-doc@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mathieu.desnoyers@efficios.com \
--cc=michael.christie@oracle.com \
--cc=npiggin@gmail.com \
--cc=peterz@infradead.org \
--cc=surenb@google.com \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).