linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peng Zhang <zhangpeng.00@bytedance.com>
To: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org,
	brauner@kernel.org, surenb@google.com,
	michael.christie@oracle.com, mathieu.desnoyers@efficios.com,
	peterz@infradead.org, npiggin@gmail.com, avagin@gmail.com,
	linux-mm@kvack.org, linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}()
Date: Mon, 31 Jul 2023 17:52:59 +0800	[thread overview]
Message-ID: <36f2d5d1-00ff-d8a7-ed40-15eb5d929224@bytedance.com> (raw)
In-Reply-To: <20230726145825.2fufoujgc5faiszq@revolver>



在 2023/7/26 22:58, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [230726 04:10]:
>> Introduce ma_nonleaf_data_end{_nocheck}() to get the data end of
>> non-leaf nodes without knowing the maximum value of nodes, so that any
>> ascending can be avoided even if the maximum value of nodes is not known.
>>
>> The principle is that we introduce MAPLE_ENODE to mark an ENODE, which
>> cannot be used by metadata, so we can distinguish whether it is ENODE or
>> metadata.
>>
>> The nocheck version is to avoid lockdep complaining in some scenarios
>> where no locks are held.
>>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
>> ---
>>   lib/maple_tree.c | 70 ++++++++++++++++++++++++++++++++++++++++++++++--
>>   1 file changed, 68 insertions(+), 2 deletions(-)
>>
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index a3d602cfd030..98e4fdf6f4b9 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -310,12 +310,19 @@ static inline void mte_set_node_dead(struct maple_enode *mn)
>>   #define MAPLE_ENODE_TYPE_SHIFT		0x03
>>   /* Bit 2 means a NULL somewhere below */
>>   #define MAPLE_ENODE_NULL		0x04
>> +/* Bit 7 means this is an ENODE, instead of metadata */
>> +#define MAPLE_ENODE			0x80
> 
> We were saving this bit for more node types.  I don't want to use this
> bit for this reason since you could have done BFS to duplicate the tree
> using the existing way to find the node end.
We have reserved 4 bits for the node type. I don't think there will be
more than 16 node types going forward.

Even the DFS that has been implemented can use the existing way to get
the data end. I didn't use it because when walking up the tree, we don't
know the maximum value of the node, and the continuous upward walk will
introduce more overhead, which is what mas_ascend() does. Doing BFS
cannot avoid this problem also.

The reason I don't do BFS is that it has more overhead than DFS. If you
think of a tree as a graph, doing DFS will only walk each edge twice.
What if it is BFS? Since we can't use queues, we can only emulate BFS,
which additionally does something like mas_next_node() does, which
introduces more overhead than DFS. Considering only the layer of leaf
nodes, it needs to walk each edge twice. So the overhead of doing BFS is
more than DFS.
> 
> Bits are highly valuable and this is the only remaining bit.  I had
> thought about using this in Feb 2021 to see if there was metadata or
> not, but figured a way around it (using the max trick) and thus saved
> this bit for potential expansion of node types.
I thought of another way to get the maximum value of a node without
doing any extra upward walk. When doing DFS, we can use a stack to save
the maximum value of ancestor nodes. The stack size can be set to
MAPLE_HEIGHT_MAX. In this way, this bit can be reserved, and there is no
need to do a loop like mas_ascend() in order to get the maximum value.
> 
>> +
>> +static inline bool slot_is_mte(unsigned long slot)
>> +{
>> +	return slot & MAPLE_ENODE;
>> +}
>>   
>>   static inline struct maple_enode *mt_mk_node(const struct maple_node *node,
>>   					     enum maple_type type)
>>   {
>> -	return (void *)((unsigned long)node |
>> -			(type << MAPLE_ENODE_TYPE_SHIFT) | MAPLE_ENODE_NULL);
>> +	return (void *)((unsigned long)node | (type << MAPLE_ENODE_TYPE_SHIFT) |
>> +			MAPLE_ENODE_NULL | MAPLE_ENODE);
>>   }
>>   
>>   static inline void *mte_mk_root(const struct maple_enode *node)
>> @@ -1411,6 +1418,65 @@ static inline struct maple_enode *mas_start(struct ma_state *mas)
>>   	return NULL;
>>   }
>>   
>> +/*
>> + * ma_nonleaf_data_end() - Find the end of the data in a non-leaf node.
>> + * @mt: The maple tree
>> + * @node: The maple node
>> + * @type: The maple node type
>> + *
>> + * Uses metadata to find the end of the data when possible without knowing the
>> + * node maximum.
>> + *
>> + * Return: The zero indexed last slot with child.
>> + */
>> +static inline unsigned char ma_nonleaf_data_end(struct maple_tree *mt,
>> +						struct maple_node *node,
>> +						enum maple_type type)
>> +{
>> +	void __rcu **slots;
>> +	unsigned long slot;
>> +
>> +	slots = ma_slots(node, type);
>> +	slot = (unsigned long)mt_slot(mt, slots, mt_pivots[type]);
>> +	if (unlikely(slot_is_mte(slot)))
>> +		return mt_pivots[type];
>> +
>> +	return ma_meta_end(node, type);
>> +}
>> +
>> +/*
>> + * ma_nonleaf_data_end_nocheck() - Find the end of the data in a non-leaf node.
>> + * @node: The maple node
>> + * @type: The maple node type
>> + *
>> + * Uses metadata to find the end of the data when possible without knowing the
>> + * node maximum. This is the version of ma_nonleaf_data_end() that does not
>> + * check for lock held. This particular version is designed to avoid lockdep
>> + * complaining in some scenarios.
>> + *
>> + * Return: The zero indexed last slot with child.
>> + */
>> +static inline unsigned char ma_nonleaf_data_end_nocheck(struct maple_node *node,
>> +							enum maple_type type)
>> +{
>> +	void __rcu **slots;
>> +	unsigned long slot;
>> +
>> +	slots = ma_slots(node, type);
>> +	slot = (unsigned long)rcu_dereference_raw(slots[mt_pivots[type]]);
>> +	if (unlikely(slot_is_mte(slot)))
>> +		return mt_pivots[type];
>> +
>> +	return ma_meta_end(node, type);
>> +}
>> +
>> +/* See ma_nonleaf_data_end() */
>> +static inline unsigned char mte_nonleaf_data_end(struct maple_tree *mt,
>> +						 struct maple_enode *enode)
>> +{
>> +	return ma_nonleaf_data_end(mt, mte_to_node(enode), mte_node_type(enode));
>> +}
>> +
>>   /*
>>    * ma_data_end() - Find the end of the data in a node.
>>    * @node: The maple node
>> -- 
>> 2.20.1
>>
>>

  reply	other threads:[~2023-07-31  9:54 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-07-26  8:09 [PATCH 00/11] Introduce mt_dup() to improve the performance of fork() Peng Zhang
2023-07-26  8:09 ` [PATCH 01/11] maple_tree: Introduce ma_nonleaf_data_end{_nocheck}() Peng Zhang
2023-07-26 14:58   ` Liam R. Howlett
2023-07-31  9:52     ` Peng Zhang [this message]
2023-07-31 16:08       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 02/11] maple_tree: Validate MAPLE_ENODE and ma_nonleaf_data_end() Peng Zhang
2023-07-26  8:09 ` [PATCH 03/11] maple_tree: Add some helper functions Peng Zhang
2023-07-26 15:02   ` Liam R. Howlett
2023-07-26 15:08     ` Matthew Wilcox
2023-07-31 11:45       ` Peng Zhang
2023-08-11 17:28         ` Liam R. Howlett
2023-07-31 11:40     ` Peng Zhang
2023-07-26  8:09 ` [PATCH 04/11] maple_tree: Introduce interfaces __mt_dup() and mt_dup() Peng Zhang
2023-07-26 16:03   ` Liam R. Howlett
2023-07-31 12:24     ` Peng Zhang
2023-07-31 16:27       ` Liam R. Howlett
2023-08-16 13:41         ` Peng Zhang
2023-08-16 18:30           ` Liam R. Howlett
2023-08-18 11:53             ` Peng Zhang
2023-08-18 16:13               ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 05/11] maple_tree: Add test for mt_dup() Peng Zhang
2023-07-26 16:06   ` Liam R. Howlett
2023-07-31 12:32     ` Peng Zhang
2023-07-31 16:41       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 06/11] maple_tree: Introduce mas_replace_entry() to directly replace an entry Peng Zhang
2023-07-26 16:08   ` Liam R. Howlett
2023-07-31 12:39     ` Peng Zhang
2023-07-31 16:48       ` Liam R. Howlett
2023-08-16 13:11         ` Peng Zhang
2023-08-16 17:40           ` Liam R. Howlett
2023-08-18  9:39             ` Peng Zhang
2023-08-18 16:15               ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 07/11] maple_tree: Update the documentation of maple tree Peng Zhang
2023-07-26  8:09 ` [PATCH 08/11] maple_tree: Skip other tests when BENCH is enabled Peng Zhang
2023-07-26  8:09 ` [PATCH 09/11] maple_tree: Update check_forking() and bench_forking() Peng Zhang
2023-07-26  8:09 ` [PATCH 10/11] MAINTAINERS: Add co-maintainer for maple tree Peng Zhang
2023-07-26 16:39   ` Liam R. Howlett
2023-07-31 12:55     ` Peng Zhang
2023-07-31 20:55       ` Liam R. Howlett
2023-07-26  8:09 ` [PATCH 11/11] fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Peng Zhang
2023-07-26 17:06   ` Liam R. Howlett
2023-07-31 12:59     ` Peng Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=36f2d5d1-00ff-d8a7-ed40-15eb5d929224@bytedance.com \
    --to=zhangpeng.00@bytedance.com \
    --cc=Liam.Howlett@Oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=avagin@gmail.com \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=npiggin@gmail.com \
    --cc=peterz@infradead.org \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).