linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Peng Zhang <zhangpeng.00@bytedance.com>
To: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
Cc: Peng Zhang <zhangpeng.00@bytedance.com>,
	corbet@lwn.net, akpm@linux-foundation.org, willy@infradead.org,
	brauner@kernel.org, surenb@google.com,
	michael.christie@oracle.com, mjguzik@gmail.com,
	mathieu.desnoyers@efficios.com, npiggin@gmail.com,
	peterz@infradead.org, oliver.sang@intel.com, mst@redhat.com,
	maple-tree@lists.infradead.org, linux-mm@kvack.org,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v5 03/10] maple_tree: Introduce interfaces __mt_dup() and mtree_dup()
Date: Tue, 24 Oct 2023 16:40:18 +0800	[thread overview]
Message-ID: <a63242e2-291c-4b3c-8269-429c93d8badd@bytedance.com> (raw)
In-Reply-To: <20231017135717.2iipnd37pgaswzdc@revolver>



在 2023/10/17 21:57, Liam R. Howlett 写道:
> * Peng Zhang <zhangpeng.00@bytedance.com> [231015 23:23]:
>> Introduce interfaces __mt_dup() and mtree_dup(), which are used to
>> duplicate a maple tree. They duplicate a maple tree in Depth-First
>> Search (DFS) pre-order traversal. It uses memcopy() to copy nodes in the
>> source tree and allocate new child nodes in non-leaf nodes. The new node
>> is exactly the same as the source node except for all the addresses
>> stored in it. It will be faster than traversing all elements in the
>> source tree and inserting them one by one into the new tree. The time
>> complexity of these two functions is O(n).
>>
>> The difference between __mt_dup() and mtree_dup() is that mtree_dup()
>> handles locks internally.
>>
>> Analysis of the average time complexity of this algorithm:
>>
>> For simplicity, let's assume that the maximum branching factor of all
>> non-leaf nodes is 16 (in allocation mode, it is 10), and the tree is a
>> full tree.
>>
>> Under the given conditions, if there is a maple tree with n elements,
>> the number of its leaves is n/16. From bottom to top, the number of
>> nodes in each level is 1/16 of the number of nodes in the level below.
>> So the total number of nodes in the entire tree is given by the sum of
>> n/16 + n/16^2 + n/16^3 + ... + 1. This is a geometric series, and it has
>> log(n) terms with base 16. According to the formula for the sum of a
>> geometric series, the sum of this series can be calculated as (n-1)/15.
>> Each node has only one parent node pointer, which can be considered as
>> an edge. In total, there are (n-1)/15-1 edges.
>>
>> This algorithm consists of two operations:
>>
>> 1. Traversing all nodes in DFS order.
>> 2. For each node, making a copy and performing necessary modifications
>>     to create a new node.
>>
>> For the first part, DFS traversal will visit each edge twice. Let
>> T(ascend) represent the cost of taking one step downwards, and
>> T(descend) represent the cost of taking one step upwards. And both of
>> them are constants (although mas_ascend() may not be, as it contains a
>> loop, but here we ignore it and treat it as a constant). So the time
>> spent on the first part can be represented as
>> ((n-1)/15-1) * (T(ascend) + T(descend)).
>>
>> For the second part, each node will be copied, and the cost of copying a
>> node is denoted as T(copy_node). For each non-leaf node, it is necessary
>> to reallocate all child nodes, and the cost of this operation is denoted
>> as T(dup_alloc). The behavior behind memory allocation is complex and
>> not specific to the maple tree operation. Here, we assume that the time
>> required for a single allocation is constant. Since the size of a node
>> is fixed, both of these symbols are also constants. We can calculate
>> that the time spent on the second part is
>> ((n-1)/15) * T(copy_node) + ((n-1)/15 - n/16) * T(dup_alloc).
>>
>> Adding both parts together, the total time spent by the algorithm can be
>> represented as:
>>
>> ((n-1)/15) * (T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)) -
>> n/16 * T(dup_alloc) - (T(ascend) + T(descend))
>>
>> Let C1 = T(ascend) + T(descend) + T(copy_node) + T(dup_alloc)
>> Let C2 = T(dup_alloc)
>> Let C3 = T(ascend) + T(descend)
>>
>> Finally, the expression can be simplified as:
>> ((16 * C1 - 15 * C2) / (15 * 16)) * n - (C1 / 15 + C3).
>>
>> This is a linear function, so the average time complexity is O(n).
>>
>> Signed-off-by: Peng Zhang <zhangpeng.00@bytedance.com>
>> ---
>>   include/linux/maple_tree.h |   3 +
>>   lib/maple_tree.c           | 290 +++++++++++++++++++++++++++++++++++++
>>   2 files changed, 293 insertions(+)
>>
>> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
>> index f91dbc7fe091..a452dd8a1e5c 100644
>> --- a/include/linux/maple_tree.h
>> +++ b/include/linux/maple_tree.h
>> @@ -329,6 +329,9 @@ int mtree_store(struct maple_tree *mt, unsigned long index,
>>   		void *entry, gfp_t gfp);
>>   void *mtree_erase(struct maple_tree *mt, unsigned long index);
>>   
>> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
>> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp);
>> +
>>   void mtree_destroy(struct maple_tree *mt);
>>   void __mt_destroy(struct maple_tree *mt);
>>   
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index ca7039633844..6e0ad83f14e3 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -4,6 +4,10 @@
>>    * Copyright (c) 2018-2022 Oracle Corporation
>>    * Authors: Liam R. Howlett <Liam.Howlett@oracle.com>
>>    *	    Matthew Wilcox <willy@infradead.org>
>> + *
>> + * Algorithm for duplicating Maple Tree
>> + * Copyright (c) 2023 ByteDance
>> + * Author: Peng Zhang <zhangpeng.00@bytedance.com>
>>    */
>>   
>>   /*
>> @@ -6475,6 +6479,292 @@ void *mtree_erase(struct maple_tree *mt, unsigned long index)
>>   }
>>   EXPORT_SYMBOL(mtree_erase);
>>   
>> +/*
>> + * mas_dup_free() - Free an incomplete duplication of a tree.
>> + * @mas: The maple state of a incomplete tree.
>> + *
>> + * The parameter @mas->node passed in indicates that the allocation failed on
>> + * this node. This function frees all nodes starting from @mas->node in the
>> + * reverse order of mas_dup_build(). There is no need to hold the source tree
>> + * lock at this time.
>> + */
>> +static void mas_dup_free(struct ma_state *mas)
>> +{
>> +	struct maple_node *node;
>> +	enum maple_type type;
>> +	void __rcu **slots;
>> +	unsigned char count, i;
>> +
>> +	/* Maybe the first node allocation failed. */
>> +	if (mas_is_none(mas))
>> +		return;
>> +
>> +	while (!mte_is_root(mas->node)) {
>> +		mas_ascend(mas);
>> +
> 
> Please watch the extra whitespace.  There are a few in this patch.
Done in v6, thank you.
> 
>> +		if (mas->offset) {
>> +			mas->offset--;
>> +			do {
>> +				mas_descend(mas);
>> +				mas->offset = mas_data_end(mas);
>> +			} while (!mte_is_leaf(mas->node));
>> +
>> +			mas_ascend(mas);
>> +		}
>> +
>> +		node = mte_to_node(mas->node);
>> +		type = mte_node_type(mas->node);
>> +		slots = ma_slots(node, type);
>> +		count = mas_data_end(mas) + 1;
>> +		for (i = 0; i < count; i++)
>> +			((unsigned long *)slots)[i] &= ~MAPLE_NODE_MASK;
>> +
>> +		mt_free_bulk(count, slots);
>> +	}
>> +
>> +	node = mte_to_node(mas->node);
>> +	mt_free_one(node);
>> +}
>> +
>> +/*
>> + * mas_copy_node() - Copy a maple node and replace the parent.
>> + * @mas: The maple state of source tree.
>> + * @new_mas: The maple state of new tree.
>> + * @parent: The parent of the new node.
>> + *
>> + * Copy @mas->node to @new_mas->node, set @parent to be the parent of
>> + * @new_mas->node. If memory allocation fails, @mas is set to -ENOMEM.
>> + */
>> +static inline void mas_copy_node(struct ma_state *mas, struct ma_state *new_mas,
>> +		struct maple_pnode *parent)
>> +{
>> +	struct maple_node *node = mte_to_node(mas->node);
>> +	struct maple_node *new_node = mte_to_node(new_mas->node);
>> +	unsigned long val;
>> +
>> +	/* Copy the node completely. */
>> +	memcpy(new_node, node, sizeof(struct maple_node));
>> +
>> +	/* Update the parent node pointer. */
>> +	val = (unsigned long)node->parent & MAPLE_NODE_MASK;
>> +	new_node->parent = ma_parent_ptr(val | (unsigned long)parent);
>> +}
>> +
>> +/*
>> + * mas_dup_alloc() - Allocate child nodes for a maple node.
>> + * @mas: The maple state of source tree.
>> + * @new_mas: The maple state of new tree.
>> + * @gfp: The GFP_FLAGS to use for allocations.
>> + *
>> + * This function allocates child nodes for @new_mas->node during the duplication
>> + * process. If memory allocation fails, @mas is set to -ENOMEM.
>> + */
>> +static inline void mas_dup_alloc(struct ma_state *mas, struct ma_state *new_mas,
>> +		gfp_t gfp)
>> +{
>> +	struct maple_node *node = mte_to_node(mas->node);
>> +	struct maple_node *new_node = mte_to_node(new_mas->node);
>> +	enum maple_type type;
>> +	unsigned char request, count, i;
>> +	void __rcu **slots;
>> +	void __rcu **new_slots;
>> +	unsigned long val;
>> +
>> +	/* Allocate memory for child nodes. */
>> +	type = mte_node_type(mas->node);
>> +	new_slots = ma_slots(new_node, type);
>> +	request = mas_data_end(mas) + 1;
>> +	count = mt_alloc_bulk(gfp, request, (void **)new_slots);
>> +	if (unlikely(count < request)) {
>> +		if (count)
>> +			mt_free_bulk(count, new_slots);
> 
> We were dropping this mt_free_bulk() call as discussed in [1].  Did I
> miss something?
It seems that I misunderstood earlier, I thought it needed to be kept.
It has been deleted in v6, thank you.
> 
>> +
>> +		memset(new_slots, 0, request * sizeof(void *));
>> +		mas_set_err(mas, -ENOMEM);
>> +		return;
>> +	}
>> +
>> +	/* Restore node type information in slots. */
>> +	slots = ma_slots(node, type);
>> +	for (i = 0; i < count; i++) {
>> +		val = (unsigned long)mt_slot_locked(mas->tree, slots, i);
>> +		val &= MAPLE_NODE_MASK;
>> +		((unsigned long *)new_slots)[i] |= val;
>> +	}
>> +}
>> +
>> +/*
>> + * mas_dup_build() - Build a new maple tree from a source tree
>> + * @mas: The maple state of source tree, need to be in MAS_START state.
>> + * @new_mas: The maple state of new tree, need to be in MAS_START state.
>> + * @gfp: The GFP_FLAGS to use for allocations.
>> + *
>> + * This function builds a new tree in DFS preorder. If the memory allocation
>> + * fails, the error code -ENOMEM will be set in @mas, and @new_mas points to the
>> + * last node. mas_dup_free() will free the incomplete duplication of a tree.
>> + *
>> + * Note that the attributes of the two trees need to be exactly the same, and the
>> + * new tree needs to be empty, otherwise -EINVAL will be set in @mas.
>> + */
>> +static inline void mas_dup_build(struct ma_state *mas, struct ma_state *new_mas,
>> +		gfp_t gfp)
>> +{
>> +	struct maple_node *node;
>> +	struct maple_pnode *parent = NULL;
>> +	struct maple_enode *root;
>> +	enum maple_type type;
>> +
>> +	if (unlikely(mt_attr(mas->tree) != mt_attr(new_mas->tree)) ||
>> +	    unlikely(!mtree_empty(new_mas->tree))) {
>> +		mas_set_err(mas, -EINVAL);
>> +		return;
>> +	}
>> +
>> +	mas_start(mas);
>> +	if (mas_is_ptr(mas) || mas_is_none(mas)) {
>> +		root = mt_root_locked(mas->tree);
> 
> mas_start(mas) would return the root entry if it's a pointer and NULL if
> the tree is empty, so this can be written:
> root = mas_start(mas);
> if (mas_is_ptry() || mas_is_none()
> 	goto set_new_tree;
Done in v6, thank you.
> 
> 
>> +		goto set_new_tree;
>> +	}
>> +
>> +	node = mt_alloc_one(gfp);
>> +	if (!node) {
>> +		new_mas->node = MAS_NONE;
>> +		mas_set_err(mas, -ENOMEM);
>> +		return;
>> +	}
>> +
>> +	type = mte_node_type(mas->node);
>> +	root = mt_mk_node(node, type);
>> +	new_mas->node = root;
>> +	new_mas->min = 0;
>> +	new_mas->max = ULONG_MAX;
>> +	root = mte_mk_root(root);
>> +
>> +	while (1) {
>> +		mas_copy_node(mas, new_mas, parent);
>> +
>> +		if (!mte_is_leaf(mas->node)) {
>> +			/* Only allocate child nodes for non-leaf nodes. */
>> +			mas_dup_alloc(mas, new_mas, gfp);
>> +			if (unlikely(mas_is_err(mas)))
>> +				return;
>> +		} else {
>> +			/*
>> +			 * This is the last leaf node and duplication is
>> +			 * completed.
>> +			 */
>> +			if (mas->max == ULONG_MAX)
>> +				goto done;
>> +
>> +			/* This is not the last leaf node and needs to go up. */
>> +			do {
>> +				mas_ascend(mas);
>> +				mas_ascend(new_mas);
>> +			} while (mas->offset == mas_data_end(mas));
>> +
>> +			/* Move to the next subtree. */
>> +			mas->offset++;
>> +			new_mas->offset++;
>> +		}
>> +
>> +		mas_descend(mas);
>> +		parent = ma_parent_ptr(mte_to_node(new_mas->node));
>> +		mas_descend(new_mas);
>> +		mas->offset = 0;
>> +		new_mas->offset = 0;
>> +	}
>> +done:
>> +	/* Specially handle the parent of the root node. */
>> +	mte_to_node(root)->parent = ma_parent_ptr(mas_tree_parent(new_mas));
>> +set_new_tree:
>> +	/* Make them the same height */
>> +	new_mas->tree->ma_flags = mas->tree->ma_flags;
>> +	rcu_assign_pointer(new_mas->tree->ma_root, root);
>> +}
>> +
>> +/**
>> + * __mt_dup(): Duplicate an entire maple tree
>> + * @mt: The source maple tree
>> + * @new: The new maple tree
>> + * @gfp: The GFP_FLAGS to use for allocations
>> + *
>> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order
>> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate
>> + * new child nodes in non-leaf nodes. The new node is exactly the same as the
>> + * source node except for all the addresses stored in it. It will be faster than
>> + * traversing all elements in the source tree and inserting them one by one into
>> + * the new tree.
>> + * The user needs to ensure that the attributes of the source tree and the new
>> + * tree are the same, and the new tree needs to be an empty tree, otherwise
>> + * -EINVAL will be returned.
>> + * Note that the user needs to manually lock the source tree and the new tree.
>> + *
>> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If
>> + * the attributes of the two trees are different or the new tree is not an empty
>> + * tree.
>> + */
>> +int __mt_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
>> +{
>> +	int ret = 0;
>> +	MA_STATE(mas, mt, 0, 0);
>> +	MA_STATE(new_mas, new, 0, 0);
>> +
>> +	mas_dup_build(&mas, &new_mas, gfp);
>> +
>> +	if (unlikely(mas_is_err(&mas))) {
>> +		ret = xa_err(mas.node);
>> +		if (ret == -ENOMEM)
>> +			mas_dup_free(&new_mas);
>> +	}
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(__mt_dup);
>> +
>> +/**
>> + * mtree_dup(): Duplicate an entire maple tree
>> + * @mt: The source maple tree
>> + * @new: The new maple tree
>> + * @gfp: The GFP_FLAGS to use for allocations
>> + *
>> + * This function duplicates a maple tree in Depth-First Search (DFS) pre-order
>> + * traversal. It uses memcopy() to copy nodes in the source tree and allocate
>> + * new child nodes in non-leaf nodes. The new node is exactly the same as the
>> + * source node except for all the addresses stored in it. It will be faster than
>> + * traversing all elements in the source tree and inserting them one by one into
>> + * the new tree.
>> + * The user needs to ensure that the attributes of the source tree and the new
>> + * tree are the same, and the new tree needs to be an empty tree, otherwise
>> + * -EINVAL will be returned.
>> + *
>> + * Return: 0 on success, -ENOMEM if memory could not be allocated, -EINVAL If
>> + * the attributes of the two trees are different or the new tree is not an empty
>> + * tree.
>> + */
>> +int mtree_dup(struct maple_tree *mt, struct maple_tree *new, gfp_t gfp)
>> +{
>> +	int ret = 0;
>> +	MA_STATE(mas, mt, 0, 0);
>> +	MA_STATE(new_mas, new, 0, 0);
>> +
>> +	mas_lock(&new_mas);
>> +	mas_lock_nested(&mas, SINGLE_DEPTH_NESTING);
>> +
>> +	mas_dup_build(&mas, &new_mas, gfp);
>> +	mas_unlock(&mas);
>> +
>> +	if (unlikely(mas_is_err(&mas))) {
>> +		ret = xa_err(mas.node);
>> +		if (ret == -ENOMEM)
>> +			mas_dup_free(&new_mas);
>> +	}
>> +
>> +	mas_unlock(&new_mas);
>> +
>> +	return ret;
>> +}
>> +EXPORT_SYMBOL(mtree_dup);
>> +
>>   /**
>>    * __mt_destroy() - Walk and free all nodes of a locked maple tree.
>>    * @mt: The maple tree
>> -- 
>> 2.20.1
>>
> 
> [1]. https://lore.kernel.org/lkml/20231004142500.gz2552r74aiphl4z@revolver/
> 
> Thanks,
> Liam

  reply	other threads:[~2023-10-24  8:40 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-10-16  3:22 [PATCH v5 00/10] Introduce __mt_dup() to improve the performance of fork() Peng Zhang
2023-10-16  3:22 ` [PATCH v5 01/10] maple_tree: Add mt_free_one() and mt_attr() helpers Peng Zhang
2023-10-16  3:22 ` [PATCH v5 02/10] maple_tree: Introduce {mtree,mas}_lock_nested() Peng Zhang
2023-10-16  3:22 ` [PATCH v5 03/10] maple_tree: Introduce interfaces __mt_dup() and mtree_dup() Peng Zhang
2023-10-16 14:10   ` Matthew Wilcox
2023-10-17  2:44     ` Peng Zhang
2023-10-17 13:57   ` Liam R. Howlett
2023-10-24  8:40     ` Peng Zhang [this message]
2023-10-16  3:22 ` [PATCH v5 04/10] radix tree test suite: Align kmem_cache_alloc_bulk() with kernel behavior Peng Zhang
2023-10-16  3:22 ` [PATCH v5 05/10] maple_tree: Add test for mtree_dup() Peng Zhang
2023-10-16  3:22 ` [PATCH v5 06/10] maple_tree: Update the documentation of maple tree Peng Zhang
2023-10-16  3:22 ` [PATCH v5 07/10] maple_tree: Skip other tests when BENCH is enabled Peng Zhang
2023-10-16  3:22 ` [PATCH v5 08/10] maple_tree: Update check_forking() and bench_forking() Peng Zhang
2023-10-16  3:22 ` [PATCH v5 09/10] maple_tree: Preserve the tree attributes when destroying maple tree Peng Zhang
2023-10-16  3:22 ` [PATCH v5 10/10] fork: Use __mt_dup() to duplicate maple tree in dup_mmap() Peng Zhang
2023-10-17 13:50   ` Liam R. Howlett
2023-10-24  8:45     ` Peng Zhang
2023-10-16  3:40 ` [PATCH v5 00/10] Introduce __mt_dup() to improve the performance of fork() Peng Zhang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a63242e2-291c-4b3c-8269-429c93d8badd@bytedance.com \
    --to=zhangpeng.00@bytedance.com \
    --cc=Liam.Howlett@Oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maple-tree@lists.infradead.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=michael.christie@oracle.com \
    --cc=mjguzik@gmail.com \
    --cc=mst@redhat.com \
    --cc=npiggin@gmail.com \
    --cc=oliver.sang@intel.com \
    --cc=peterz@infradead.org \
    --cc=surenb@google.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).