All of lore.kernel.org
 help / color / mirror / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Liam Howlett <liam.howlett@oracle.com>,
	"maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Cc: Song Liu <songliubraving@fb.com>,
	Davidlohr Bueso <dave@stgolabs.net>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Matthew Wilcox <willy@infradead.org>,
	Laurent Dufour <ldufour@linux.ibm.com>,
	David Rientjes <rientjes@google.com>,
	Axel Rasmussen <axelrasmussen@google.com>,
	Suren Baghdasaryan <surenb@google.com>,
	Rik van Riel <riel@surriel.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Michel Lespinasse <walken.cr@gmail.com>,
	Jerome Glisse <jglisse@redhat.com>,
	Minchan Kim <minchan@google.com>,
	Joel Fernandes <joelaf@google.com>,
	Rom Lemarchand <romlem@google.com>
Subject: Re: [PATCH v4 05/66] Maple Tree: Add new data structure
Date: Tue, 7 Dec 2021 16:34:44 +0100	[thread overview]
Message-ID: <5ead526d-8499-4810-7657-6af5f2e96ccc@suse.cz> (raw)
In-Reply-To: <20211201142918.921493-6-Liam.Howlett@oracle.com>

On 12/1/21 15:29, Liam Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> The maple tree is an RCU-safe range based B-tree designed to use modern
> processor cache efficiently.  There are a number of places in the kernel
> that a non-overlapping range-based tree would be beneficial, especially
> one with a simple interface.  The first user that is covered in this
> patch set is the vm_area_struct, where three data structures are
> replaced by the maple tree: the augmented rbtree, the vma cache, and the
> linked list of VMAs in the mm_struct.  The long term goal is to reduce
> or remove the mmap_sem contention.
> 
> The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> nodes.  With the increased branching factor, it is significantly shorter than
> the rbtree so it has fewer cache misses.  The removal of the linked list
> between subsequent entries also reduces the cache misses and the need to pull
> in the previous and next VMA during many tree alterations.
> 
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>

For now just some comments, reviewing fully is obviously rather hard, even
when ignoring the tests...

> ---
>  Documentation/core-api/index.rst              |     1 +
>  Documentation/core-api/maple-tree.rst         |   496 +
>  MAINTAINERS                                   |    12 +
>  include/linux/maple_tree.h                    |   559 +
>  include/trace/events/maple_tree.h             |   123 +
>  lib/Kconfig.debug                             |     9 +
>  lib/Makefile                                  |     3 +-
>  lib/maple_tree.c                              |  6771 +++
>  lib/test_maple_tree.c                         | 37202 ++++++++++++++++
>  tools/testing/radix-tree/.gitignore           |     2 +
>  tools/testing/radix-tree/Makefile             |    13 +-
>  tools/testing/radix-tree/generated/autoconf.h |     1 +
>  tools/testing/radix-tree/linux/maple_tree.h   |     7 +
>  tools/testing/radix-tree/maple.c              |    59 +
>  .../radix-tree/trace/events/maple_tree.h      |     3 +
>  15 files changed, 45258 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/core-api/maple-tree.rst
>  create mode 100644 include/linux/maple_tree.h
>  create mode 100644 include/trace/events/maple_tree.h
>  create mode 100644 lib/maple_tree.c
>  create mode 100644 lib/test_maple_tree.c
>  create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
>  create mode 100644 tools/testing/radix-tree/maple.c
>  create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
> 
> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index 5de2c7a4b1b3..b487373d8585 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -43,6 +43,7 @@ Library functionality that is used throughout the kernel.
>     this_cpu_ops
>     timekeeping
>     errseq
> +   maple-tree
>  
>  Concurrency primitives
>  ======================
> diff --git a/Documentation/core-api/maple-tree.rst b/Documentation/core-api/maple-tree.rst
> new file mode 100644
> index 000000000000..70ec7c2801e0
> --- /dev/null
> +++ b/Documentation/core-api/maple-tree.rst

In general this IMHO this could use some restructuring, right now it feels
like a mix of various implementation tidbits, while the usage of maple tree
seems to be just sketched.

The API functions in lib/maple_tree.c also seem to not have comments in the
proper doc format that starts with '/**' ?

> @@ -0,0 +1,496 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +
> +
> +==========
> +Maple Tree
> +==========
> +
> +:Author: Liam R. Howlett
> +:Date: May 20, 2021
> +

...

> +Node Slots & Node Pivots
> +------------------------
> +
> +Leaf nodes do not store pointers to nodes, they store user data.
> +Users may store almost any bit pattern.  As noted above, the optimisation
> +of storing an entry at 0 in the root pointer cannot be done for data
> +which have the bottom two bits set to '10'.  We also reserve values
> +with the bottom two bits set to '10' which are below 4096 (ie 2, 6,
> +10 .. 4094) for internal use.  Some APIs return errnos as a negative
> +errno shifted right by two bits and the bottom two bits set to '10',
> +and while choosing to store these values in the array is not an error,
> +it may lead to confusion if you're testing for an error with mas_is_err().
> +
> +Non-leaf nodes store the type of the node pointed to (enum maple_type
> +in bits 3-6), bit 2 is reserved.  That leaves bits 0-1 unused for now.
> +
> +In regular B-Tree terms, pivots are called keys.  The term pivot is used
> +to indicate that the tree is specifying ranges,  Pivots may appear in
> +the subtree with an entry attached to the value whereas keys are unique
> +to a specific position of a B-tree.  Pivot values are inclusive of the
> +slot with the same index.
> +
> +
> +The following illustrates a partial layout of a range64 nodes slots and pivots.
> +
> +          _________________________________

This causes "make htmldocs" fail for me
Documentation/core-api/maple-tree.rst:248: (SEVERE/4) Unexpected section
title or transition.

> + Slots -> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
> +          ┬   ┬   ┬   ┬   ┬   ┬   ┬   ┬   ┬
> +          │   │   │   │   │   │   │   │   └─ Implied maximum
> +          │   │   │   │   │   │   │   └─ Pivot 6
> +          │   │   │   │   │   │   └─ Pivot 5
> +          │   │   │   │   │   └─ Pivot 4
> +          │   │   │   │   └─ Pivot 3
> +          │   │   │   └─ Pivot 2
> +          │   │   └─ Pivot 1
> +          │   └─ Pivot 0
> +          └─  Implied minimum
> +
> +Slot contents:
> + Internal (non-leaf) nodes contain pointers to other nodes.
> + Leaf nodes contain entries.
> +
> +
> +Node Parent
> +-----------
> +
> +The node->parent of the root node has bit 0 set and the rest of the
> +pointer is a pointer to the tree itself.  No more bits are available in
> +this pointer (on m68k, the data structure may only be 2-byte aligned).
> +
> +Internal non-root nodes can only have maple_range_* nodes as parents.
> +The parent pointer is 256B aligned like all other tree nodes.
> +When storing a 32 or 64 bit values, the offset can fit into 4 bits.
> +The 16 bit values need an extra bit to store the offset.  This extra bit
> +comes from a reuse of the last bit in the node type.  This is possible
> +by using bit 1 to indicate if bit 2 is part of the type or the slot.
> +
> +Once the type is decided, the decision of an allocation range type
> +or a range type is done by examining the immutable tree flag for the
> +MAPLE_ALLOC_RANGE flag.
> +
> + Node types:
> +  0x??1 = Root
> +  0x?00 = 16 bit nodes
> +  0x010 = 32 bit nodes
> +  0x110 = 64 bit nodes
> +
> + Slot size and location in the parent pointer:
> +  type  : slot location
> +  0x??1 : Root
> +  0x?00 : 16 bit values, type in 0-1, slot in 2-6
> +  0x010 : 32 bit values, type in 0-2, slot in 3-6
> +  0x110 : 64 bit values, type in 0-2, slot in 3-6
> +
> +Node Metadata
> +-------------
> +
> +The node->meta is currently only supported in allocation range 64
> +(arange_64) node type.  As a result of tracking gaps, there is a small
> +area that is not used for data storage in this node type.  This area is
> +reused to store metadata related to the node itself including the data
> +end and the largest gap location.  This metadata is used to optimize
> +the gap updating code and in reverse searching for gaps or any other
> +code that needs to find the end of the data.
> +
> +Auxiliary Data
> +--------------
> +
> +At tree creation time, the user can specify that they're willing to
> +trade off storing fewer entries in a tree in return for storing more
> +information in each node.
> +
> +The maple tree supports recording the largest range of NULL entries
> +available in this node, also called gaps.  This optimises the tree for
> +allocating a range.
> +
> +
> +Maple State
> +-----------
> +
> +The maple state is defined in the struct ma_state and is used to keep
> +track of information during operations, and even between operations when
> +using the advanced API.
> +
> +If state->node has bit 0 set then it references a tree location which
> +is not a node (eg the root).  If bit 1 is set, the rest of the bits
> +are a negative errno.  Bit 2 (the 'unallocated slots' bit) is clear.
> +Bits 3-6 indicate the node type.
> +
> +state->alloc either has a request number of nodes or an allocated node.
> +If stat->alloc has a requested number of nodes, the first bit will be
> +set (0x1) and the remaining bits are the value.  If state->alloc is
> +a node, then the node will be of type maple_alloc.  maple_alloc has
> +MAPLE_NODE_SLOTS - 1 for storing more allocated nodes, a total, and
> +the node_count in this node.  total is the number of nodes allocated.
> +node_count is the number of allocated nodes in this node.  The scaling
> +beyond MAPLE_NODE_SLOTS - 1 is handled by storing further nodes into
> +state->alloc->slot[0]'s node.  Nodes are taken from state->alloc by
> +removing a node from the state->alloc node until state->alloc->node_count
> +is 1, when state->alloc is returned and the state->alloc->slot[0] is
> +promoted to state->alloc.  Nodes are pushed onto state->alloc by putting
> +the current state->alloc into the pushed node's slot[0].
> +
> +The state also contains the implied min/max of the state->node, the depth
> +of this search, and the offset. The implied min/max are either from the
> +parent node or are 0-oo for the root node.  The depth is incremented or
> +decremented every time a node is walked down or up.  The offset is the
> +slot/pivot of interest in the node - either for reading or writing.
> +
> +When returning a value the maple state index and last respectively contain
> +the start and end of the range for the entry.  Ranges are inclusive in
> +the Maple Tree.
> +
> +Tree Operations
> +===============
> +
> +Inserting
> +---------
> +
> +Inserting a new range inserts either 0, 1, or 2 pivots within the tree.
> +If the insert fits exactly into an existing gap with a value of NULL,
> +then the slot only needs to be written with the new value.  If the range
> +being inserted is adjacent to another range, then only a single pivot
> +needs to be inserted (as well as writing the entry).  If the new range
> +is within a gap but does not touch any other ranges, then two pivots
> +need to be inserted: the start - 1, and the end.  As usual, the entry
> +must be written.  Most operations require a new node to be allocated
> +and replace an existing node to ensure RCU safety, when in RCU mode.
> +The exception to requiring a newly allocated node is when inserting at
> +the end of a node (appending).  When done carefully, appending can reuse
> +the node in place.
> +
> +Storing
> +-------
> +
> +Storing is the same operation as insert with the added caveat that it
> +can overwrite entries.  Although this seems simple enough, one may want
> +to examine what happens if a single store operation was to overwrite
> +multiple entries within a self-balancing B-Tree.
> +
> +Erasing
> +-------
> +
> +Erasing is the same as a walk to an entry then a store of a NULL to
> +that entry.  In fact, it is implemented as such using the advanced API.
> +
> +Splitting
> +---------
> +
> +Splitting is handled differently from any other B-tree; the Maple Tree
> +splits up.  Splitting up means that the split operation occurs when the

"splits up" is confusing - "splits upwards" perhaps?

> +walk of the tree hits the leaves and not on the way down.  The reason
> +for splitting up is that it is impossible to know how much space will be
> +needed until the leaf (or leaves) are reached.  Since overwriting data
> +is allowed and a range could overwrite more than one range or result in
> +changing one entry into 3 entries, it is impossible to know if a split
> +is required until the data is examined.
> +
> +Splitting is a balancing act between keeping allocations to a minimum
> +and avoiding a 'jitter' event where a tree is expanded to make room
> +for an entry followed by a contraction when the entry is removed.
> +To accomplish the balance, there are empty slots remaining in both left
> +and right nodes after a split.
> +
> +Another way that 'jitter' is avoided is to terminate a split up early
> +if the left or right node has space to spare.  This is referred to as
> +"pushing left" or "pushing right" and is similar to the B* tree, except
> +the nodes left or right can rarely be reused due to RCU, but the ripple
> +upwards is halted which is a significant savings.
> +
> +To support gap tracking, all NULL entries are kept together and a node
> +cannot end on a NULL entry, with the exception of the left-most leaf.
> +The limitation means that the split of a node must be checked for this
> +condition and be able to put more data in one direction or the other.
> +
> +3-way Split
> +-----------
> +
> +Although extremely rare, it is possible to enter what is known as the
> +3-way split scenario.  The 3-way split comes about by means of a store
> +of a range that overwrites the end and beginning of two full nodes.
> +The result is a set of entries that cannot be stored in 2 nodes.
> +Sometimes, these two nodes can also be located in different parent nodes
> +which are also full.  This can carry upwards all the way to the root in
> +the worst case.
> +
> +Spanning Store
> +--------------
> +
> +A store operation that spans multiple nodes is called a spanning store and
> +is handled early in the store call stack by the function mas_is_span_wr().
> +When a spanning store is identified, the maple state is duplicated.
> +The first maple state walks the left tree path to ``index``, the duplicate
> +walks the right tree path to ``last``.  The data in the two nodes are
> +combined into a single node, two nodes, or possibly three nodes (see the
> +3-way split above).  A ``NULL`` written to the last entry of a node is
> +considered a spanning store as a rebalance is required for the operation
> +to complete and an overflow of data may happen.
> +
> +The tree needs to be rebalanced and leaves need to be kept at the same
> +level.  Rebalancing is done by use of the ``struct maple_topiary``.
> +The maple_topiary struct keeps track of which nodes to free and which
> +to destroy (free the subtree).  See mas_spanning_rebalance().
> +
> +Each level of the tree is examined and balanced in
> +mas_spanning_rebalance().  Again, pushing data to the left or right,
> +or rebalancing against left or right nodes is employed to avoid rippling
> +up the tree to limit the amount of churn.  Once a new sub-section of the
> +tree is created, there may be a mix of new and old nodes.  The old nodes
> +will have the incorrect parent pointers and currently be in two trees: the
> +original tree and the partially new tree.  To remedy the parent pointers
> +in the old tree, the new data is swapped into the active tree and a walk
> +down the tree is performed and the parent pointers are updated.  At each
> +level there may be up to 3 correct parent pointers which indicates the
> +new nodes which need to be walked to find any new nodes at a lower level.
> +See mas_descend_adopt().
> +
> +Rebalance
> +---------
> +
> +Rebalancing occurs if a node is insufficient.  Data is rebalanced against
> +the node to the right if it exists, otherwise the node to the left of
> +this node is rebalanced against this node.  If rebalancing causes just
> +one node to be produced instead of two, then the parent is also examined
> +and rebalanced if it is insufficient.  Every level tries to combine the
> +data in the same way.  If one node contains the entire range of the tree,
> +then that node is used as a new root node.
> +
> +Bulk Loading
> +------------
> +
> +Sometimes it is necessary to duplicate a tree to a new tree, such as
> +forking a process and duplicating the VMAs from one tree to a new tree.
> +When such a situation arises, it is known that the new tree is not
> +going to be used until the entire tree is populated.  For performance
> +reasons, it is best to use a bulk load with RCU disabled.  This allows
> +for optimistic splitting that favours the left and reuse of nodes during
> +the operation.  Upon completion, the mas_destroy() operation on the maple
> +state will check the left-most node and rebalance against the node to
> +the right if necessary.  mas_destroy() will also free any unused nodes.
> +
> +
> +The Maple State
> +===============
> +
> +The ma_state struct keeps track of tree operations to make life easier
> +for both internal and external tree users.
> +
> +Functions and structures
> +========================
> +
> +.. kernel-doc:: include/linux/maple_tree.h
> +.. kernel-doc:: lib/maple_tree.c
> +
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5250298d2817..e61b3e56e4b5 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11305,6 +11305,18 @@ L:	linux-man@vger.kernel.org
>  S:	Maintained
>  W:	http://www.kernel.org/doc/man-pages
>  
> +MAPLE TREE
> +M:	Liam R. Howlett <Liam.Howlett@oracle.com>
> +L:	linux-mm@kvack.org
> +S:	Supported
> +F:	Documentation/core-api/maple_tree.rst
> +F:	include/linux/maple_tree.h
> +F:	include/trace/events/maple_tree.h
> +F:	lib/maple_tree.c
> +F:	lib/test_maple_tree.c
> +F:	tools/testing/radix-tree/linux/maple_tree.h
> +F:	tools/testing/radix-tree/maple.c
> +
>  MARDUK (CREATOR CI40) DEVICE TREE SUPPORT
>  M:	Rahul Bedarkar <rahulbedarkar89@gmail.com>
>  L:	linux-mips@vger.kernel.org
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> new file mode 100644
> index 000000000000..a03c7850080a
> --- /dev/null
> +++ b/include/linux/maple_tree.h
> @@ -0,0 +1,559 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +#ifndef _LINUX_MAPLE_TREE_H
> +#define _LINUX_MAPLE_TREE_H
> +/*
> + * Maple Tree - An RCU-safe adaptive tree for storing ranges
> + * Copyright (c) 2018 Oracle
> + * Authors:     Liam R. Howlett <Liam.Howlett@Oracle.com>
> + *              Matthew Wilcox <willy@infradead.org>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/rcupdate.h>
> +#include <linux/spinlock.h>
> +/* #define CONFIG_MAPLE_RCU_DISABLED */
> +/* #define CONFIG_DEBUG_MAPLE_TREE_VERBOSE */

Make this real config or remove?

> +
> +/*
> + * Allocated nodes are mutable until they have been inserted into the tree,
> + * at which time they cannot change their type until they have been removed
> + * from the tree and an RCU grace period has passed.
> + *
> + * Removed nodes have their ->parent set to point to themselves.  RCU readers
> + * check ->parent before relying on the value that they loaded from the
> + * slots array.  This lets us reuse the slots array for the RCU head.
> + *
> + * Nodes in the tree point to their parent unless bit 0 is set.
> + */
> +#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64)
> +/* 64bit sizes */
> +#define MAPLE_NODE_SLOTS	31	/* 256 bytes including ->parent */
> +#define MAPLE_RANGE64_SLOTS	16	/* 256 bytes */
> +#define MAPLE_ARANGE64_SLOTS	10	/* 240 bytes */
> +#define MAPLE_ARANGE64_META_MAX	15	/* Out of range for metadata */
> +#define MAPLE_ALLOC_SLOTS	(MAPLE_NODE_SLOTS - 1)
> +#else
> +/* 32bit sizes */
> +#define MAPLE_NODE_SLOTS	63	/* 256 bytes including ->parent */
> +#define MAPLE_RANGE64_SLOTS	32	/* 256 bytes */
> +#define MAPLE_ARANGE64_SLOTS	21	/* 240 bytes */
> +#define MAPLE_ARANGE64_META_MAX	22	/* Out of range for metadata */
> +#define MAPLE_ALLOC_SLOTS	(MAPLE_NODE_SLOTS - 2)
> +#endif /* defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) */
> +
> +#define MAPLE_NODE_MASK		255UL
> +
> +typedef struct maple_enode *maple_enode; /* encoded node */
> +typedef struct maple_pnode *maple_pnode; /* parent node */
> +
> +
> +/**
> + * DOC: maple_tree node explained
> + *
> + * Each node type has a number of slots for entries and a number of slots for
> + * pivots.  In the case of dense nodes, the pivots are implied by the position
> + * and are simply the slot index + the minimum of the node.
> + *
> + * In regular B-Tree terms, pivots are called keys.  The term pivot is used to
> + * indicate that the tree is specifying ranges,  Pivots may appear in the
> + * subtree with an entry attached to the value where as keys are unique to a
> + * specific position of a B-tree.  Pivot values are inclusive of the slot with
> + * the same index.
> + *
> + *
> + * The following illustrates the layout of a range64 nodes slots and pivots.
> + *
> + *           _________________________________
> + *  Slots -> | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
> + *           ┬   ┬   ┬   ┬   ┬   ┬   ┬   ┬   ┬
> + *           │   │   │   │   │   │   │   │   └─ Implied maximum
> + *           │   │   │   │   │   │   │   └─ Pivot 6
> + *           │   │   │   │   │   │   └─ Pivot 5
> + *           │   │   │   │   │   └─ Pivot 4
> + *           │   │   │   │   └─ Pivot 3
> + *           │   │   │   └─ Pivot 2
> + *           │   │   └─ Pivot 1
> + *           │   └─ Pivot 0
> + *           └─  Implied minimum
> + *
> + * Slot contents:
> + *  Internal (non-leaf) nodes contain pointers to other nodes.
> + *  Leaf nodes contain entries.
> + *
> + *
> + */
> +struct maple_metadata {
> +	unsigned char end;
> +	unsigned char gap;
> +
> +};
> +
> +struct maple_range_64 {
> +	struct maple_pnode *parent;
> +		unsigned long pivot[MAPLE_RANGE64_SLOTS - 1];

Inconsistent ident.

> +	union {
> +		void __rcu *slot[MAPLE_RANGE64_SLOTS];
> +		struct {
> +			void __rcu *pad[MAPLE_RANGE64_SLOTS - 1];
> +			struct maple_metadata meta;
> +		};
> +	};
> +};
> +
> +struct maple_arange_64 {
> +	struct maple_pnode *parent;
> +	unsigned long pivot[MAPLE_ARANGE64_SLOTS - 1];
> +	void __rcu *slot[MAPLE_ARANGE64_SLOTS];
> +	unsigned long gap[MAPLE_ARANGE64_SLOTS];
> +	struct maple_metadata meta;
> +};
> +
> +struct maple_alloc {
> +	unsigned long total;
> +	unsigned char node_count;
> +	unsigned int request_count;
> +	struct maple_alloc *slot[MAPLE_ALLOC_SLOTS];
> +};
> +
> +struct maple_topiary {
> +	struct maple_pnode *parent;
> +	struct maple_enode *next; /* Overlaps the pivot */
> +};
> +
> +enum maple_type {
> +	maple_dense,
> +	maple_leaf_64,
> +	maple_range_64,
> +	maple_arange_64,
> +};
> +
> +
> +/**
> + * DOC: Maple tree flags
> + *
> + * * MT_FLAGS_ALLOC_RANGE	- Track gaps in this tree
> + * * MT_FLAGS_USE_RCU		- Operate in RCU mode

Inconsistent ident.

> + * * MT_FLAGS_HEIGHT_OFFSET	- The position of the tree height in the flags
> + * * MT_FLAGS_HEIGHT_MASK	- The mask for the maple tree height value
> + * * MT_FLAGS_LOCK_MASK		- How the mt_lock is used
> + * * MT_FLAGS_LOCK_IRQ		- Acquired irq-safe
> + * * MT_FLAGS_LOCK_BH		- Acquired bh-safe
> + * * MT_FLAGS_LOCK_EXTERN	- mt_lock is not used
> + *

...

> +
> +void mas_dup_tree(struct ma_state *oldmas, struct ma_state *mas);
> +void mas_dup_store(struct ma_state *mas, void *entry);
> +/* This finds an empty area from the highest address to the lowest.
> + * AKA "Topdown" version,
> + */

Bad comment form, also ends with a comma.

...

> +++ b/lib/maple_tree.c
> @@ -0,0 +1,6771 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Maple Tree implementation
> + * Copyright (c) 2018 Oracle Corporation
> + * Authors: Liam R. Howlett <Liam.Howlett@oracle.com>
> + *	    Matthew Wilcox <willy@infradead.org>
> + */
> +
> +#include <linux/maple_tree.h>
> +#include <linux/xarray.h>
> +#include <linux/types.h>
> +#include <linux/export.h>
> +#include <linux/slab.h>
> +#include <linux/limits.h>
> +#include <asm/barrier.h>
> +
> +#define CREATE_TRACE_POINTS
> +#include <trace/events/maple_tree.h>
> +
> +#define MA_ROOT_PARENT 1
> +
> +/* Maple state flags */
> +#define MA_STATE_BULK		1
> +#define MA_STATE_REBALANCE	2
> +
> +#define ma_parent_ptr(x) ((struct maple_pnode *)(x))
> +#define ma_mnode_ptr(x) ((struct maple_node *)(x))
> +#define ma_enode_ptr(x) ((struct maple_enode *)(x))
> +static struct kmem_cache *maple_node_cache;
> +
> +static const unsigned long mt_max[] = {
> +	[maple_dense]		= MAPLE_NODE_SLOTS,
> +	[maple_leaf_64]		= ULONG_MAX,
> +	[maple_range_64]	= ULONG_MAX,
> +	[maple_arange_64]	= ULONG_MAX,
> +};
> +#define mt_node_max(x) mt_max[mte_node_type(x)]
> +
> +static const unsigned char mt_slots[] = {
> +	[maple_dense]		= MAPLE_NODE_SLOTS,
> +	[maple_leaf_64]		= MAPLE_RANGE64_SLOTS,
> +	[maple_range_64]	= MAPLE_RANGE64_SLOTS,
> +	[maple_arange_64]	= MAPLE_ARANGE64_SLOTS,
> +};
> +#define mt_slot_count(x) mt_slots[mte_node_type(x)]
> +
> +static const unsigned char mt_pivots[] = {
> +	[maple_dense]		= 0,
> +	[maple_leaf_64]		= MAPLE_RANGE64_SLOTS - 1,
> +	[maple_range_64]	= MAPLE_RANGE64_SLOTS - 1,
> +	[maple_arange_64]	= MAPLE_ARANGE64_SLOTS - 1,
> +};
> +#define mt_pivot_count(x) mt_pivots[mte_node_type(x)]
> +
> +static const unsigned char mt_min_slots[] = {
> +	[maple_dense]		= MAPLE_NODE_SLOTS / 2,
> +	[maple_leaf_64]		= (MAPLE_RANGE64_SLOTS / 2) - 2,
> +	[maple_range_64]	= (MAPLE_RANGE64_SLOTS / 2) - 2,
> +	[maple_arange_64]	= (MAPLE_ARANGE64_SLOTS / 2) - 1,
> +};
> +#define mt_min_slot_count(x) mt_min_slots[mte_node_type(x)]
> +
> +#define MAPLE_BIG_NODE_SLOTS	(MAPLE_RANGE64_SLOTS * 2 + 2)
> +
> +struct maple_big_node {
> +	struct maple_pnode *parent;
> +	struct maple_enode *slot[MAPLE_BIG_NODE_SLOTS];
> +	unsigned long pivot[MAPLE_BIG_NODE_SLOTS - 1];
> +	unsigned long gap[MAPLE_BIG_NODE_SLOTS];
> +	unsigned long min;
> +	unsigned char b_end;
> +	enum maple_type type;
> +};
> +
> +struct maple_subtree_state {
> +	struct ma_state *orig_l;	/* Original left side of subtree */
> +	struct ma_state *orig_r;	/* Original right side of subtree */
> +	struct ma_state *l;		/* New left side of subtree */
> +	struct ma_state *m;		/* New middle of subtree (rare) */
> +	struct ma_state *r;		/* New right side of subtree */
> +	struct ma_topiary *free;	/* nodes to be freed */
> +	struct ma_topiary *destroy;	/* Nodes to be destroyed (walked and freed) */
> +	struct maple_big_node *bn;
> +};
> +
> +/* Functions */
> +static inline struct maple_node *mt_alloc_one(gfp_t gfp)
> +{
> +	return kmem_cache_alloc(maple_node_cache, gfp | __GFP_ZERO);
> +}
> +
> +static inline int mt_alloc_bulk(gfp_t gfp, size_t size, void **nodes)
> +{
> +	return kmem_cache_alloc_bulk(maple_node_cache, gfp | __GFP_ZERO, size,
> +				     nodes);
> +}
> +
> +static inline void mt_free_bulk(size_t size, void __rcu **nodes)
> +{
> +	kmem_cache_free_bulk(maple_node_cache, size, (void **)nodes);
> +}
> +
> +static void mt_free_rcu(struct rcu_head *head)
> +{
> +	struct maple_node *node = container_of(head, struct maple_node, rcu);
> +
> +	kmem_cache_free(maple_node_cache, node);
> +}
> +
> +/*
> + * ma_free_rcu() - Use rcu callback to free a maple node
> + * @node: The node to free
> + *
> + * The maple tree uses the parent pointer to indicate this node is no longer in
> + * use and will be freed.
> + */
> +static void ma_free_rcu(struct maple_node *node)
> +{
> +	node->parent = ma_parent_ptr(node);
> +	call_rcu(&node->rcu, mt_free_rcu);
> +}
> +
> +static unsigned int mt_height(const struct maple_tree *mt)
> +{
> +	return (mt->ma_flags & MT_FLAGS_HEIGHT_MASK) >> MT_FLAGS_HEIGHT_OFFSET;
> +}
> +
> +static void mas_set_height(struct ma_state *mas)
> +{
> +	unsigned int new_flags = mas->tree->ma_flags;
> +
> +	new_flags &= ~MT_FLAGS_HEIGHT_MASK;
> +	BUG_ON(mas->depth > MAPLE_HEIGHT_MAX);
> +	new_flags |= mas->depth << MT_FLAGS_HEIGHT_OFFSET;
> +	mas->tree->ma_flags = new_flags;
> +}
> +
> +static unsigned int mas_mt_height(struct ma_state *mas)
> +{
> +	return mt_height(mas->tree);
> +}
> +
> +static inline enum maple_type mte_node_type(const struct maple_enode *entry)
> +{
> +	return ((unsigned long)entry >> MAPLE_NODE_TYPE_SHIFT) &
> +		MAPLE_NODE_TYPE_MASK;
> +}
> +
> +static inline bool ma_is_dense(const enum maple_type type)
> +{
> +	return type < maple_leaf_64;
> +}
> +
> +static inline bool ma_is_leaf(const enum maple_type type)
> +{
> +	return type < maple_range_64;
> +}
> +
> +static inline bool mte_is_leaf(const struct maple_enode *entry)
> +{
> +	return ma_is_leaf(mte_node_type(entry));
> +}
> +
> +/*
> + * We also reserve values with the bottom two bits set to '10' which are
> + * below 4096
> + */
> +static inline bool mt_is_reserved(const void *entry)
> +{
> +	return ((unsigned long)entry < MAPLE_RESERVED_RANGE) &&
> +		xa_is_internal(entry);

It's weird to suddenly see xa_ prefix here (and below). AFAICS it's nowhere
declared that maple tree is derived from xarray so while it's not completely
surprising given the overlap of authors, wouldn't it be more robust if maple
tree had its own independent set of these helpers?


  reply	other threads:[~2021-12-07 15:34 UTC|newest]

Thread overview: 181+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-01 14:29 [PATCH v4 00/66] Introducing the Maple Tree Liam Howlett
2021-12-01 14:29 ` [PATCH v4 01/66] radix tree test suite: Add pr_err define Liam Howlett
2021-12-01 14:29 ` [PATCH v4 02/66] radix tree test suite: Add kmem_cache_set_non_kernel() Liam Howlett
2021-12-01 14:29 ` [PATCH v4 04/66] radix tree test suite: Add support for slab bulk APIs Liam Howlett
2021-12-01 14:29 ` [PATCH v4 03/66] radix tree test suite: Add allocation counts and size to kmem_cache Liam Howlett
2021-12-01 14:29 ` [PATCH v4 06/66] mm: Start tracking VMAs with maple tree Liam Howlett
2021-12-07 18:01   ` Vlastimil Babka
2021-12-08 18:11     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 05/66] Maple Tree: Add new data structure Liam Howlett
2021-12-07 15:34   ` Vlastimil Babka [this message]
2021-12-08 15:47     ` Matthew Wilcox
2021-12-08 17:20     ` Liam Howlett
2021-12-15 12:54   ` Vlastimil Babka
2021-12-15 17:52     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 07/66] mm: Add VMA iterator Liam Howlett
2021-12-09 15:26   ` Vlastimil Babka
2021-12-10  2:02     ` Liam Howlett
2021-12-10 15:08       ` Vlastimil Babka
2021-12-10 18:24         ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 09/66] mm/mmap: Use the maple tree in find_vma() instead of the rbtree Liam Howlett
2021-12-15 13:05   ` Vlastimil Babka
2021-12-15 18:09     ` Liam Howlett
2022-01-13 15:46       ` Vlastimil Babka
2021-12-01 14:29 ` [PATCH v4 08/66] mmap: Use the VMA iterator in count_vma_pages_range() Liam Howlett
2021-12-09 15:54   ` Vlastimil Babka
2021-12-10  1:35     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 11/66] mm/mmap: Use maple tree for unmapped_area{_topdown} Liam Howlett
2021-12-15 16:43   ` Vlastimil Babka
2021-12-15 18:28     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 10/66] mm/mmap: Use the maple tree for find_vma_prev() instead of the rbtree Liam Howlett
2021-12-15 14:33   ` Vlastimil Babka
2021-12-15 16:40   ` Vlastimil Babka
2021-12-15 18:19     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 12/66] kernel/fork: Use maple tree for dup_mmap() during forking Liam Howlett
2021-12-16 11:09   ` Vlastimil Babka
2022-01-03 16:45     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 14/66] proc: Remove VMA rbtree use from nommu Liam Howlett
2021-12-16 11:25   ` Vlastimil Babka
2021-12-01 14:29 ` [PATCH v4 13/66] damon: Convert __damon_va_three_regions to use the VMA iterator Liam Howlett
2021-12-01 14:29 ` [PATCH v4 15/66] mm: Convert vma_lookup() to use the Maple Tree Liam Howlett
2021-12-17 11:59   ` Vlastimil Babka
2022-01-03 17:07     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 16/66] mm: Remove rb tree Liam Howlett
2022-01-12 12:02   ` Vlastimil Babka
2022-01-17  1:12     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 19/66] mm: Optimize find_exact_vma() to use vma_lookup() Liam Howlett
2022-01-12 16:31   ` Vlastimil Babka
2021-12-01 14:29 ` [PATCH v4 17/66] mmap: Change zeroing of maple tree in __vma_adjust Liam Howlett
2022-01-12 14:55   ` Vlastimil Babka
2022-01-17 20:02     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 18/66] xen: Use vma_lookup() in privcmd_ioctl_mmap() Liam Howlett
2022-01-12 16:01   ` Vlastimil Babka
2022-01-18  0:01     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 20/66] mm/khugepaged: Optimize collapse_pte_mapped_thp() by using vma_lookup() Liam Howlett
2022-01-12 16:42   ` Vlastimil Babka
2021-12-01 14:29 ` [PATCH v4 21/66] mm/mmap: Change do_brk_flags() to expand existing VMA and add do_brk_munmap() Liam Howlett
2022-01-13 12:59   ` Vlastimil Babka
2022-01-19  3:03     ` Liam Howlett
2022-01-21 12:41       ` Vlastimil Babka
2022-01-13 15:28   ` Vlastimil Babka
2022-01-19 15:51     ` Liam Howlett
2021-12-01 14:29 ` [PATCH v4 22/66] mm: Use maple tree operations for find_vma_intersection() and find_vma() Liam Howlett
2022-01-13 15:53   ` Vlastimil Babka
2022-01-19 16:56     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 23/66] mm/mmap: Use advanced maple tree API for mmap_region() Liam Howlett
2022-01-17 16:38   ` Vlastimil Babka
2022-01-21 18:11     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 24/66] mm: Remove vmacache Liam Howlett
2022-01-17 17:01   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 25/66] mm/mmap: Move mmap_region() below do_munmap() Liam Howlett
2021-12-01 14:30 ` [PATCH v4 27/66] mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap() Liam Howlett
2022-01-18 11:57   ` Vlastimil Babka
2022-01-22  1:53     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 26/66] mm/mmap: Reorganize munmap to use maple states Liam Howlett
2022-01-18 10:39   ` Vlastimil Babka
2022-01-21 19:31     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 29/66] parisc: Remove mmap linked list from cache handling Liam Howlett
2022-01-18 12:06   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 28/66] arm64: Remove mmap linked list from vdso Liam Howlett
2022-01-18 12:03   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 31/66] s390: Remove vma linked list walks Liam Howlett
2022-01-18 12:12   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 30/66] powerpc: Remove mmap " Liam Howlett
2022-01-18 12:10   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 32/66] x86: Remove vma " Liam Howlett
2022-01-18 12:12   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 33/66] xtensa: " Liam Howlett
2022-01-18 12:23   ` Vlastimil Babka
2022-01-25 16:17     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 34/66] cxl: Remove vma linked list walk Liam Howlett
2022-01-18 12:37   ` Vlastimil Babka
2022-01-25 16:32     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 36/66] um: " Liam Howlett
2022-01-18 18:41   ` Vlastimil Babka
2022-01-25 16:38     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 37/66] binfmt_elf: " Liam Howlett
2022-01-19  9:57   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 35/66] optee: " Liam Howlett
2022-01-18 18:15   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 38/66] coredump: " Liam Howlett
2022-01-19 10:31   ` Vlastimil Babka
2022-01-25 17:00     ` Matthew Wilcox
2021-12-01 14:30 ` [PATCH v4 39/66] binfmt_elf: Take the mmap lock when walking the VMA list Liam Howlett
2022-01-19 10:53   ` Vlastimil Babka
2022-01-31 17:41     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 42/66] fs/proc/task_mmu: Stop using linked list and highest_vm_end Liam Howlett
2022-01-21 11:52   ` Vlastimil Babka
2022-01-27 20:14     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 41/66] fs/proc/base: Use maple tree iterators in place of linked list Liam Howlett
2022-01-19 11:10   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 40/66] exec: Use VMA iterator instead " Liam Howlett
2022-01-19 11:06   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 43/66] userfaultfd: Use maple tree iterator to iterate VMAs Liam Howlett
2022-01-19 16:26   ` Vlastimil Babka
2022-01-25 20:47     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 44/66] ipc/shm: Use VMA iterator instead of linked list Liam Howlett
2022-01-21 12:25   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 46/66] perf: Use VMA iterator Liam Howlett
2022-01-19 16:47   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 45/66] acct: Use VMA iterator instead of linked list Liam Howlett
2022-01-19 16:44   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 47/66] sched: Use maple tree iterator to walk VMAs Liam Howlett
2022-01-19 16:49   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 48/66] fork: Use VMA iterator Liam Howlett
2022-01-19 16:51   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 49/66] bpf: Remove VMA linked list Liam Howlett
2022-01-19 17:04   ` Vlastimil Babka
2022-01-25 21:37     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 50/66] mm/gup: Use maple tree navigation instead of " Liam Howlett
2022-01-19 17:39   ` Vlastimil Babka
2022-01-25 21:54     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 52/66] mm/ksm: Use maple tree iterators instead of vma " Liam Howlett
2022-01-19 17:58   ` Vlastimil Babka
2022-01-26  2:29     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 51/66] mm/khugepaged: " Liam Howlett
2022-01-19 17:48   ` Vlastimil Babka
2022-01-25 22:03     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 55/66] mm/mempolicy: " Liam Howlett
2022-01-20 11:58   ` Vlastimil Babka
2022-01-26  2:48     ` Liam Howlett
2022-01-26  9:22       ` Vlastimil Babka
2022-01-27 17:25         ` Liam Howlett
2022-01-27 17:33           ` Vlastimil Babka
2022-01-27 23:03             ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 54/66] mm/memcontrol: Stop using mm->highest_vm_end Liam Howlett
2022-01-20 11:21   ` Vlastimil Babka
2022-01-26  2:34     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 53/66] mm/madvise: Use vma_find() instead of vma linked list Liam Howlett
2022-01-19 18:00   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 56/66] mm/mlock: Use maple tree iterators " Liam Howlett
2022-01-20 12:16   ` Vlastimil Babka
2022-01-26 16:41     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 57/66] mm/mprotect: Use maple tree navigation " Liam Howlett
2022-01-20 12:23   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 59/66] mm/msync: Use vma_find() " Liam Howlett
2022-01-20 12:42   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 58/66] mm/mremap: " Liam Howlett
2022-01-20 12:27   ` Vlastimil Babka
2022-01-26 16:59     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 60/66] mm/oom_kill: Use maple tree iterators " Liam Howlett
2022-01-20 12:43   ` Vlastimil Babka
2022-01-26 17:02     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 61/66] mm/pagewalk: Use vma_find() " Liam Howlett
2022-01-20 12:43   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 62/66] mm/swapfile: Use maple tree iterator " Liam Howlett
2022-01-20 12:46   ` Vlastimil Babka
2022-01-26 17:08     ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 63/66] i915: Use the VMA iterator Liam Howlett
2022-01-20 14:59   ` Vlastimil Babka
2022-01-20 15:50     ` Matthew Wilcox
2022-01-20 17:39       ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 64/66] nommu: Remove uses of VMA linked list Liam Howlett
2022-01-20 15:06   ` Vlastimil Babka
2022-01-20 15:54     ` Matthew Wilcox
2022-01-20 17:06       ` Vlastimil Babka
2022-01-27 16:36         ` Liam Howlett
2021-12-01 14:30 ` [PATCH v4 66/66] mm/mmap: Drop range_has_overlap() function Liam Howlett
2022-01-21  9:51   ` Vlastimil Babka
2021-12-01 14:30 ` [PATCH v4 65/66] mm: Remove the vma linked list Liam Howlett
2022-01-20 17:41   ` Vlastimil Babka
2022-01-26 20:29     ` Liam Howlett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5ead526d-8499-4810-7657-6af5f2e96ccc@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=axelrasmussen@google.com \
    --cc=dave@stgolabs.net \
    --cc=jglisse@redhat.com \
    --cc=joelaf@google.com \
    --cc=ldufour@linux.ibm.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maple-tree@lists.infradead.org \
    --cc=minchan@google.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=riel@surriel.com \
    --cc=rientjes@google.com \
    --cc=romlem@google.com \
    --cc=songliubraving@fb.com \
    --cc=surenb@google.com \
    --cc=walken.cr@gmail.com \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.