All of lore.kernel.org
 help / color / mirror / Atom feed
From: Mike Rapoport <rppt@kernel.org>
To: Liam Howlett <liam.howlett@oracle.com>
Cc: "maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH v5 07/70] Maple Tree: Add new data structure
Date: Wed, 2 Feb 2022 19:11:03 +0200	[thread overview]
Message-ID: <Yfq7J9LU58FqNFVW@kernel.org> (raw)
In-Reply-To: <20220202024137.2516438-8-Liam.Howlett@oracle.com>

Hi Liam,

On Wed, Feb 02, 2022 at 02:41:56AM +0000, Liam Howlett wrote:
> From: "Liam R. Howlett" <Liam.Howlett@Oracle.com>
> 
> The maple tree is an RCU-safe range based B-tree designed to use modern
> processor cache efficiently.  There are a number of places in the kernel
> that a non-overlapping range-based tree would be beneficial, especially
> one with a simple interface.  The first user that is covered in this
> patch set is the vm_area_struct, where three data structures are
> replaced by the maple tree: the augmented rbtree, the vma cache, and the
> linked list of VMAs in the mm_struct.  The long term goal is to reduce
> or remove the mmap_sem contention.
> 
> The tree has a branching factor of 10 for non-leaf nodes and 16 for leaf
> nodes.  With the increased branching factor, it is significantly shorter than
> the rbtree so it has fewer cache misses.  The removal of the linked list
> between subsequent entries also reduces the cache misses and the need to pull
> in the previous and next VMA during many tree alterations.
> 
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Signed-off-by: Liam R. Howlett <Liam.Howlett@oracle.com>
> ---
>  Documentation/core-api/index.rst              |    1 +
>  Documentation/core-api/maple_tree.rst         |  196 +
>  MAINTAINERS                                   |   12 +
>  include/linux/maple_tree.h                    |  673 ++
>  include/trace/events/maple_tree.h             |  123 +
>  lib/Kconfig.debug                             |    9 +
>  lib/Makefile                                  |    3 +-
>  lib/maple_tree.c                              | 6943 +++++++++++++++++
>  tools/testing/radix-tree/.gitignore           |    2 +
>  tools/testing/radix-tree/Makefile             |   13 +-
>  tools/testing/radix-tree/generated/autoconf.h |    1 +
>  tools/testing/radix-tree/linux/maple_tree.h   |    7 +
>  tools/testing/radix-tree/maple.c              |   59 +
>  .../radix-tree/trace/events/maple_tree.h      |    3 +
>  14 files changed, 8042 insertions(+), 3 deletions(-)
>  create mode 100644 Documentation/core-api/maple_tree.rst
>  create mode 100644 include/linux/maple_tree.h
>  create mode 100644 include/trace/events/maple_tree.h
>  create mode 100644 lib/maple_tree.c
>  create mode 100644 tools/testing/radix-tree/linux/maple_tree.h
>  create mode 100644 tools/testing/radix-tree/maple.c
>  create mode 100644 tools/testing/radix-tree/trace/events/maple_tree.h
> 
> diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst
> index 5de2c7a4b1b3..c669e0abcfd6 100644
> --- a/Documentation/core-api/index.rst
> +++ b/Documentation/core-api/index.rst
> @@ -34,6 +34,7 @@ Library functionality that is used throughout the kernel.
>     kref
>     assoc_array
>     xarray
> +   maple_tree
>     idr
>     circular-buffers
>     rbtree
> diff --git a/Documentation/core-api/maple_tree.rst b/Documentation/core-api/maple_tree.rst
> new file mode 100644
> index 000000000000..06c54230d985
> --- /dev/null
> +++ b/Documentation/core-api/maple_tree.rst
> @@ -0,0 +1,196 @@
> +.. SPDX-License-Identifier: GPL-2.0+
> +
> +
> +==========
> +Maple Tree
> +==========
> +
> +:Author: Liam R. Howlett
> +
> +Overview
> +========
> +
> +The Maple Tree is a B-Tree data type which is optimized for storing
> +non-overlapping ranges, including ranges of size 1.  The tree was designed to
> +be simple to use and does not require a user written search method.  It
> +supports iterating over a range of entries and going to the previous or next in

                                                                         entry ^

> +a cache-efficient manner.  The tree can also be put into an RCU-safe mode of
> +operation which allows reading and writing concurrently.  Writers must
> +synchronize on a lock, which can be the default spinlock, or the user can set
> +the lock to an external lock of a different type.
> +
> +The Maple Tree maintains a small memory footprint and was designed to use
> +modern processor cache efficiently.  The most important user of the Maple Tree
> +is the virtual memory area.

For me it sounds like VMA *is* the maple tree user. Maybe

  The most important usage of the Maple Tree is tracking of the virtual
  memory areas.

> +
> +The Maple Tree can store between 0 and ``ULONG_MAX``.  The Maple Tree reserves

                          ^ values

> +values with the bottom two bits set to '10' which are below 4096 (ie 2, 6, 10
> +.. 4094) for internal use.  If the entries may use reserved entries then the
> +users can convert the entries using xa_mk_value() and convert them back by

Maybe

  If an entry needs to use a reserved value then the user can convert the
  value using ...

> +calling xa_to_value().  The reserved values can be used by the advanced API,
> +but are blocked by the normal API.

I'd add a sentence about existence of "normal" and "advanced" API to the
first two paragraphs.

> +
> +The Maple Tree can also be configured to support searching for a gap of a given
> +size (or larger).  The tree must be initialized as an allocation tree to
> +support this feature.

I afraid it's not clear at this point what "an allocation tree" means. I
think the first sentence alone would be enough for the overview section.

> +
> +Pre-allocating of nodes is also supported using the advanced API.  This is
> +useful for users who must guarantee a successful store operation within a given
> +code segment when allocating cannot be done.  Allocations of nodes are
> +relatively small at 256 bytes.

I doubt the size here will get timely updates when the node size will
change in the code, maybe use "around" or "roughly" to be future proof? :)

> +
> +Normal API
> +==========
> +
> +Start by initialising a maple tree, either with DEFINE_MTREE() for statically
> +allocated maple trees or mt_init() for dynamically allocated ones.  A
> +freshly-initialised maple tree contains a ``NULL`` pointer for the range 0 -
> +``ULONG_MAX``.  There are currently two types of maple trees supported: the
> +allocation tree and the regular tree.  The regular tree has a higher branching
> +factor for internal nodes.  The allocation tree has a lower branching factor
> +but allows the user to search for a gap of a given size or larger from either 0
> +upwards or ``ULONG_MAX`` down.  An allocation tree can be used by passing in
> +the ``MAPLE_ALLOC_RANGE`` flag when initialising the tree.
> +
> +You can then set entries using mtree_store() or mtree_store_range().
> +mtree_store will overwrite any entry with the new entry and return the previous

             ^ Did you intentionally drop brackets here?

> +entry stored at that index.  mtree_store_range works in the same way but only
> +returns the first entry that is overwritten.  mtree_load() is used to retrieve

kernel-doc of these functions says that they return 0 or errno. What did I
miss?

> +the entry stored at a given index.  You can use mtree_erase() to erase an
> +entire range by only knowing one value within that range, or mtree_store() call
> +with an entry of NULL may be used to partially erase a range.
> +
> +If you want to only store a new entry to a range (or index) if that range is
> +currently ``NULL``, you can use mtree_insert_range() or mtree_insert() which
> +return -EEXIST if the range is not empty.
> +
> +You can search for an entry from an index upwards by using mt_find().
> +
> +You can walk each entry within a range by calling mt_for_each().  You must
> +provide a temporary variable to store a cursor.  If you want to walk each
> +element of the tree then 0 and ``ULONG_MAX`` may be used as the range.  If the

Why 0 and ULONG_MAX have different markup?

> +caller is going to hold the lock for the duration of the walk then it is worth

Do you refer to the maple tree internal lock or any lock here?

> +looking at the mas_for_each() API in the Advanced API section.

Maybe make "Advanced API" a hyperlink.

> +
> +Sometimes it is necessary to ensure the next call to store to a maple tree does
> +not allocate memory, please see the advanced API for this use case.
> +
> +Finally, you can remove all entries from a maple tree by calling
> +mtree_destroy().  If the maple tree entries are pointers, you may wish to free
> +the entries first.
> +
> +Allocating Nodes
> +----------------
> +
> +When using the normal API, the allocations are handled by the internal
> +tree code.

Since this is a part of "Normal API" description the "When using normal
API" seems redundant.

> +
> +Locking
> +-------
> +
> +When using the Normal API, you do not have to worry about locking.

Ditto

> +The Maple Tree uses RCU and an internal spinlock to synchronise access:
> +
> +Takes RCU read lock:
> + * mtree_load()
> + * mt_find()
> + * mt_for_each()
> + * mt_next()
> + * mt_prev()
> +
> +Takes ma_lock internally:
> + * mtree_store()
> + * mtree_store_range()
> + * mtree_insert()
> + * mtree_insert_range()
> + * mtree_erase()
> + * mtree_destroy()
> + * mt_set_in_rcu()
> + * mt_clear_in_rcu()
> +
> +If you want to take advantage of the lock to protect the data structures

I believe this is about the internal lock, it would be nice to spell this
out.

> +that you are storing in the Maple Tree, you can call mtree_lock() before
> +calling mtree_load(), then take a reference count on the object you have
> +found before calling mtree_unlock().  This will prevent stores from
> +removing the object from the tree between looking up the object and
> +incrementing the refcount.  You can also use RCU to avoid dereferencing
> +freed memory, but an explanation of that is beyond the scope of this
> +document.
> +
> +Advanced API
> +============
> +
> +The advanced API offers more flexibility and better performance at the
> +cost of an interface which can be harder to use and has fewer safeguards.
> +You must take care of your own locking while using the advanced API.
> +You can use the ma_lock, RCU or an external lock for protection.
> +You can mix advanced and normal operations on the same array, as long
> +as the locking is compatible.  The normal API is implemented in terms
> +of the advanced API.
> +
> +The advanced API is based around the ma_state, this is where the 'mas'
> +prefix originates.  The ma_state struct keeps track of tree operations to make
> +life easier for both internal and external tree users.
> +
> +Initialising the maple tree is the same as in the normal API.  Please see
> +above.
> +
> +The maple state keeps track of the range start and end in mas->index and
> +mas->last, respectively.
> +
> +mas_walk() will walk the tree to the location of index and set the index
> +and last according to the range for the entry.

           ^ fields? or maybe even mas->index and mas->last?
> +
> +You can set entries using mas_store().  mas_store() will overwrite any entry
> +with the new entry and return the first existing entry that is overwritten.
> +The range is passed in as members of the maple state: index and last.
> +
> +You can use mas_erase() to erase an entire range by setting index and
> +last of the maple state to the desired range to erase.  This will erase
> +the first range that is found in that range, set the maple state index
> +and last as the range that was erased and return the entry that existed
> +at that location.
> +
> +You can walk each entry within a range by using mas_for_each().  If you want to
> +walk each element of the tree then 0 and ``ULONG_MAX`` may be used as the
> +range.  If the lock needs to be periodically dropped, see the locking section
> +mas_pause().
> +
> +Using a maple state allows mas_next() and mas_prev() to function as if the
> +tree was a linked list.  With such a high branching factor the amortized
> +performance penalty is outweighed by cache optimization.  mas_next() will
> +return the next entry which occurs after the entry at index.  mas_prev()
> +will return the previous entry which occurs before the entry at index.
> +
> +mas_find() will find the first entry which exists at or above index on
> +the first call, and the next entry from every subsequent calls.
> +
> +mas_find_rev() will find the fist entry which exists at or below the last on
> +the first call, and the previous entry from every subsequent calls.
> +
> +If the user needs to yield the lock during an operation, then the maple state
> +must be paused using mas_pause().
> +
> +There are a few extra interfaces provided when using an allocation tree.
> +If you wish to search for a gap within a range, then mas_empty_area()
> +or mas_empty_area_rev() can be used.  mas_empty_area searches for a

                                                      ^ brackets?

> +gap starting at the lowest index given up to the maximum of the range.
> +mas_empty_area_rev searches for a gap starting at the highest index

                    ^ and here?

> +given and continues downward to the lower bound of the range.
> +
> +Allocating Nodes
> +----------------
> +
> +Allocations are usually handled internally to the tree, however if allocations
> +need to occur before a write occurs then calling mas_entry_count() will
> +allocate the worst-case number of needed nodes to insert the provided number of
> +ranges.  This also causes the tree to enter mass insertion mode.  Once
> +insertions are complete calling mas_destroy() on the maple state will free the
> +unused allocations.
> +
> +Functions and structures
> +========================
> +
> +.. kernel-doc:: include/linux/maple_tree.h
> +.. kernel-doc:: lib/maple_tree.c
> +
> diff --git a/MAINTAINERS b/MAINTAINERS
> index f41088418aae..a35478a3dee3 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -11415,6 +11415,18 @@ L:	linux-man@vger.kernel.org
>  S:	Maintained
>  W:	http://www.kernel.org/doc/man-pages
>  
> +MAPLE TREE
> +M:	Liam R. Howlett <Liam.Howlett@oracle.com>
> +L:	linux-mm@kvack.org
> +S:	Supported
> +F:	Documentation/core-api/maple_tree.rst
> +F:	include/linux/maple_tree.h
> +F:	include/trace/events/maple_tree.h
> +F:	lib/maple_tree.c
> +F:	lib/test_maple_tree.c
> +F:	tools/testing/_adix-tree/linux/maple_tree.h
> +F:	tools/testing/radix-tree/maple.c
> +
>  MARDUK (CREATOR CI40) DEVICE TREE SUPPORT
>  M:	Rahul Bedarkar <rahulbedarkar89@gmail.com>
>  L:	linux-mips@vger.kernel.org
> diff --git a/include/linux/maple_tree.h b/include/linux/maple_tree.h
> new file mode 100644
> index 000000000000..14ddeaa8f3e7
> --- /dev/null
> +++ b/include/linux/maple_tree.h
> @@ -0,0 +1,673 @@
> +/* SPDX-License-Identifier: GPL-2.0+ */
> +#ifndef _LINUX_MAPLE_TREE_H
> +#define _LINUX_MAPLE_TREE_H
> +/*
> + * Maple Tree - An RCU-safe adaptive tree for storing ranges
> + * Copyright (c) 2018 Oracle

2018 - 2022?

> + * Authors:     Liam R. Howlett <Liam.Howlett@Oracle.com>
> + *              Matthew Wilcox <willy@infradead.org>
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/rcupdate.h>
> +#include <linux/spinlock.h>
> +/* #define CONFIG_MAPLE_RCU_DISABLED */
> +/* #define CONFIG_DEBUG_MAPLE_TREE_VERBOSE */
> +
> +/*
> + * Allocated nodes are mutable until they have been inserted into the tree,
> + * at which time they cannot change their type until they have been removed
> + * from the tree and an RCU grace period has passed.
> + *
> + * Removed nodes have their ->parent set to point to themselves.  RCU readers
> + * check ->parent before relying on the value that they loaded from the
> + * slots array.  This lets us reuse the slots array for the RCU head.
> + *
> + * Nodes in the tree point to their parent unless bit 0 is set.

There are a lots of comments describing the maple tree internals here and
below. Did yaou consider adding a section "Implementation details" or
something like that to the maple_tree.rst and linking these comments there
with DOC: and some glue text?

> + */
> +#if defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64)
> +/* 64bit sizes */
> +#define MAPLE_NODE_SLOTS	31	/* 256 bytes including ->parent */
> +#define MAPLE_RANGE64_SLOTS	16	/* 256 bytes */
> +#define MAPLE_ARANGE64_SLOTS	10	/* 240 bytes */
> +#define MAPLE_ARANGE64_META_MAX	15	/* Out of range for metadata */
> +#define MAPLE_ALLOC_SLOTS	(MAPLE_NODE_SLOTS - 1)
> +#else
> +/* 32bit sizes */
> +#define MAPLE_NODE_SLOTS	63	/* 256 bytes including ->parent */
> +#define MAPLE_RANGE64_SLOTS	32	/* 256 bytes */
> +#define MAPLE_ARANGE64_SLOTS	21	/* 240 bytes */
> +#define MAPLE_ARANGE64_META_MAX	22	/* Out of range for metadata */
> +#define MAPLE_ALLOC_SLOTS	(MAPLE_NODE_SLOTS - 2)
> +#endif /* defined(CONFIG_64BIT) || defined(BUILD_VDSO32_64) */
> +
> +#define MAPLE_NODE_MASK		255UL

...

> +/**
> + * DOC: Maple tree flags

I don't see this referenced in the Documentation/ changes

> + *
> + * * MT_FLAGS_ALLOC_RANGE	- Track gaps in this tree
> + * * MT_FLAGS_USE_RCU		- Operate in RCU mode
> + * * MT_FLAGS_HEIGHT_OFFSET	- The position of the tree height in the flags
> + * * MT_FLAGS_HEIGHT_MASK	- The mask for the maple tree height value
> + * * MT_FLAGS_LOCK_MASK		- How the mt_lock is used
> + * * MT_FLAGS_LOCK_IRQ		- Acquired irq-safe
> + * * MT_FLAGS_LOCK_BH		- Acquired bh-safe
> + * * MT_FLAGS_LOCK_EXTERN	- mt_lock is not used
> + *
> + * MAPLE_HEIGHT_MAX	The largest height that can be stored
> + */
> +#define MT_FLAGS_ALLOC_RANGE	0x01
> +#define MT_FLAGS_USE_RCU	0x02
> +#define	MT_FLAGS_HEIGHT_OFFSET	0x02
> +#define	MT_FLAGS_HEIGHT_MASK	0x7C

Is extra alignment here intentional?

> +#define MT_FLAGS_LOCK_MASK	0x300
> +#define MT_FLAGS_LOCK_IRQ	0x100
> +#define MT_FLAGS_LOCK_BH	0x200
> +#define MT_FLAGS_LOCK_EXTERN	0x300

...

> +/*
> + * More complicated stores can cause two nodes to become one or tree and

                                                                  ^ three?

> + * potentially alter the height of the tree.  Either half of the tree may need
> + * to be rebalanced against the other.  The ma_topiary struct is used to track
> + * which nodes have been 'cut' from the tree so that the change can be done
> + * safely at a later date.  This is done to support RCU.
> + */
> +struct ma_topiary {
> +	struct maple_enode *head;
> +	struct maple_enode *tail;
> +	struct maple_tree *mtree;
> +};

...

> +/* Advanced API */
> +
> +/*
> + * The maple state is defined in the struct ma_state and is used to keep track
> + * of information during operations, and even between operations when using the
> + * advanced API.
> + *
> + * If state->node has bit 0 set then it references a tree location which is not
> + * a node (eg the root).  If bit 1 is set, the rest of the bits are a negative
> + * errno.  Bit 2 (the 'unallocated slots' bit) is clear.  Bits 3-6 indicate the
> + * node type.
> + *
> + * state->alloc either has a request number of nodes or an allocated node.  If
> + * stat->alloc has a requested number of nodes, the first bit will be set (0x1)
> + * and the remaining bits are the value.  If state->alloc is a node, then the
> + * node will be of type maple_alloc.  maple_alloc has MAPLE_NODE_SLOTS - 1 for
> + * storing more allocated nodes, a total, and the node_count in this node.
> + * total is the number of nodes allocated.  node_count is the number of

Maybe merge "total is the number of nodes allocated" with "a total":

... allocated nodes, total number of nodes allocated, and the node_count...

> + * allocated nodes in this node.  The scaling beyond MAPLE_NODE_SLOTS - 1 is
> + * handled by storing further nodes into state->alloc->slot[0]'s node.  Nodes
> + * are taken from state->alloc by removing a node from the state->alloc node
> + * until state->alloc->node_count is 1, when state->alloc is returned and the
> + * state->alloc->slot[0] is promoted to state->alloc.  Nodes are pushed onto
> + * state->alloc by putting the current state->alloc into the pushed node's
> + * slot[0].
> + *
> + * The state also contains the implied min/max of the state->node, the depth of
> + * this search, and the offset. The implied min/max are either from the parent
> + * node or are 0-oo for the root node.  The depth is incremented or decremented
> + * every time a node is walked down or up.  The offset is the slot/pivot of
> + * interest in the node - either for reading or writing.
> + *
> + * When returning a value the maple state index and last respectively contain
> + * the start and end of the range for the entry.  Ranges are inclusive in the
> + * Maple Tree.
> + */
> +struct ma_state {
> +	struct maple_tree *tree;	/* The tree we're operating in */
> +	unsigned long index;		/* The index we're operating on - range start */
> +	unsigned long last;		/* The last index we're operating on - range end */
> +	struct maple_enode *node;	/* The node containing this entry */
> +	unsigned long min;		/* The minimum index of this node - implied pivot min */
> +	unsigned long max;		/* The maximum index of this node - implied pivot max */
> +	struct maple_alloc *alloc;	/* Allocated nodes for this operation */
> +	unsigned char depth;		/* depth of tree descent during write */
> +	unsigned char offset;
> +	unsigned char mas_flags;
> +};

...

> +/**
> + * mt_for_each - Searches for an entry starting at index until max.

Isn't it an iterator?

> + * @tree: The Maple Tree
> + * @entry: The current entry
> + * @index: The index to update to track the location in the tree
> + * @max: The maximum limit for @index
> + *
> + * Note: Will not return the zero entry.
> + */
> +#define mt_for_each(tree, entry, index, max) \
> +	for (entry = mt_find(tree, &index, max); \
> +		entry; entry = mt_find_after(tree, &index, max))
> +
> +

...

> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> new file mode 100644
> index 000000000000..6a57745319ba
> --- /dev/null
> +++ b/lib/maple_tree.c
> @@ -0,0 +1,6943 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * Maple Tree implementation
> + * Copyright (c) 2018 Oracle Corporation
> + * Authors: Liam R. Howlett <Liam.Howlett@oracle.com>
> + *	    Matthew Wilcox <willy@infradead.org>
> + */
> +
> +/*
> + * Interesting implementation details of the Maple Tree

DOC:? 

> + *
> + * Each node type has a number of slots for entries and a number of slots for
> + * pivots.  In the case of dense nodes, the pivots are implied by the position
> + * and are simply the slot index + the minimum of the node.
> + *
> + * In regular B-Tree terms, pivots are called keys.  The term pivot is used to
> + * indicate that the tree is specifying ranges,  Pivots may appear in the
> + * subtree with an entry attached to the value where as keys are unique to a
> + * specific position of a B-tree.  Pivot values are inclusive of the slot with
> + * the same index.
> + *
> + *
> + * The following illustrates the layout of a range64 nodes slots and pivots.
> + *
> + *
> + *  Slots -> | 0 | 1 | 2 | ... | 12 | 13 | 14 | 15 |
> + *           ┬   ┬   ┬   ┬     ┬    ┬    ┬    ┬    ┬
> + *           │   │   │   │     │    │    │    │    └─ Implied maximum
> + *           │   │   │   │     │    │    │    └─ Pivot 14
> + *           │   │   │   │     │    │    └─ Pivot 13
> + *           │   │   │   │     │    └─ Pivot 12
> + *           │   │   │   │     └─ Pivot 11
> + *           │   │   │   └─ Pivot 2
> + *           │   │   └─ Pivot 1
> + *           │   └─ Pivot 0
> + *           └─  Implied minimum
> + *
> + * Slot contents:
> + *  Internal (non-leaf) nodes contain pointers to other nodes.
> + *  Leaf nodes contain entries.
> + *
> + * The location of interest is often referred to as an offset.  All offsets have
> + * a slot, but the last offset has an implied pivot from the node above (or
> + * UINT_MAX for the root node.
> + *
> + * Ranges complicate certain write activities.  When modifying any of
> + * the B-tree variants, it is known that one entry will either be added or
> + * deleted.  When modifying the Maple Tree, one store operation may overwrite
> + * the entire data set, or one half of the tree, or the middle half of the tree.
> + *
> + */

...

> +/* Interface */
> +
> +/**
> + * mas_store() - Store an @entry.
> + * @mas: The maple state.
> + * @entry: The entry to store.
> + *
> + * The @mas->index and @mas->last is used to set the range for the @entry.
> + * Note: The @mas should have pre-allocated entries to ensure there is memory to
> + * store the entry.  Please see mas_expected_entries()/mas_destroy() for more details.

Return: ?

> + */
> +void *mas_store(struct ma_state *mas, void *entry)
> +{
> +	MA_WR_STATE(wr_mas, mas, entry);
> +
> +	trace_ma_write(__func__, mas, 0, entry);
> +#ifdef CONFIG_DEBUG_MAPLE_TREE
> +	if (mas->index > mas->last)
> +		printk("Error %lu > %lu %p\n", mas->index, mas->last, entry);
> +	MT_BUG_ON(mas->tree, mas->index > mas->last);
> +	if (mas->index > mas->last) {
> +		mas_set_err(mas, -EINVAL);
> +		return NULL;
> +	}
> +
> +#endif
> +
> +/*
> + * Storing is the same operation as insert with the added caveat that it can
> + * overwrite entries.  Although this seems simple enough, one may want to
> + * examine what happens if a single store operation was to overwrite multiple
> + * entries within a self-balancing B-Tree.
> + */
> +	mas_wr_store_setup(&wr_mas);
> +	mas_wr_store_entry(&wr_mas);
> +	return wr_mas.content;
> +}

...

> +/**
> + * mt_next() - get the next value in the maple tree
> + * @mt: The maple tree
> + * @index: The start index
> + * @max: The maximum index to check
> + *
> + * Returns: The entry at @index or higher, or %NULL if nothing is found.

Please s/Returns/Return/

> + */
> +void *mt_next(struct maple_tree *mt, unsigned long index, unsigned long max)
> +{
> +	void *entry = NULL;
> +	MA_STATE(mas, mt, index, index);
> +
> +	rcu_read_lock();
> +	entry = mas_next(&mas, max);
> +	rcu_read_unlock();
> +	return entry;
> +}
> +EXPORT_SYMBOL_GPL(mt_next);

...

> +/*

kernel-doc?

> + * mas_find: If mas->node == MAS_START, find the first
> + * non-NULL entry >= mas->index.
> + * Otherwise, find the first non-NULL entry > mas->index

although this might render not nicely in html...

> + * @mas: The maple state
> + * @max: The maximum value to check.
> + *
> + * Must hold rcu_read_lock or the write lock.
> + * If an entry exists, last and index are updated accordingly.
> + * May set @mas->node to MAS_NONE.
> + *
> + * Return: The entry or %NULL.
> + */
> +void *mas_find(struct ma_state *mas, unsigned long max)
> +{
> +	if (unlikely(mas_is_paused(mas))) {
> +		if (unlikely(mas->last == ULONG_MAX)) {
> +			mas->node = MAS_NONE;
> +			return NULL;
> +		}
> +		mas->node = MAS_START;
> +		mas->index = ++mas->last;
> +	}
> +
> +	if (unlikely(mas_is_start(mas))) {
> +		/* First run or continue */
> +		void *entry;
> +
> +		if (mas->index > max)
> +			return NULL;
> +
> +		entry = mas_walk(mas);
> +		if (entry)
> +			return entry;
> +	}
> +
> +	if (unlikely(!mas_searchable(mas)))
> +		return NULL;
> +
> +	/* Retries on dead nodes handled by mas_next_entry */
> +	return mas_next_entry(mas, max);
> +}
> +
> +/**
> + * mas_find: If mas->node == MAS_START, find the first

mas_find_rev

> + * non-NULL entry <= mas->last.
> + * Otherwise, find the first non-NULL entry < mas->index

So does this look Ok in html? ;-)

> + * @mas: The maple state
> + * @min: The minimum value to check.
> + *
> + * Must hold rcu_read_lock or the write lock.
> + * If an entry exists, last and index are updated accordingly.
> + * May set @mas->node to MAS_NONE.
> + *
> + * Return: The entry or %NULL.
> + */void *mas_find_rev(struct ma_state *mas, unsigned long min)
> +{
> +	if (unlikely(mas_is_paused(mas))) {
> +		if (unlikely(mas->last == ULONG_MAX)) {
> +			mas->node = MAS_NONE;
> +			return NULL;
> +		}
> +		mas->node = MAS_START;
> +		mas->last = --mas->index;
> +	}
> +
> +	if (unlikely(mas_is_start(mas))) {
> +		/* First run or continue */
> +		void *entry;
> +
> +		if (mas->index < min)
> +			return NULL;
> +
> +		entry = mas_walk(mas);
> +		if (entry)
> +			return entry;
> +	}
> +
> +	if (unlikely(!mas_searchable(mas)))
> +		return NULL;
> +
> +	/* Retries on dead nodes handled by mas_next_entry */
> +	return mas_prev_entry(mas, min);
> +}
> +EXPORT_SYMBOL_GPL(mas_find);
> +
> +/**
> + * mas_erase() - Find the range in which index resides and erase the entire
> + * range.
> + * @mas: The maple state
> + *
> + * Must hold the write lock.
> + * Searches for @mas->index, sets @mas->index and @mas->last to the range and
> + * erases that range.
> + *
> + * Return: the entry that was erased, @mas->index and @mas->last are updated.

the entry that was erased or %NULL?

> + */
> +void *mas_erase(struct ma_state *mas)
> +{
> +	void *entry;
> +	MA_WR_STATE(wr_mas, mas, NULL);
> +
> +	if (mas_is_none(mas) || mas_is_paused(mas))
> +		mas->node = MAS_START;
> +
> +	/* Retry unnecessary when holding the write lock. */
> +	entry = mas_state_walk(mas);
> +	if (!entry)
> +		return NULL;
> +
> +write_retry:
> +	/* Must reset to ensure spanning writes of last slot are detected */
> +	mas_reset(mas);
> +	mas_wr_store_setup(&wr_mas);
> +	mas_wr_store_entry(&wr_mas);
> +	if (mas_nomem(mas, GFP_KERNEL))
> +		goto write_retry;
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL_GPL(mas_erase);
> +
> +/*

kernel-doc?

> + * mas_nomem() - * Check if there was an error allocating and do the allocation

                   ^ not needed.
> + * if necessary If there are allocations, then free them.

                 ^ period?

> + * @mas: The maple state
> + * @gfp: The GFP_FALGS to use for allocations

Return:?

> + */
> +bool mas_nomem(struct ma_state *mas, gfp_t gfp)
> +	__must_hold(mas->tree->lock)
> +{
> +	if (likely(mas->node != MA_ERROR(-ENOMEM))) {
> +		mas_destroy(mas);
> +		return false;
> +	}
> +
> +	if (gfpflags_allow_blocking(gfp) && !mt_external_lock(mas->tree)) {
> +		mtree_unlock(mas->tree);
> +		mas_alloc_nodes(mas, gfp);
> +		mtree_lock(mas->tree);
> +	} else {
> +		mas_alloc_nodes(mas, gfp);
> +	}
> +
> +	if (!mas_allocated(mas))
> +		return false;
> +
> +	mas->node = MAS_START;
> +	return true;
> +}
> +
> +void __init maple_tree_init(void)
> +{
> +	maple_node_cache = kmem_cache_create("maple_node",
> +			sizeof(struct maple_node), sizeof(struct maple_node),
> +			SLAB_PANIC, NULL);
> +}
> +
> +/**
> + * mtree_load() - Load a value stored in a maple tree
> + * @mt: The maple tree
> + * @index: The index to load
> + *
> + * Return: the entry of %NULL

                       ^ or?
> + */
> +void *mtree_load(struct maple_tree *mt, unsigned long index)
> +{
> +	MA_STATE(mas, mt, index, index);
> +	void *entry;
> +
> +	trace_ma_read(__func__, &mas);
> +	rcu_read_lock();
> +retry:
> +	entry = mas_start(&mas);
> +	if (unlikely(mas_is_none(&mas)))
> +		goto unlock;
> +
> +	if (unlikely(mas_is_ptr(&mas))) {
> +		if (index)
> +			entry = NULL;
> +
> +		goto unlock;
> +	}
> +
> +	entry = mtree_lookup_walk(&mas);
> +	if (!entry && unlikely(mas_is_start(&mas)))
> +		goto retry;
> +unlock:
> +	rcu_read_unlock();
> +	if (xa_is_zero(entry))
> +		return NULL;
> +
> +	return entry;
> +}
> +EXPORT_SYMBOL(mtree_load);

-- 
Sincerely yours,
Mike.

  reply	other threads:[~2022-02-02 17:11 UTC|newest]

Thread overview: 83+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-02  2:41 [PATCH v5 00/70] Introducing the Maple Tree Liam Howlett
2022-02-02  2:41 ` [PATCH v5 01/70] binfmt_elf: Take the mmap lock when walking the VMA list Liam Howlett
2022-02-02  2:41 ` [PATCH v5 02/70] radix tree test suite: Add pr_err define Liam Howlett
2022-02-02  2:41 ` [PATCH v5 04/70] radix tree test suite: Add allocation counts and size to kmem_cache Liam Howlett
2022-02-02  2:41 ` [PATCH v5 05/70] radix tree test suite: Add support for slab bulk APIs Liam Howlett
2022-02-02  2:41 ` [PATCH v5 03/70] radix tree test suite: Add kmem_cache_set_non_kernel() Liam Howlett
2022-02-02  2:41 ` [PATCH v5 06/70] radix tree test suite: Add lockdep_is_held to header Liam Howlett
2022-02-02  2:41 ` [PATCH v5 07/70] Maple Tree: Add new data structure Liam Howlett
2022-02-02 17:11   ` Mike Rapoport [this message]
2022-02-03  2:38     ` Liam Howlett
2022-02-03  8:57       ` Mike Rapoport
2022-02-03 16:02   ` Mark Hemment
2022-02-03 17:29     ` Liam Howlett
2022-02-04 11:24   ` David Howells
2022-02-02  2:42 ` [PATCH v5 08/70] lib/test_maple_tree: Add testing for maple tree Liam Howlett
2022-02-02  2:42 ` [PATCH v5 11/70] mmap: Use the VMA iterator in count_vma_pages_range() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 10/70] mm: Add VMA iterator Liam Howlett
2022-02-02  2:42 ` [PATCH v5 09/70] mm: Start tracking VMAs with maple tree Liam Howlett
2022-02-02  2:42 ` [PATCH v5 13/70] mm/mmap: Use the maple tree for find_vma_prev() instead of the rbtree Liam Howlett
2022-02-02  2:42 ` [PATCH v5 12/70] mm/mmap: Use the maple tree in find_vma() " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 14/70] mm/mmap: Use maple tree for unmapped_area{_topdown} Liam Howlett
2022-02-02  2:42 ` [PATCH v5 15/70] kernel/fork: Use maple tree for dup_mmap() during forking Liam Howlett
2022-02-03 11:59   ` Mark Hemment
2022-02-03 17:20     ` Liam Howlett
2022-02-02  2:42 ` [PATCH v5 16/70] damon: Convert __damon_va_three_regions to use the VMA iterator Liam Howlett
2022-02-02  2:42 ` [PATCH v5 18/70] mm: Remove rb tree Liam Howlett
2022-02-02  2:42 ` [PATCH v5 17/70] proc: Remove VMA rbtree use from nommu Liam Howlett
2022-02-02  2:42 ` [PATCH v5 19/70] mmap: Change zeroing of maple tree in __vma_adjust() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 21/70] mm: Optimize find_exact_vma() to use vma_lookup() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 20/70] xen: Use vma_lookup() in privcmd_ioctl_mmap() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 22/70] mm/khugepaged: Optimize collapse_pte_mapped_thp() by using vma_lookup() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 23/70] mm/mmap: Change do_brk_flags() to expand existing VMA and add do_brk_munmap() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 24/70] mm: Use maple tree operations for find_vma_intersection() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 26/70] mm: Remove vmacache Liam Howlett
2022-02-02  2:42 ` [PATCH v5 25/70] mm/mmap: Use advanced maple tree API for mmap_region() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 27/70] mm: Convert vma_lookup() to use mtree_load() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 28/70] mm/mmap: Move mmap_region() below do_munmap() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 31/70] arm64: Remove mmap linked list from vdso Liam Howlett
2022-02-02  2:42 ` [PATCH v5 29/70] mm/mmap: Reorganize munmap to use maple states Liam Howlett
2022-02-02  2:42 ` [PATCH v5 30/70] mm/mmap: Change do_brk_munmap() to use do_mas_align_munmap() Liam Howlett
2022-02-02  2:42 ` [PATCH v5 32/70] parisc: Remove mmap linked list from cache handling Liam Howlett
2022-02-02  2:42 ` [PATCH v5 33/70] powerpc: Remove mmap linked list walks Liam Howlett
2022-02-02  2:42 ` [PATCH v5 34/70] s390: Remove vma " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 37/70] cxl: Remove vma linked list walk Liam Howlett
2022-02-02  2:42 ` [PATCH v5 35/70] x86: Remove vma linked list walks Liam Howlett
2022-02-02  2:42 ` [PATCH v5 36/70] xtensa: " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 39/70] um: Remove vma linked list walk Liam Howlett
2022-02-02  2:42 ` [PATCH v5 38/70] optee: " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 41/70] coredump: " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 42/70] exec: Use VMA iterator instead of linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 40/70] binfmt_elf: Remove vma linked list walk Liam Howlett
2022-02-02  2:42 ` [PATCH v5 44/70] fs/proc/task_mmu: Stop using linked list and highest_vm_end Liam Howlett
2022-02-02  2:42 ` [PATCH v5 43/70] fs/proc/base: Use maple tree iterators in place of linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 47/70] acct: Use VMA iterator instead " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 46/70] ipc/shm: " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 45/70] userfaultfd: Use maple tree iterator to iterate VMAs Liam Howlett
2022-02-02  2:42 ` [PATCH v5 49/70] sched: Use maple tree iterator to walk VMAs Liam Howlett
2022-02-02  2:42 ` [PATCH v5 48/70] perf: Use VMA iterator Liam Howlett
2022-02-02  2:42 ` [PATCH v5 52/70] mm/gup: Use maple tree navigation instead of linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 50/70] fork: Use VMA iterator Liam Howlett
2022-02-02  2:42 ` [PATCH v5 51/70] bpf: Remove VMA linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 53/70] mm/khugepaged: Stop using vma " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 54/70] mm/ksm: Use vma iterators instead of " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 57/70] mm/mempolicy: Use vma iterator & maple state " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 55/70] mm/madvise: Use vma_find() " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 56/70] mm/memcontrol: Stop using mm->highest_vm_end Liam Howlett
2022-02-02  2:42 ` [PATCH v5 59/70] mm/mprotect: Use maple tree navigation instead of vma linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 58/70] mm/mlock: Use vma iterator and " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 60/70] mm/mremap: Use vma_find_intersection() " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 62/70] mm/oom_kill: Use maple tree iterators " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 61/70] mm/msync: Use vma_find() " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 63/70] mm/pagewalk: " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 64/70] mm/swapfile: Use vma iterator " Liam Howlett
2022-02-02  2:42 ` [PATCH v5 65/70] i915: Use the VMA iterator Liam Howlett
2022-02-02  2:42 ` [PATCH v5 66/70] nommu: Remove uses of VMA linked list Liam Howlett
2022-02-02  2:42 ` [PATCH v5 67/70] riscv: Use vma iterator for vdso Liam Howlett
2022-02-02  2:42 ` [PATCH v5 68/70] mm: Remove the vma linked list Liam Howlett
2022-02-03 12:08   ` Mark Hemment
2022-02-03 17:25     ` Liam Howlett
2022-02-02  2:42 ` [PATCH v5 69/70] mm/mmap: Drop range_has_overlap() function Liam Howlett
2022-02-02  2:42 ` [PATCH v5 70/70] mm/mmap.c: Pass in mapping to __vma_link_file() Liam Howlett
2022-02-04 11:28 ` [PATCH v5 09/70] mm: Start tracking VMAs with maple tree David Howells
2022-02-11 18:23   ` Liam Howlett

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Yfq7J9LU58FqNFVW@kernel.org \
    --to=rppt@kernel.org \
    --cc=akpm@linux-foundation.org \
    --cc=liam.howlett@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maple-tree@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.