All of lore.kernel.org
 help / color / mirror / Atom feed
* [LSF/MM TOPIC] Using XArray to manage the VMA
@ 2019-03-13 15:10 Laurent Dufour
  2019-03-13 18:01 ` Matthew Wilcox
  2019-03-13 21:06 ` Davidlohr Bueso
  0 siblings, 2 replies; 7+ messages in thread
From: Laurent Dufour @ 2019-03-13 15:10 UTC (permalink / raw)
  To: lsf-pc, Linux-MM, linux-kernel; +Cc: Matthew Wilcox

If this is not too late and if there is still place available, I would 
like to attend the MM track and propose a topic about using the XArray 
to replace the VMA's RB tree and list.

Using the XArray in place of the VMA's tree and list seems to be a first 
step to the long way of removing/replacing the mmap_sem.
However, there are still corner cases to address like the VMA splitting 
and merging which may raise some issue. Using the XArray's specifying 
locking would not be enough to handle the memory management, and 
additional fine grain locking like a per VMA one could be studied, 
leading to further discussion about the merging of the VMA.

In addition, here are some topics I'm interested in:
- Test cases to choose for demonstrating mm features or fixing mm bugs 
proposed by Balbir Singh
- mm documentation proposed by Mike Rapoport

Laurent.


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour
@ 2019-03-13 18:01 ` Matthew Wilcox
  2019-03-29  9:31   ` Laurent Dufour
  2019-03-13 21:06 ` Davidlohr Bueso
  1 sibling, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2019-03-13 18:01 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: lsf-pc, Linux-MM, linux-kernel, Liam R. Howlett

On Wed, Mar 13, 2019 at 04:10:14PM +0100, Laurent Dufour wrote:
> If this is not too late and if there is still place available, I would like
> to attend the MM track and propose a topic about using the XArray to replace
> the VMA's RB tree and list.

If there isn't room on the schedule, then Laurent and I are definitely
going to sneak off and talk about this ourselves at some point.  Having a
high-bandwidth conversation about this is going to be really important
for us, and I think having other people involved would be good.

If there're still spots, it'd be good to have Liam Howlett join us.
He's doing the actual writing-of-code for the Maple Tree at the moment
(I did some earlier on, but recent commits are all him).

> Using the XArray in place of the VMA's tree and list seems to be a first
> step to the long way of removing/replacing the mmap_sem.
> However, there are still corner cases to address like the VMA splitting and
> merging which may raise some issue. Using the XArray's specifying locking
> would not be enough to handle the memory management, and additional fine
> grain locking like a per VMA one could be studied, leading to further
> discussion about the merging of the VMA.
> 
> In addition, here are some topics I'm interested in:
> - Test cases to choose for demonstrating mm features or fixing mm bugs
> proposed by Balbir Singh
> - mm documentation proposed by Mike Rapoport
> 
> Laurent.
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour
  2019-03-13 18:01 ` Matthew Wilcox
@ 2019-03-13 21:06 ` Davidlohr Bueso
  2019-03-14  2:39   ` Matthew Wilcox
  1 sibling, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2019-03-13 21:06 UTC (permalink / raw)
  To: Laurent Dufour; +Cc: lsf-pc, Linux-MM, linux-kernel, Matthew Wilcox

On Wed, 13 Mar 2019, Laurent Dufour wrote:

>If this is not too late and if there is still place available, I would 
>like to attend the MM track and propose a topic about using the XArray 
>to replace the VMA's RB tree and list.
>
>Using the XArray in place of the VMA's tree and list seems to be a 
>first step to the long way of removing/replacing the mmap_sem.

So threaded (not as in threads of execution) rbtrees are another
alternative to deal with the two data structure approach we currently
have. Having O(1) rb_prev/next() calls allows us to basically get rid of
the vma list at the cost of an extra check for each node we visit on
the way down when inserting.

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-13 21:06 ` Davidlohr Bueso
@ 2019-03-14  2:39   ` Matthew Wilcox
  2019-03-14 16:43     ` Davidlohr Bueso
  0 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2019-03-14  2:39 UTC (permalink / raw)
  To: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel

On Wed, Mar 13, 2019 at 02:06:03PM -0700, Davidlohr Bueso wrote:
> On Wed, 13 Mar 2019, Laurent Dufour wrote:
> > If this is not too late and if there is still place available, I would
> > like to attend the MM track and propose a topic about using the XArray
> > to replace the VMA's RB tree and list.
> > 
> > Using the XArray in place of the VMA's tree and list seems to be a first
> > step to the long way of removing/replacing the mmap_sem.
> 
> So threaded (not as in threads of execution) rbtrees are another
> alternative to deal with the two data structure approach we currently
> have. Having O(1) rb_prev/next() calls allows us to basically get rid of
> the vma list at the cost of an extra check for each node we visit on
> the way down when inserting.

It's probably worth listing the advantages of the Maple Tree over the
rbtree.

 - Shallower tree.  A 1000-entry rbtree is 10 levels deep.  A 1000-entry
   Maple Tree is 5 levels deep (I did a more detailed analysis in an
   earlier email thread with Laurent and I can present it if needed).
 - O(1) prev/next
 - Lookups under the RCU lock

There're some second-order effects too; by using externally allocated
nodes, we avoid disturbing other VMAs when inserting/deleting, and we
avoid bouncing cachelines around (eg the VMA which happens to end up
at the head of the tree is accessed by every lookup in the tree because
it's on the way to every other node).

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-14  2:39   ` Matthew Wilcox
@ 2019-03-14 16:43     ` Davidlohr Bueso
  2019-03-14 19:54       ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Davidlohr Bueso @ 2019-03-14 16:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel

On Wed, 13 Mar 2019, Matthew Wilcox wrote:

>It's probably worth listing the advantages of the Maple Tree over the
>rbtree.

I'm not familiar with maple trees, are they referred to by another name?
(is this some sort of B-tree?). Google just shows me real trees.

>
> - Shallower tree.  A 1000-entry rbtree is 10 levels deep.  A 1000-entry
>   Maple Tree is 5 levels deep (I did a more detailed analysis in an
>   earlier email thread with Laurent and I can present it if needed).

I'd be interested in reading on that.

> - O(1) prev/next
> - Lookups under the RCU lock
>
>There're some second-order effects too; by using externally allocated
>nodes, we avoid disturbing other VMAs when inserting/deleting, and we
>avoid bouncing cachelines around (eg the VMA which happens to end up
>at the head of the tree is accessed by every lookup in the tree because
>it's on the way to every other node).

How would maple trees deal with the augmented vma tree (vma gaps) trick
we use to optimize get_unmapped_area?

Thanks,
Davidlohr

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-14 16:43     ` Davidlohr Bueso
@ 2019-03-14 19:54       ` Matthew Wilcox
  0 siblings, 0 replies; 7+ messages in thread
From: Matthew Wilcox @ 2019-03-14 19:54 UTC (permalink / raw)
  To: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel

On Thu, Mar 14, 2019 at 09:43:43AM -0700, Davidlohr Bueso wrote:
> On Wed, 13 Mar 2019, Matthew Wilcox wrote:
> 
> > It's probably worth listing the advantages of the Maple Tree over the
> > rbtree.
> 
> I'm not familiar with maple trees, are they referred to by another name?
> (is this some sort of B-tree?). Google just shows me real trees.

It is a B-tree variant which supports ranges as a first-class citizen
(single elements are ranges of length 1).  It optimises for ranges which
are adjacent to each other, and does not support overlapping ranges.
It supports RCU lookup and embeds a spinlock which must be held for
modification.  There's a lot of detail I can go into, but let's leave
it at that for an introduction.

> > - Shallower tree.  A 1000-entry rbtree is 10 levels deep.  A 1000-entry
> >   Maple Tree is 5 levels deep (I did a more detailed analysis in an
> >   earlier email thread with Laurent and I can present it if needed).
> 
> I'd be interested in reading on that.

(see the last two paragraphs of the mail for that analysis)

> > - O(1) prev/next
> > - Lookups under the RCU lock
> > 
> > There're some second-order effects too; by using externally allocated
> > nodes, we avoid disturbing other VMAs when inserting/deleting, and we
> > avoid bouncing cachelines around (eg the VMA which happens to end up
> > at the head of the tree is accessed by every lookup in the tree because
> > it's on the way to every other node).
> 
> How would maple trees deal with the augmented vma tree (vma gaps) trick
> we use to optimize get_unmapped_area?

The fundamental unit of the Maple Tree is a 128-byte node.  A leaf node
is laid out like this:

struct maple_range_64 {
        struct maple_node *parent;
        void __rcu *slot[8];
        u64 pivot[7];
};

The pivots are stored in ascending order; if the search index is less
than pivot[i], then the value (ie the vma pointer) you are searching
for is stored in slot[i].

Non-leaf nodes (for trees which support range allocations) are laid out
like this:

struct maple_arange_64 {
        struct maple_node *parent;
	u64 gaps[5];
        void __rcu *slot[5];
        u64 pivot[4];
};

gaps[i] stores the largest run of NULL pointers in the subtree rooted at
slot[i].  When searching for an empty range of at least N, you can skip
any subtree which has gaps[i] < N.

Here's a simple case:

$ ldd `which cat`
	linux-vdso.so.1 (0x00007ffc867fc000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2c8cc6e000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f2c8ce69000)

'cat /proc/self/maps | wc' gives me 25 mappings.  They look like this:

$ cat /proc/self/maps |cut -f 1 -d ' '
55c414785000-55c414787000
55c414787000-55c41478c000
55c41478c000-55c41478e000
55c41478f000-55c414790000
55c414790000-55c414791000
55c4159dd000-55c4159fe000
7fa5f6527000-7fa5f680c000
7fa5f680c000-7fa5f682e000
7fa5f682e000-7fa5f6976000
7fa5f6976000-7fa5f69c2000
7fa5f69c2000-7fa5f69c3000
7fa5f69c3000-7fa5f69c7000
7fa5f69c7000-7fa5f69c9000
7fa5f69c9000-7fa5f69cd000
7fa5f69cd000-7fa5f69cf000
7fa5f69d9000-7fa5f69fb000
7fa5f69fb000-7fa5f69fc000
7fa5f69fc000-7fa5f6a1a000
7fa5f6a1a000-7fa5f6a22000
7fa5f6a22000-7fa5f6a23000
7fa5f6a23000-7fa5f6a24000
7fa5f6a24000-7fa5f6a25000
7ffe54a3c000-7ffe54a5d000
7ffe54a7b000-7ffe54a7e000
7ffe54a7e000-7ffe54a80000

We'd represent this in the Maple Tree as:

0-55c414785000 -> NULL
55c414785000-55c414787000 -> vma
55c414787000-55c41478c000 -> vma
...
55c414790000-55c414791000 -> vma
55c414791000-7fa5f6527000 -> NULL
7fa5f6527000-7fa5f680c000 -> vma
...
7fa5f69cd000-7fa5f69cf000 -> vma
7fa5f69cf000-7fa5f69d9000 -> NULL
7fa5f69d9000-7fa5f69fb000 -> vma
...
7fa5f6a24000-7fa5f6a25000 -> vma
7fa5f6a25000-7ffe54a3c000 -> NULL
7ffe54a3c000-7ffe54a5d000 -> vma
7ffe54a5d000-7ffe54a7b000 -> NULL
7ffe54a7b000-7ffe54a7e000 -> vma
7ffe54a7e000-7ffe54a80000 -> vma
7ffe54a80000-ffffffffffff -> NULL

so the maple tree stores 6 ranges that point to NULL in addition to the
25 that're stored by the rbtree.  Because they're allocated sequentially,
there won't be any wastage in the maple tree caused by items shifting
around.  That means we'll get 8 per leaf node, so just 4 leaf nodes
needed to store 31 ranges, all stored in a single root node.  That means
to get from root to an arbitrary VMA is just 3 pointer dereferences,
versus 3.96 pointer dereferences for an optimally balanced rbtree.


For a process with 1000 VMAs, we'll have approximately 167 leaf nodes
(assuming approximately 6 of the 8 pointers are used per node) arranged
into a tree of height 5, with about 44 non-leaf nodes needed to manage
those 166 leaf-nodes.  That'll be 6 pointers to follow per walk of the
tree (if it were optimally arranged, it'd be 125 leaf nodes plus 25 +
5 + 1 non-leaf nodes and 5 pointers to follow, but it's unrealistic to
assume it'll be optimally arranged, and this neglects the NULL ranges
which will also need to be stored).

The rbtree has a 1/1000 chance of 1 pointer dereference, a 2/1000 chance
of 2 pointers, 4/1000 chance of 3 pointers, 8/1000 4 pointers, 16/1000 5
pointers, 32/1000 6 pointers, 64/1000 7 pointers, 128/1000 8 pointers,
256/1000 9 pointers, 489/1000 10 pointers.  Amortised, that's 8.987
pointers to look up a random VMA (assuming the rbtree is fully balanced;
I haven't checked how unbalanced the rbtree can actually become).


^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [LSF/MM TOPIC] Using XArray to manage the VMA
  2019-03-13 18:01 ` Matthew Wilcox
@ 2019-03-29  9:31   ` Laurent Dufour
  0 siblings, 0 replies; 7+ messages in thread
From: Laurent Dufour @ 2019-03-29  9:31 UTC (permalink / raw)
  To: Matthew Wilcox, Michel Lespinasse, David Rientjes
  Cc: lsf-pc, Linux-MM, linux-kernel, Liam R. Howlett

Adding Michel and David in the loop who are interested in this topic too.

Le 13/03/2019 à 19:01, Matthew Wilcox a écrit :
> On Wed, Mar 13, 2019 at 04:10:14PM +0100, Laurent Dufour wrote:
>> If this is not too late and if there is still place available, I would like
>> to attend the MM track and propose a topic about using the XArray to replace
>> the VMA's RB tree and list.
> 
> If there isn't room on the schedule, then Laurent and I are definitely
> going to sneak off and talk about this ourselves at some point.  Having a
> high-bandwidth conversation about this is going to be really important
> for us, and I think having other people involved would be good.
> 
> If there're still spots, it'd be good to have Liam Howlett join us.
> He's doing the actual writing-of-code for the Maple Tree at the moment
> (I did some earlier on, but recent commits are all him).
> 
>> Using the XArray in place of the VMA's tree and list seems to be a first
>> step to the long way of removing/replacing the mmap_sem.
>> However, there are still corner cases to address like the VMA splitting and
>> merging which may raise some issue. Using the XArray's specifying locking
>> would not be enough to handle the memory management, and additional fine
>> grain locking like a per VMA one could be studied, leading to further
>> discussion about the merging of the VMA.
>>
>> In addition, here are some topics I'm interested in:
>> - Test cases to choose for demonstrating mm features or fixing mm bugs
>> proposed by Balbir Singh
>> - mm documentation proposed by Mike Rapoport
>>
>> Laurent.
>>
> 


^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2019-03-29  9:31 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour
2019-03-13 18:01 ` Matthew Wilcox
2019-03-29  9:31   ` Laurent Dufour
2019-03-13 21:06 ` Davidlohr Bueso
2019-03-14  2:39   ` Matthew Wilcox
2019-03-14 16:43     ` Davidlohr Bueso
2019-03-14 19:54       ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.