* [LSF/MM TOPIC] Using XArray to manage the VMA @ 2019-03-13 15:10 Laurent Dufour 2019-03-13 18:01 ` Matthew Wilcox 2019-03-13 21:06 ` Davidlohr Bueso 0 siblings, 2 replies; 7+ messages in thread From: Laurent Dufour @ 2019-03-13 15:10 UTC (permalink / raw) To: lsf-pc, Linux-MM, linux-kernel; +Cc: Matthew Wilcox If this is not too late and if there is still place available, I would like to attend the MM track and propose a topic about using the XArray to replace the VMA's RB tree and list. Using the XArray in place of the VMA's tree and list seems to be a first step to the long way of removing/replacing the mmap_sem. However, there are still corner cases to address like the VMA splitting and merging which may raise some issue. Using the XArray's specifying locking would not be enough to handle the memory management, and additional fine grain locking like a per VMA one could be studied, leading to further discussion about the merging of the VMA. In addition, here are some topics I'm interested in: - Test cases to choose for demonstrating mm features or fixing mm bugs proposed by Balbir Singh - mm documentation proposed by Mike Rapoport Laurent. ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour @ 2019-03-13 18:01 ` Matthew Wilcox 2019-03-29 9:31 ` Laurent Dufour 2019-03-13 21:06 ` Davidlohr Bueso 1 sibling, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2019-03-13 18:01 UTC (permalink / raw) To: Laurent Dufour; +Cc: lsf-pc, Linux-MM, linux-kernel, Liam R. Howlett On Wed, Mar 13, 2019 at 04:10:14PM +0100, Laurent Dufour wrote: > If this is not too late and if there is still place available, I would like > to attend the MM track and propose a topic about using the XArray to replace > the VMA's RB tree and list. If there isn't room on the schedule, then Laurent and I are definitely going to sneak off and talk about this ourselves at some point. Having a high-bandwidth conversation about this is going to be really important for us, and I think having other people involved would be good. If there're still spots, it'd be good to have Liam Howlett join us. He's doing the actual writing-of-code for the Maple Tree at the moment (I did some earlier on, but recent commits are all him). > Using the XArray in place of the VMA's tree and list seems to be a first > step to the long way of removing/replacing the mmap_sem. > However, there are still corner cases to address like the VMA splitting and > merging which may raise some issue. Using the XArray's specifying locking > would not be enough to handle the memory management, and additional fine > grain locking like a per VMA one could be studied, leading to further > discussion about the merging of the VMA. > > In addition, here are some topics I'm interested in: > - Test cases to choose for demonstrating mm features or fixing mm bugs > proposed by Balbir Singh > - mm documentation proposed by Mike Rapoport > > Laurent. > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-13 18:01 ` Matthew Wilcox @ 2019-03-29 9:31 ` Laurent Dufour 0 siblings, 0 replies; 7+ messages in thread From: Laurent Dufour @ 2019-03-29 9:31 UTC (permalink / raw) To: Matthew Wilcox, Michel Lespinasse, David Rientjes Cc: lsf-pc, Linux-MM, linux-kernel, Liam R. Howlett Adding Michel and David in the loop who are interested in this topic too. Le 13/03/2019 à 19:01, Matthew Wilcox a écrit : > On Wed, Mar 13, 2019 at 04:10:14PM +0100, Laurent Dufour wrote: >> If this is not too late and if there is still place available, I would like >> to attend the MM track and propose a topic about using the XArray to replace >> the VMA's RB tree and list. > > If there isn't room on the schedule, then Laurent and I are definitely > going to sneak off and talk about this ourselves at some point. Having a > high-bandwidth conversation about this is going to be really important > for us, and I think having other people involved would be good. > > If there're still spots, it'd be good to have Liam Howlett join us. > He's doing the actual writing-of-code for the Maple Tree at the moment > (I did some earlier on, but recent commits are all him). > >> Using the XArray in place of the VMA's tree and list seems to be a first >> step to the long way of removing/replacing the mmap_sem. >> However, there are still corner cases to address like the VMA splitting and >> merging which may raise some issue. Using the XArray's specifying locking >> would not be enough to handle the memory management, and additional fine >> grain locking like a per VMA one could be studied, leading to further >> discussion about the merging of the VMA. >> >> In addition, here are some topics I'm interested in: >> - Test cases to choose for demonstrating mm features or fixing mm bugs >> proposed by Balbir Singh >> - mm documentation proposed by Mike Rapoport >> >> Laurent. >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour 2019-03-13 18:01 ` Matthew Wilcox @ 2019-03-13 21:06 ` Davidlohr Bueso 2019-03-14 2:39 ` Matthew Wilcox 1 sibling, 1 reply; 7+ messages in thread From: Davidlohr Bueso @ 2019-03-13 21:06 UTC (permalink / raw) To: Laurent Dufour; +Cc: lsf-pc, Linux-MM, linux-kernel, Matthew Wilcox On Wed, 13 Mar 2019, Laurent Dufour wrote: >If this is not too late and if there is still place available, I would >like to attend the MM track and propose a topic about using the XArray >to replace the VMA's RB tree and list. > >Using the XArray in place of the VMA's tree and list seems to be a >first step to the long way of removing/replacing the mmap_sem. So threaded (not as in threads of execution) rbtrees are another alternative to deal with the two data structure approach we currently have. Having O(1) rb_prev/next() calls allows us to basically get rid of the vma list at the cost of an extra check for each node we visit on the way down when inserting. Thanks, Davidlohr ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-13 21:06 ` Davidlohr Bueso @ 2019-03-14 2:39 ` Matthew Wilcox 2019-03-14 16:43 ` Davidlohr Bueso 0 siblings, 1 reply; 7+ messages in thread From: Matthew Wilcox @ 2019-03-14 2:39 UTC (permalink / raw) To: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel On Wed, Mar 13, 2019 at 02:06:03PM -0700, Davidlohr Bueso wrote: > On Wed, 13 Mar 2019, Laurent Dufour wrote: > > If this is not too late and if there is still place available, I would > > like to attend the MM track and propose a topic about using the XArray > > to replace the VMA's RB tree and list. > > > > Using the XArray in place of the VMA's tree and list seems to be a first > > step to the long way of removing/replacing the mmap_sem. > > So threaded (not as in threads of execution) rbtrees are another > alternative to deal with the two data structure approach we currently > have. Having O(1) rb_prev/next() calls allows us to basically get rid of > the vma list at the cost of an extra check for each node we visit on > the way down when inserting. It's probably worth listing the advantages of the Maple Tree over the rbtree. - Shallower tree. A 1000-entry rbtree is 10 levels deep. A 1000-entry Maple Tree is 5 levels deep (I did a more detailed analysis in an earlier email thread with Laurent and I can present it if needed). - O(1) prev/next - Lookups under the RCU lock There're some second-order effects too; by using externally allocated nodes, we avoid disturbing other VMAs when inserting/deleting, and we avoid bouncing cachelines around (eg the VMA which happens to end up at the head of the tree is accessed by every lookup in the tree because it's on the way to every other node). ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-14 2:39 ` Matthew Wilcox @ 2019-03-14 16:43 ` Davidlohr Bueso 2019-03-14 19:54 ` Matthew Wilcox 0 siblings, 1 reply; 7+ messages in thread From: Davidlohr Bueso @ 2019-03-14 16:43 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel On Wed, 13 Mar 2019, Matthew Wilcox wrote: >It's probably worth listing the advantages of the Maple Tree over the >rbtree. I'm not familiar with maple trees, are they referred to by another name? (is this some sort of B-tree?). Google just shows me real trees. > > - Shallower tree. A 1000-entry rbtree is 10 levels deep. A 1000-entry > Maple Tree is 5 levels deep (I did a more detailed analysis in an > earlier email thread with Laurent and I can present it if needed). I'd be interested in reading on that. > - O(1) prev/next > - Lookups under the RCU lock > >There're some second-order effects too; by using externally allocated >nodes, we avoid disturbing other VMAs when inserting/deleting, and we >avoid bouncing cachelines around (eg the VMA which happens to end up >at the head of the tree is accessed by every lookup in the tree because >it's on the way to every other node). How would maple trees deal with the augmented vma tree (vma gaps) trick we use to optimize get_unmapped_area? Thanks, Davidlohr ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [LSF/MM TOPIC] Using XArray to manage the VMA 2019-03-14 16:43 ` Davidlohr Bueso @ 2019-03-14 19:54 ` Matthew Wilcox 0 siblings, 0 replies; 7+ messages in thread From: Matthew Wilcox @ 2019-03-14 19:54 UTC (permalink / raw) To: Laurent Dufour, lsf-pc, Linux-MM, linux-kernel On Thu, Mar 14, 2019 at 09:43:43AM -0700, Davidlohr Bueso wrote: > On Wed, 13 Mar 2019, Matthew Wilcox wrote: > > > It's probably worth listing the advantages of the Maple Tree over the > > rbtree. > > I'm not familiar with maple trees, are they referred to by another name? > (is this some sort of B-tree?). Google just shows me real trees. It is a B-tree variant which supports ranges as a first-class citizen (single elements are ranges of length 1). It optimises for ranges which are adjacent to each other, and does not support overlapping ranges. It supports RCU lookup and embeds a spinlock which must be held for modification. There's a lot of detail I can go into, but let's leave it at that for an introduction. > > - Shallower tree. A 1000-entry rbtree is 10 levels deep. A 1000-entry > > Maple Tree is 5 levels deep (I did a more detailed analysis in an > > earlier email thread with Laurent and I can present it if needed). > > I'd be interested in reading on that. (see the last two paragraphs of the mail for that analysis) > > - O(1) prev/next > > - Lookups under the RCU lock > > > > There're some second-order effects too; by using externally allocated > > nodes, we avoid disturbing other VMAs when inserting/deleting, and we > > avoid bouncing cachelines around (eg the VMA which happens to end up > > at the head of the tree is accessed by every lookup in the tree because > > it's on the way to every other node). > > How would maple trees deal with the augmented vma tree (vma gaps) trick > we use to optimize get_unmapped_area? The fundamental unit of the Maple Tree is a 128-byte node. A leaf node is laid out like this: struct maple_range_64 { struct maple_node *parent; void __rcu *slot[8]; u64 pivot[7]; }; The pivots are stored in ascending order; if the search index is less than pivot[i], then the value (ie the vma pointer) you are searching for is stored in slot[i]. Non-leaf nodes (for trees which support range allocations) are laid out like this: struct maple_arange_64 { struct maple_node *parent; u64 gaps[5]; void __rcu *slot[5]; u64 pivot[4]; }; gaps[i] stores the largest run of NULL pointers in the subtree rooted at slot[i]. When searching for an empty range of at least N, you can skip any subtree which has gaps[i] < N. Here's a simple case: $ ldd `which cat` linux-vdso.so.1 (0x00007ffc867fc000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f2c8cc6e000) /lib64/ld-linux-x86-64.so.2 (0x00007f2c8ce69000) 'cat /proc/self/maps | wc' gives me 25 mappings. They look like this: $ cat /proc/self/maps |cut -f 1 -d ' ' 55c414785000-55c414787000 55c414787000-55c41478c000 55c41478c000-55c41478e000 55c41478f000-55c414790000 55c414790000-55c414791000 55c4159dd000-55c4159fe000 7fa5f6527000-7fa5f680c000 7fa5f680c000-7fa5f682e000 7fa5f682e000-7fa5f6976000 7fa5f6976000-7fa5f69c2000 7fa5f69c2000-7fa5f69c3000 7fa5f69c3000-7fa5f69c7000 7fa5f69c7000-7fa5f69c9000 7fa5f69c9000-7fa5f69cd000 7fa5f69cd000-7fa5f69cf000 7fa5f69d9000-7fa5f69fb000 7fa5f69fb000-7fa5f69fc000 7fa5f69fc000-7fa5f6a1a000 7fa5f6a1a000-7fa5f6a22000 7fa5f6a22000-7fa5f6a23000 7fa5f6a23000-7fa5f6a24000 7fa5f6a24000-7fa5f6a25000 7ffe54a3c000-7ffe54a5d000 7ffe54a7b000-7ffe54a7e000 7ffe54a7e000-7ffe54a80000 We'd represent this in the Maple Tree as: 0-55c414785000 -> NULL 55c414785000-55c414787000 -> vma 55c414787000-55c41478c000 -> vma ... 55c414790000-55c414791000 -> vma 55c414791000-7fa5f6527000 -> NULL 7fa5f6527000-7fa5f680c000 -> vma ... 7fa5f69cd000-7fa5f69cf000 -> vma 7fa5f69cf000-7fa5f69d9000 -> NULL 7fa5f69d9000-7fa5f69fb000 -> vma ... 7fa5f6a24000-7fa5f6a25000 -> vma 7fa5f6a25000-7ffe54a3c000 -> NULL 7ffe54a3c000-7ffe54a5d000 -> vma 7ffe54a5d000-7ffe54a7b000 -> NULL 7ffe54a7b000-7ffe54a7e000 -> vma 7ffe54a7e000-7ffe54a80000 -> vma 7ffe54a80000-ffffffffffff -> NULL so the maple tree stores 6 ranges that point to NULL in addition to the 25 that're stored by the rbtree. Because they're allocated sequentially, there won't be any wastage in the maple tree caused by items shifting around. That means we'll get 8 per leaf node, so just 4 leaf nodes needed to store 31 ranges, all stored in a single root node. That means to get from root to an arbitrary VMA is just 3 pointer dereferences, versus 3.96 pointer dereferences for an optimally balanced rbtree. For a process with 1000 VMAs, we'll have approximately 167 leaf nodes (assuming approximately 6 of the 8 pointers are used per node) arranged into a tree of height 5, with about 44 non-leaf nodes needed to manage those 166 leaf-nodes. That'll be 6 pointers to follow per walk of the tree (if it were optimally arranged, it'd be 125 leaf nodes plus 25 + 5 + 1 non-leaf nodes and 5 pointers to follow, but it's unrealistic to assume it'll be optimally arranged, and this neglects the NULL ranges which will also need to be stored). The rbtree has a 1/1000 chance of 1 pointer dereference, a 2/1000 chance of 2 pointers, 4/1000 chance of 3 pointers, 8/1000 4 pointers, 16/1000 5 pointers, 32/1000 6 pointers, 64/1000 7 pointers, 128/1000 8 pointers, 256/1000 9 pointers, 489/1000 10 pointers. Amortised, that's 8.987 pointers to look up a random VMA (assuming the rbtree is fully balanced; I haven't checked how unbalanced the rbtree can actually become). ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2019-03-29 9:31 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-03-13 15:10 [LSF/MM TOPIC] Using XArray to manage the VMA Laurent Dufour 2019-03-13 18:01 ` Matthew Wilcox 2019-03-29 9:31 ` Laurent Dufour 2019-03-13 21:06 ` Davidlohr Bueso 2019-03-14 2:39 ` Matthew Wilcox 2019-03-14 16:43 ` Davidlohr Bueso 2019-03-14 19:54 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).