* Memory allocation on speculative fastpaths @ 2022-05-03 15:59 Paul E. McKenney 2022-05-03 16:04 ` Michal Hocko 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-05-03 15:59 UTC (permalink / raw) To: liam.howlett, willy, walken.cr, mhocko, hannes; +Cc: linux-mm, linux-kernel Hello! Just following up from off-list discussions yesterday. The requirements to allocate on an RCU-protected speculative fastpath seem to be as follows: 1. Never sleep. 2. Never reclaim. 3. Leave emergency pools alone. Any others? If those rules suffice, and if my understanding of the GFP flags is correct (ha!!!), then the following GFP flags should cover this: __GFP_NOMEMALLOC | __GFP_NOWARN Or is this just a fancy way of always returning NULL or some such? ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 15:59 Memory allocation on speculative fastpaths Paul E. McKenney @ 2022-05-03 16:04 ` Michal Hocko 2022-05-03 16:39 ` Paul E. McKenney 0 siblings, 1 reply; 12+ messages in thread From: Michal Hocko @ 2022-05-03 16:04 UTC (permalink / raw) To: Paul E. McKenney Cc: liam.howlett, willy, walken.cr, hannes, linux-mm, linux-kernel On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > Hello! > > Just following up from off-list discussions yesterday. > > The requirements to allocate on an RCU-protected speculative fastpath > seem to be as follows: > > 1. Never sleep. > 2. Never reclaim. > 3. Leave emergency pools alone. > > Any others? > > If those rules suffice, and if my understanding of the GFP flags is > correct (ha!!!), then the following GFP flags should cover this: > > __GFP_NOMEMALLOC | __GFP_NOWARN GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN > Or is this just a fancy way of always returning NULL or some such? ;-) It could fail quite easily. We would also want to guarantee (by documenting I guess) that the page allocator never does anything that would depend on or invoke synchronize_rcu() or something like that. I believe this is the case currently. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 16:04 ` Michal Hocko @ 2022-05-03 16:39 ` Paul E. McKenney 2022-05-03 18:28 ` Matthew Wilcox 0 siblings, 1 reply; 12+ messages in thread From: Paul E. McKenney @ 2022-05-03 16:39 UTC (permalink / raw) To: Michal Hocko Cc: liam.howlett, willy, walken.cr, hannes, linux-mm, linux-kernel On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > > Hello! > > > > Just following up from off-list discussions yesterday. > > > > The requirements to allocate on an RCU-protected speculative fastpath > > seem to be as follows: > > > > 1. Never sleep. > > 2. Never reclaim. > > 3. Leave emergency pools alone. > > > > Any others? > > > > If those rules suffice, and if my understanding of the GFP flags is > > correct (ha!!!), then the following GFP flags should cover this: > > > > __GFP_NOMEMALLOC | __GFP_NOWARN > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN Ah, good point on GFP_NOWAIT, thank you! > > Or is this just a fancy way of always returning NULL or some such? ;-) > > It could fail quite easily. We would also want to guarantee (by > documenting I guess) that the page allocator never does anything that > would depend on or invoke synchronize_rcu() or something like that. The GFP_NOWAIT should rule out synchronize_rcu() and similar, correct? > I believe this is the case currently. Here is hoping! ;-) Thanx, Paul ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 16:39 ` Paul E. McKenney @ 2022-05-03 18:28 ` Matthew Wilcox 2022-05-03 23:15 ` Suren Baghdasaryan 2022-05-04 8:20 ` Michel Lespinasse 0 siblings, 2 replies; 12+ messages in thread From: Matthew Wilcox @ 2022-05-03 18:28 UTC (permalink / raw) To: Paul E. McKenney Cc: Michal Hocko, liam.howlett, walken.cr, hannes, linux-mm, linux-kernel On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote: > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > > > Hello! > > > > > > Just following up from off-list discussions yesterday. > > > > > > The requirements to allocate on an RCU-protected speculative fastpath > > > seem to be as follows: > > > > > > 1. Never sleep. > > > 2. Never reclaim. > > > 3. Leave emergency pools alone. > > > > > > Any others? > > > > > > If those rules suffice, and if my understanding of the GFP flags is > > > correct (ha!!!), then the following GFP flags should cover this: > > > > > > __GFP_NOMEMALLOC | __GFP_NOWARN > > > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN > > Ah, good point on GFP_NOWAIT, thank you! Johannes (I think it was?) made the point to me that if we have another task very slowly freeing memory, a task in this path can take advantage of that other task's hard work and never go into reclaim. So the approach we should take is: p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); if (failure) { rcu_read_unlock(); do_reclaim(); return FAULT_FLAG_RETRY; } ... but all this is now moot since the approach we agreed to yesterday is: rcu_read_lock(); vma = vma_lookup(); if (down_read_trylock(&vma->sem)) { rcu_read_unlock(); } else { rcu_read_unlock(); mmap_read_lock(mm); vma = vma_lookup(); down_read(&vma->sem); } ... 
and we then execute the page table allocation under the protection of the vma->sem. At least, that's what I think we agreed to yesterday. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 18:28 ` Matthew Wilcox @ 2022-05-03 23:15 ` Suren Baghdasaryan 2022-05-03 23:45 ` Michal Hocko 2022-05-04 16:23 ` Johannes Weiner 2022-05-04 8:20 ` Michel Lespinasse 1 sibling, 2 replies; 12+ messages in thread From: Suren Baghdasaryan @ 2022-05-03 23:15 UTC (permalink / raw) To: Matthew Wilcox Cc: Paul E. McKenney, Michal Hocko, Liam R. Howlett, Michel Lespinasse, Johannes Weiner, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > > On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote: > > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: > > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > > > > Hello! > > > > > > > > Just following up from off-list discussions yesterday. > > > > > > > > The requirements to allocate on an RCU-protected speculative fastpath > > > > seem to be as follows: > > > > > > > > 1. Never sleep. > > > > 2. Never reclaim. > > > > 3. Leave emergency pools alone. > > > > > > > > Any others? > > > > > > > > If those rules suffice, and if my understanding of the GFP flags is > > > > correct (ha!!!), then the following GFP flags should cover this: > > > > > > > > __GFP_NOMEMALLOC | __GFP_NOWARN > > > > > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN > > > > Ah, good point on GFP_NOWAIT, thank you! > > Johannes (I think it was?) made the point to me that if we have another > task very slowly freeing memory, a task in this path can take advantage > of that other task's hard work and never go into reclaim. So the > approach we should take is: > > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > if (failure) { > rcu_read_unlock(); > do_reclaim(); > return FAULT_FLAG_RETRY; > } > > ... 
but all this is now moot since the approach we agreed to yesterday > is: I think the discussion was about the above approach and Johannes suggested falling back to the normal pagefault handling with mmap_lock locked if PMD does not exist. Please correct me if I misunderstood here. > > rcu_read_lock(); > vma = vma_lookup(); > if (down_read_trylock(&vma->sem)) { > rcu_read_unlock(); > } else { > rcu_read_unlock(); > mmap_read_lock(mm); > vma = vma_lookup(); > down_read(&vma->sem); > } > > ... and we then execute the page table allocation under the protection of > the vma->sem. > > At least, that's what I think we agreed to yesterday. Honestly, I don't remember discussing vma->sem at all. My understanding is that the two approaches differed in the sections covered by rcu_read_lock/rcu_read_unlock. The solution that you suggested would handle the pagefault completely under RCU as long as it's possible and would fall back to the mmap_lock if it's impossible, while Michel's implementation was taking rcu_read_lock/rcu_read_unlock for smaller sections and would use vma->seq_number to detect any vma changes between these sections. Your suggested approach sounds simpler and the way I understood the comments is that we should give it a try. Did I miss anything? Thanks, Suren. > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 23:15 ` Suren Baghdasaryan @ 2022-05-03 23:45 ` Michal Hocko 2022-05-04 0:22 ` Matthew Wilcox 2022-05-04 16:23 ` Johannes Weiner 1 sibling, 1 reply; 12+ messages in thread From: Michal Hocko @ 2022-05-03 23:45 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Matthew Wilcox, Paul E. McKenney, Liam R. Howlett, Michel Lespinasse, Johannes Weiner, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On Tue 03-05-22 16:15:46, Suren Baghdasaryan wrote: > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: [...] > > rcu_read_lock(); > > vma = vma_lookup(); > > if (down_read_trylock(&vma->sem)) { > > rcu_read_unlock(); > > } else { > > rcu_read_unlock(); > > mmap_read_lock(mm); > > vma = vma_lookup(); > > down_read(&vma->sem); > > } > > > > ... and we then execute the page table allocation under the protection of > > the vma->sem. > > > > At least, that's what I think we agreed to yesterday. > > Honestly, I don't remember discussing vma->sem at all. This is the rangelocking approach that is effectively per-VMA. So that should help with the most simplistic case where the mmap contention is not on the same VMAs, which should be the most common case (e.g. faulting from several threads while there is mmap happening in the background). There are cases where this could be too coarse of course and RCU would be a long term plan. The above seems easy enough and still probably good enough for most cases so a good first step. -- Michal Hocko SUSE Labs ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 23:45 ` Michal Hocko @ 2022-05-04 0:22 ` Matthew Wilcox 0 siblings, 0 replies; 12+ messages in thread From: Matthew Wilcox @ 2022-05-04 0:22 UTC (permalink / raw) To: Michal Hocko Cc: Suren Baghdasaryan, Paul E. McKenney, Liam R. Howlett, Michel Lespinasse, Johannes Weiner, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On Wed, May 04, 2022 at 01:45:11AM +0200, Michal Hocko wrote: > On Tue 03-05-22 16:15:46, Suren Baghdasaryan wrote: > > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > [...] > > > rcu_read_lock(); > > > vma = vma_lookup(); > > > if (down_read_trylock(&vma->sem)) { > > > rcu_read_unlock(); > > > } else { > > > rcu_read_unlock(); > > > mmap_read_lock(mm); > > > vma = vma_lookup(); > > > down_read(&vma->sem); > > > } > > > > > > ... and we then execute the page table allocation under the protection of > > > the vma->sem. > > > > > > At least, that's what I think we agreed to yesterday. > > > > Honestly, I don't remember discussing vma->sem at all. > > This is the rangelocking approach that is effectivelly per-VMA. So that > should help with the most simplistic case where the mmap contention is > not on the same VMAs which should be the most common case (e.g. faulting > from several threads while there is mmap happening in the background). > > There are cases where this could be too coarse of course and RCU would > be a long term plan. The above seems easy enough and still probably good > enough for most cases so a good first step. It also fixes the low-pri monitoring daemon problem as page faults will not be blocked by a writer (unless the read_trylock fails). I see three potential outcomes here from the vma rwsem approach: - No particular improvement on any workloads. Result: we try something else. - Minor gains (5-10%). We benchmark it and discover there's still significant contention on the vma_sem. 
Result: we take those wins and keep going towards a full RCU solution - Major gains (20-50%). Result: We're done, break out the champagne. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 23:15 ` Suren Baghdasaryan 2022-05-03 23:45 ` Michal Hocko @ 2022-05-04 16:23 ` Johannes Weiner 2022-05-24 20:37 ` Vlastimil Babka 1 sibling, 1 reply; 12+ messages in thread From: Johannes Weiner @ 2022-05-04 16:23 UTC (permalink / raw) To: Suren Baghdasaryan Cc: Matthew Wilcox, Paul E. McKenney, Michal Hocko, Liam R. Howlett, Michel Lespinasse, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote: > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > > > > On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote: > > > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: > > > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > > > > > Hello! > > > > > > > > > > Just following up from off-list discussions yesterday. > > > > > > > > > > The requirements to allocate on an RCU-protected speculative fastpath > > > > > seem to be as follows: > > > > > > > > > > 1. Never sleep. > > > > > 2. Never reclaim. > > > > > 3. Leave emergency pools alone. > > > > > > > > > > Any others? > > > > > > > > > > If those rules suffice, and if my understanding of the GFP flags is > > > > > correct (ha!!!), then the following GFP flags should cover this: > > > > > > > > > > __GFP_NOMEMALLOC | __GFP_NOWARN > > > > > > > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN > > > > > > Ah, good point on GFP_NOWAIT, thank you! > > > > Johannes (I think it was?) made the point to me that if we have another > > task very slowly freeing memory, a task in this path can take advantage > > of that other task's hard work and never go into reclaim. So the > > approach we should take is: Right, GFP_NOWAIT can starve out other allocations. It can clear out the freelists without the burden of having to do reclaim like everybody else wanting memory during a shortage. Including GFP_KERNEL. In smaller doses and/or for privileged purposes (e.g. 
single-argument kfree_rcu ;)), those allocations are fine. But because the context is page tables specifically, it would mean that userspace could trigger a large number of those and DOS other applications and the kernel. > > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > > > if (failure) { > > rcu_read_unlock(); > > do_reclaim(); > > return FAULT_FLAG_RETRY; > > } > > > > ... but all this is now moot since the approach we agreed to yesterday > > is: > > I think the discussion was about the above approach and Johannes > suggested to fallback to the normal pagefault handling with mmap_lock > locked if PMD does not exist. Please correct me if I misunderstood > here. Yeah. Either way works, as long as the task is held accountable. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-04 16:23 ` Johannes Weiner @ 2022-05-24 20:37 ` Vlastimil Babka 2022-05-25 13:12 ` Johannes Weiner 0 siblings, 1 reply; 12+ messages in thread From: Vlastimil Babka @ 2022-05-24 20:37 UTC (permalink / raw) To: Johannes Weiner, Suren Baghdasaryan Cc: Matthew Wilcox, Paul E. McKenney, Michal Hocko, Liam R. Howlett, Michel Lespinasse, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On 5/4/22 18:23, Johannes Weiner wrote: > On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote: >> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: >>> >>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote: >>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: >>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: >>>>>> Hello! >>>>>> >>>>>> Just following up from off-list discussions yesterday. >>>>>> >>>>>> The requirements to allocate on an RCU-protected speculative fastpath >>>>>> seem to be as follows: >>>>>> >>>>>> 1. Never sleep. >>>>>> 2. Never reclaim. >>>>>> 3. Leave emergency pools alone. >>>>>> >>>>>> Any others? >>>>>> >>>>>> If those rules suffice, and if my understanding of the GFP flags is >>>>>> correct (ha!!!), then the following GFP flags should cover this: >>>>>> >>>>>> __GFP_NOMEMALLOC | __GFP_NOWARN >>>>> >>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN >>>> >>>> Ah, good point on GFP_NOWAIT, thank you! >>> >>> Johannes (I think it was?) made the point to me that if we have another >>> task very slowly freeing memory, a task in this path can take advantage >>> of that other task's hard work and never go into reclaim. So the >>> approach we should take is: > > Right, GFP_NOWAIT can starve out other allocations. It can clear out > the freelists without the burden of having to do reclaim like > everybody else wanting memory during a shortage. Including GFP_KERNEL. FTR, I wonder if this is really true, given the suggested fallback. 
With GFP_NOWAIT, you can either see memory (in all applicable zones) as

a) above low_watermark, just go ahead and allocate, as GFP_KERNEL would
b) between min and low watermark, wake up kswapd and allocate, as GFP_KERNEL would
c) below min watermark, the most interesting. GFP_KERNEL falls back to reclaim.

If the GFP_NOWAIT path's fallback also includes reclaim, as suggested in this thread, how is it really different from GFP_KERNEL? So am I missing something or is GFP_NOWAIT fastpath with an immediate fallback that includes reclaim (and not just a retry loop) fundamentally not different from GFP_KERNEL, regardless of how often we attempt it? > In smaller doses and/or for privileged purposes (e.g. single-argument > kfree_rcu ;)), those allocations are fine. But because the context is > page tables specifically, it would mean that userspace could trigger a > large number of those and DOS other applications and the kernel. > >>> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); >>> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); >>> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); >>> >>> if (failure) { >>> rcu_read_unlock(); >>> do_reclaim(); >>> return FAULT_FLAG_RETRY; >>> } >>> >>> ... but all this is now moot since the approach we agreed to yesterday >>> is: >> >> I think the discussion was about the above approach and Johannes >> suggested falling back to the normal pagefault handling with mmap_lock >> locked if PMD does not exist. Please correct me if I misunderstood >> here. > > Yeah. Either way works, as long as the task is held accountable. > ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-24 20:37 ` Vlastimil Babka @ 2022-05-25 13:12 ` Johannes Weiner 0 siblings, 0 replies; 12+ messages in thread From: Johannes Weiner @ 2022-05-25 13:12 UTC (permalink / raw) To: Vlastimil Babka Cc: Suren Baghdasaryan, Matthew Wilcox, Paul E. McKenney, Michal Hocko, Liam R. Howlett, Michel Lespinasse, linux-mm, LKML, David Hildenbrand, Davidlohr Bueso On Tue, May 24, 2022 at 10:37:15PM +0200, Vlastimil Babka wrote: > On 5/4/22 18:23, Johannes Weiner wrote: > > On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote: > >> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > >>> > >>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote: > >>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote: > >>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote: > >>>>>> Hello! > >>>>>> > >>>>>> Just following up from off-list discussions yesterday. > >>>>>> > >>>>>> The requirements to allocate on an RCU-protected speculative fastpath > >>>>>> seem to be as follows: > >>>>>> > >>>>>> 1. Never sleep. > >>>>>> 2. Never reclaim. > >>>>>> 3. Leave emergency pools alone. > >>>>>> > >>>>>> Any others? > >>>>>> > >>>>>> If those rules suffice, and if my understanding of the GFP flags is > >>>>>> correct (ha!!!), then the following GFP flags should cover this: > >>>>>> > >>>>>> __GFP_NOMEMALLOC | __GFP_NOWARN > >>>>> > >>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN > >>>> > >>>> Ah, good point on GFP_NOWAIT, thank you! > >>> > >>> Johannes (I think it was?) made the point to me that if we have another > >>> task very slowly freeing memory, a task in this path can take advantage > >>> of that other task's hard work and never go into reclaim. So the > >>> approach we should take is: > > > > Right, GFP_NOWAIT can starve out other allocations. 
It can clear out > > the freelists without the burden of having to do reclaim like > > everybody else wanting memory during a shortage. Including GFP_KERNEL.
>
> FTR, I wonder if this is really true, given the suggested fallback.
                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

IIRC adding this fallback was the conclusion of the in-person discussion. Above I just tried to summarize for the record the original concern that led to it. I could have been more clear. Your analysis is dead on, of course. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-03 18:28 ` Matthew Wilcox 2022-05-03 23:15 ` Suren Baghdasaryan @ 2022-05-04 8:20 ` Michel Lespinasse 2022-05-04 16:52 ` Matthew Wilcox 1 sibling, 1 reply; 12+ messages in thread From: Michel Lespinasse @ 2022-05-04 8:20 UTC (permalink / raw) To: Matthew Wilcox Cc: Paul E. McKenney, Michal Hocko, Liam Howlett, hannes, linux-mm, linux-kernel, Davidlohr Bueso, David (for context, this came up during a discussion of speculative page faults implementation details) On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > Johannes (I think it was?) made the point to me that if we have another > task very slowly freeing memory, a task in this path can take advantage > of that other task's hard work and never go into reclaim. So the > approach we should take is: > > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > if (failure) { > rcu_read_unlock(); > do_reclaim(); > return FAULT_FLAG_RETRY; > } I don't think this works. The problem with allocating page tables is not just that it may break an rcu-locked code section; you also need the code inserting the new page tables into the mm's page table tree to synchronize with any munmap() that may be concurrently running. RCU isn't sufficient here, and we would need a proper lock when wiring new page tables (current code relies on mmap lock for this). > ... but all this is now moot since the approach we agreed to yesterday > is: > > rcu_read_lock(); > vma = vma_lookup(); > if (down_read_trylock(&vma->sem)) { > rcu_read_unlock(); > } else { > rcu_read_unlock(); > mmap_read_lock(mm); > vma = vma_lookup(); > down_read(&vma->sem); > } > > ... and we then execute the page table allocation under the protection of > the vma->sem. > > At least, that's what I think we agreed to yesterday. 
I don't remember discussing any of this yesterday. As I remember it, the discussion was about having one large RCU section vs several small ones linked by sequence count checks to verify the validity of the vma at the start of each RCU section. ^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: Memory allocation on speculative fastpaths 2022-05-04 8:20 ` Michel Lespinasse @ 2022-05-04 16:52 ` Matthew Wilcox 0 siblings, 0 replies; 12+ messages in thread From: Matthew Wilcox @ 2022-05-04 16:52 UTC (permalink / raw) To: Michel Lespinasse Cc: Paul E. McKenney, Michal Hocko, Liam Howlett, hannes, linux-mm, linux-kernel, Davidlohr Bueso, David On Wed, May 04, 2022 at 01:20:39AM -0700, Michel Lespinasse wrote: > (for context, this came up during a discussion of speculative page > faults implementation details) > > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote: > > Johannes (I think it was?) made the point to me that if we have another > > task very slowly freeing memory, a task in this path can take advantage > > of that other task's hard work and never go into reclaim. So the > > approach we should take is: > > > > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN); > > > > if (failure) { > > rcu_read_unlock(); > > do_reclaim(); > > return FAULT_FLAG_RETRY; > > } > > I don't think this works. The problem with allocating page tables is > not just that it may break an rcu-locked code section; you also need > the code inserting the new page tables into the mm's page table tree > to synchronize with any munmap() that may be concurrently running. RCU > isn't sufficient here, and we would need a proper lock when wiring new > page tables (current code relies on mmap lock for this). Hmm, so what you're saying is that we could see:

	CPU 0				CPU 1
	rcu_read_lock()
	p4d_alloc()
	pud_alloc()
					mmap_write_lock
					pmd_free
					pud_free
					p4d_free
					mmap_write_unlock
	pmd_alloc()

... and now CPU 0 has stored a newly allocated PMD into a PUD that will be freed by RCU. Good catch, but fortunately we don't have to solve it right now because we decided on Monday to not pursue this approach. > > ... 
but all this is now moot since the approach we agreed to yesterday > > is: > > > > rcu_read_lock(); > > vma = vma_lookup(); > > if (down_read_trylock(&vma->sem)) { > > rcu_read_unlock(); > > } else { > > rcu_read_unlock(); > > mmap_read_lock(mm); > > vma = vma_lookup(); > > down_read(&vma->sem); > > } > > > > ... and we then execute the page table allocation under the protection of > > the vma->sem. > > > > At least, that's what I think we agreed to yesterday. > > I don't remember discussing any of this yesterday. As I remember it, > the discussion was about having one large RCU section vs several small > ones linked by sequence count checks to verify the validity of the vma > at the start of each RCU section. That was indeed what we started out discussing, but Michal and others wanted to explore locking the VMA as a first step. This approach doesn't suffer from the same problem as the write side is: mmap_write_lock() vma = vma_lookup(); down_write(&vma->sem); ... tear down page tables ... up_write(&vma->sem); rcu_free(vma); mmap_write_unlock(); So when tearing down the page tables, if the fault is on this VMA, it will block until the fault has released the VMA read lock. If it's on another VMA, the page table teardown will not tear down any page tables which are partially covered by any VMA. Or have I missed something else? ^ permalink raw reply [flat|nested] 12+ messages in thread