All of lore.kernel.org
 help / color / mirror / Atom feed
* Memory allocation on speculative fastpaths
@ 2022-05-03 15:59 Paul E. McKenney
  2022-05-03 16:04 ` Michal Hocko
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-05-03 15:59 UTC (permalink / raw)
  To: liam.howlett, willy, walken.cr, mhocko, hannes; +Cc: linux-mm, linux-kernel

Hello!

Just following up from off-list discussions yesterday.

The requirements to allocate on an RCU-protected speculative fastpath
seem to be as follows:

1.	Never sleep.
2.	Never reclaim.
3.	Leave emergency pools alone.

Any others?

If those rules suffice, and if my understanding of the GFP flags is
correct (ha!!!), then the following GFP flags should cover this:

	__GFP_NOMEMALLOC | __GFP_NOWARN

Or is this just a fancy way of always returning NULL or some such?  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 15:59 Memory allocation on speculative fastpaths Paul E. McKenney
@ 2022-05-03 16:04 ` Michal Hocko
  2022-05-03 16:39   ` Paul E. McKenney
  0 siblings, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-05-03 16:04 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: liam.howlett, willy, walken.cr, hannes, linux-mm, linux-kernel

On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> Hello!
> 
> Just following up from off-list discussions yesterday.
> 
> The requirements to allocate on an RCU-protected speculative fastpath
> seem to be as follows:
> 
> 1.	Never sleep.
> 2.	Never reclaim.
> 3.	Leave emergency pools alone.
> 
> Any others?
> 
> If those rules suffice, and if my understanding of the GFP flags is
> correct (ha!!!), then the following GFP flags should cover this:
> 
> 	__GFP_NOMEMALLOC | __GFP_NOWARN

GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
 
> Or is this just a fancy way of always returning NULL or some such?  ;-)

It could fail quite easily. We would also want to guarantee (by
documenting I guess) that the page allocator never does anything that
would depend or invoke rcu_synchronize or something like that.

I believe this is the case currently.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 16:04 ` Michal Hocko
@ 2022-05-03 16:39   ` Paul E. McKenney
  2022-05-03 18:28     ` Matthew Wilcox
  0 siblings, 1 reply; 12+ messages in thread
From: Paul E. McKenney @ 2022-05-03 16:39 UTC (permalink / raw)
  To: Michal Hocko
  Cc: liam.howlett, willy, walken.cr, hannes, linux-mm, linux-kernel

On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > Hello!
> > 
> > Just following up from off-list discussions yesterday.
> > 
> > The requirements to allocate on an RCU-protected speculative fastpath
> > seem to be as follows:
> > 
> > 1.	Never sleep.
> > 2.	Never reclaim.
> > 3.	Leave emergency pools alone.
> > 
> > Any others?
> > 
> > If those rules suffice, and if my understanding of the GFP flags is
> > correct (ha!!!), then the following GFP flags should cover this:
> > 
> > 	__GFP_NOMEMALLOC | __GFP_NOWARN
> 
> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN

Ah, good point on GFP_NOWAIT, thank you!

> > Or is this just a fancy way of always returning NULL or some such?  ;-)
> 
> It could fail quite easily. We would also want to guarantee (by
> documenting I guess) that the page allocator never does anything that
> would depend or invoke rcu_synchronize or something like that.

The GPF_NOWAIT should rule out synchronize_rcu() and similar, correct?

> I believe this is the case currently.

Here is hoping!  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 16:39   ` Paul E. McKenney
@ 2022-05-03 18:28     ` Matthew Wilcox
  2022-05-03 23:15       ` Suren Baghdasaryan
  2022-05-04  8:20       ` Michel Lespinasse
  0 siblings, 2 replies; 12+ messages in thread
From: Matthew Wilcox @ 2022-05-03 18:28 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: Michal Hocko, liam.howlett, walken.cr, hannes, linux-mm, linux-kernel

On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > Hello!
> > > 
> > > Just following up from off-list discussions yesterday.
> > > 
> > > The requirements to allocate on an RCU-protected speculative fastpath
> > > seem to be as follows:
> > > 
> > > 1.	Never sleep.
> > > 2.	Never reclaim.
> > > 3.	Leave emergency pools alone.
> > > 
> > > Any others?
> > > 
> > > If those rules suffice, and if my understanding of the GFP flags is
> > > correct (ha!!!), then the following GFP flags should cover this:
> > > 
> > > 	__GFP_NOMEMALLOC | __GFP_NOWARN
> > 
> > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> 
> Ah, good point on GFP_NOWAIT, thank you!

Johannes (I think it was?) made the point to me that if we have another
task very slowly freeing memory, a task in this path can take advantage
of that other task's hard work and never go into reclaim.  So the
approach we should take is:

p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);

if (failure) {
  rcu_read_unlock();
  do_reclaim();
  return FAULT_FLAG_RETRY;
}

... but all this is now moot since the approach we agreed to yesterday
is:

rcu_read_lock();
vma = vma_lookup();
if (down_read_trylock(&vma->sem)) {
	rcu_read_unlock();
} else {
	rcu_read_unlock();
	mmap_read_lock(mm);
	vma = vma_lookup();
	down_read(&vma->sem);
}

... and we then execute the page table allocation under the protection of
the vma->sem.

At least, that's what I think we agreed to yesterday.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 18:28     ` Matthew Wilcox
@ 2022-05-03 23:15       ` Suren Baghdasaryan
  2022-05-03 23:45         ` Michal Hocko
  2022-05-04 16:23         ` Johannes Weiner
  2022-05-04  8:20       ` Michel Lespinasse
  1 sibling, 2 replies; 12+ messages in thread
From: Suren Baghdasaryan @ 2022-05-03 23:15 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Paul E. McKenney, Michal Hocko, Liam R. Howlett,
	Michel Lespinasse, Johannes Weiner, linux-mm, LKML,
	David Hildenbrand, Davidlohr Bueso

On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > > Hello!
> > > >
> > > > Just following up from off-list discussions yesterday.
> > > >
> > > > The requirements to allocate on an RCU-protected speculative fastpath
> > > > seem to be as follows:
> > > >
> > > > 1.        Never sleep.
> > > > 2.        Never reclaim.
> > > > 3.        Leave emergency pools alone.
> > > >
> > > > Any others?
> > > >
> > > > If those rules suffice, and if my understanding of the GFP flags is
> > > > correct (ha!!!), then the following GFP flags should cover this:
> > > >
> > > >   __GFP_NOMEMALLOC | __GFP_NOWARN
> > >
> > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> >
> > Ah, good point on GFP_NOWAIT, thank you!
>
> Johannes (I think it was?) made the point to me that if we have another
> task very slowly freeing memory, a task in this path can take advantage
> of that other task's hard work and never go into reclaim.  So the
> approach we should take is:
>
> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>
> if (failure) {
>   rcu_read_unlock();
>   do_reclaim();
>   return FAULT_FLAG_RETRY;
> }
>
> ... but all this is now moot since the approach we agreed to yesterday
> is:

I think the discussion was about the above approach and Johannes
suggested to fallback to the normal pagefault handling with mmap_lock
locked if PMD does not exist. Please correct me if I misunderstood
here.

>
> rcu_read_lock();
> vma = vma_lookup();
> if (down_read_trylock(&vma->sem)) {
>         rcu_read_unlock();
> } else {
>         rcu_read_unlock();
>         mmap_read_lock(mm);
>         vma = vma_lookup();
>         down_read(&vma->sem);
> }
>
> ... and we then execute the page table allocation under the protection of
> the vma->sem.
>
> At least, that's what I think we agreed to yesterday.

Honestly, I don't remember discussing vma->sem at all.
My understanding is that two approaches were differing by the section
covered by rcu_read_lock/rcu_read_unlock. The solution that you
suggested would handle pagefault completely under RCU as long as it's
possible and would fallback to the mmap_lock if it's impossible, while
Michel's implementation was taking rcu_read_lock/rcu_read_unlock for
smaller sections and would use vma->seq_number to detect any vma
changes between these sections. Your suggested approach sounds simpler
and the way I understood the comments is that we should give it a try.
Did I miss anything?
Thanks,
Suren.



>

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 23:15       ` Suren Baghdasaryan
@ 2022-05-03 23:45         ` Michal Hocko
  2022-05-04  0:22           ` Matthew Wilcox
  2022-05-04 16:23         ` Johannes Weiner
  1 sibling, 1 reply; 12+ messages in thread
From: Michal Hocko @ 2022-05-03 23:45 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, Paul E. McKenney, Liam R. Howlett,
	Michel Lespinasse, Johannes Weiner, linux-mm, LKML,
	David Hildenbrand, Davidlohr Bueso

On Tue 03-05-22 16:15:46, Suren Baghdasaryan wrote:
> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
[...]
> > rcu_read_lock();
> > vma = vma_lookup();
> > if (down_read_trylock(&vma->sem)) {
> >         rcu_read_unlock();
> > } else {
> >         rcu_read_unlock();
> >         mmap_read_lock(mm);
> >         vma = vma_lookup();
> >         down_read(&vma->sem);
> > }
> >
> > ... and we then execute the page table allocation under the protection of
> > the vma->sem.
> >
> > At least, that's what I think we agreed to yesterday.
> 
> Honestly, I don't remember discussing vma->sem at all.

This is the rangelocking approach that is effectivelly per-VMA. So that
should help with the most simplistic case where the mmap contention is
not on the same VMAs which should be the most common case (e.g. faulting
from several threads while there is mmap happening in the background).

There are cases where this could be too coarse of course and RCU would
be a long term plan. The above seems easy enough and still probably good
enough for most cases so a good first step.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 23:45         ` Michal Hocko
@ 2022-05-04  0:22           ` Matthew Wilcox
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2022-05-04  0:22 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Suren Baghdasaryan, Paul E. McKenney, Liam R. Howlett,
	Michel Lespinasse, Johannes Weiner, linux-mm, LKML,
	David Hildenbrand, Davidlohr Bueso

On Wed, May 04, 2022 at 01:45:11AM +0200, Michal Hocko wrote:
> On Tue 03-05-22 16:15:46, Suren Baghdasaryan wrote:
> > On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> [...]
> > > rcu_read_lock();
> > > vma = vma_lookup();
> > > if (down_read_trylock(&vma->sem)) {
> > >         rcu_read_unlock();
> > > } else {
> > >         rcu_read_unlock();
> > >         mmap_read_lock(mm);
> > >         vma = vma_lookup();
> > >         down_read(&vma->sem);
> > > }
> > >
> > > ... and we then execute the page table allocation under the protection of
> > > the vma->sem.
> > >
> > > At least, that's what I think we agreed to yesterday.
> > 
> > Honestly, I don't remember discussing vma->sem at all.
> 
> This is the rangelocking approach that is effectivelly per-VMA. So that
> should help with the most simplistic case where the mmap contention is
> not on the same VMAs which should be the most common case (e.g. faulting
> from several threads while there is mmap happening in the background).
> 
> There are cases where this could be too coarse of course and RCU would
> be a long term plan. The above seems easy enough and still probably good
> enough for most cases so a good first step.

It also fixes the low-pri monitoring daemon problem as page faults will
not be blocked by a writer (unless the read_trylock fails).

I see three potential outcomes here from the vma rwsem approach:

 - No particular improvement on any workloads.
   Result: we try something else.
 - Minor gains (5-10%).  We benchmark it and discover there's still
   significant contention on the vma_sem.
   Result: we take those wins and keep going towards a full RCU solution
 - Major gains (20-50%).
   Result: We're done, break out the champagne.


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 18:28     ` Matthew Wilcox
  2022-05-03 23:15       ` Suren Baghdasaryan
@ 2022-05-04  8:20       ` Michel Lespinasse
  2022-05-04 16:52         ` Matthew Wilcox
  1 sibling, 1 reply; 12+ messages in thread
From: Michel Lespinasse @ 2022-05-04  8:20 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Paul E. McKenney, Michal Hocko, Liam Howlett, hannes, linux-mm,
	linux-kernel, Davidlohr Bueso, David

(for context, this came up during a discussion of speculative page
faults implementation details)

On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> Johannes (I think it was?) made the point to me that if we have another
> task very slowly freeing memory, a task in this path can take advantage
> of that other task's hard work and never go into reclaim.  So the
> approach we should take is:
>
> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>
> if (failure) {
>   rcu_read_unlock();
>   do_reclaim();
>   return FAULT_FLAG_RETRY;
> }

I don't think this works. The problem with allocating page tables is
not just that it may break an rcu-locked code section; you also need
the code inserting the new page tables into the mm's page table tree
to synchronize with any munmap() that may be concurrently running. RCU
isn't sufficient here, and we would need a proper lock when wiring new
page tables (current code relies on mmap lock for this).

> ... but all this is now moot since the approach we agreed to yesterday
> is:
>
> rcu_read_lock();
> vma = vma_lookup();
> if (down_read_trylock(&vma->sem)) {
>         rcu_read_unlock();
> } else {
>         rcu_read_unlock();
>         mmap_read_lock(mm);
>         vma = vma_lookup();
>         down_read(&vma->sem);
> }
>
> ... and we then execute the page table allocation under the protection of
> the vma->sem.
>
> At least, that's what I think we agreed to yesterday.

I don't remember discussing any of this yesterday. As I remember it,
the discussion was about having one large RCU section vs several small
ones linked by sequence count checks to verify the validity of the vma
at the start of each RCU section.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-03 23:15       ` Suren Baghdasaryan
  2022-05-03 23:45         ` Michal Hocko
@ 2022-05-04 16:23         ` Johannes Weiner
  2022-05-24 20:37           ` Vlastimil Babka
  1 sibling, 1 reply; 12+ messages in thread
From: Johannes Weiner @ 2022-05-04 16:23 UTC (permalink / raw)
  To: Suren Baghdasaryan
  Cc: Matthew Wilcox, Paul E. McKenney, Michal Hocko, Liam R. Howlett,
	Michel Lespinasse, linux-mm, LKML, David Hildenbrand,
	Davidlohr Bueso

On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> > > On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> > > > On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> > > > > Hello!
> > > > >
> > > > > Just following up from off-list discussions yesterday.
> > > > >
> > > > > The requirements to allocate on an RCU-protected speculative fastpath
> > > > > seem to be as follows:
> > > > >
> > > > > 1.        Never sleep.
> > > > > 2.        Never reclaim.
> > > > > 3.        Leave emergency pools alone.
> > > > >
> > > > > Any others?
> > > > >
> > > > > If those rules suffice, and if my understanding of the GFP flags is
> > > > > correct (ha!!!), then the following GFP flags should cover this:
> > > > >
> > > > >   __GFP_NOMEMALLOC | __GFP_NOWARN
> > > >
> > > > GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> > >
> > > Ah, good point on GFP_NOWAIT, thank you!
> >
> > Johannes (I think it was?) made the point to me that if we have another
> > task very slowly freeing memory, a task in this path can take advantage
> > of that other task's hard work and never go into reclaim.  So the
> > approach we should take is:

Right, GFP_NOWAIT can starve out other allocations. It can clear out
the freelists without the burden of having to do reclaim like
everybody else wanting memory during a shortage. Including GFP_KERNEL.

In smaller doses and/or for privileged purposes (e.g. single-argument
kfree_rcu ;)), those allocations are fine. But because the context is
page tables specifically, it would mean that userspace could trigger a
large number of those and DOS other applications and the kernel.

> > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> >
> > if (failure) {
> >   rcu_read_unlock();
> >   do_reclaim();
> >   return FAULT_FLAG_RETRY;
> > }
> >
> > ... but all this is now moot since the approach we agreed to yesterday
> > is:
> 
> I think the discussion was about the above approach and Johannes
> suggested to fallback to the normal pagefault handling with mmap_lock
> locked if PMD does not exist. Please correct me if I misunderstood
> here.

Yeah. Either way works, as long as the task is held accountable.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-04  8:20       ` Michel Lespinasse
@ 2022-05-04 16:52         ` Matthew Wilcox
  0 siblings, 0 replies; 12+ messages in thread
From: Matthew Wilcox @ 2022-05-04 16:52 UTC (permalink / raw)
  To: Michel Lespinasse
  Cc: Paul E. McKenney, Michal Hocko, Liam Howlett, hannes, linux-mm,
	linux-kernel, Davidlohr Bueso, David

On Wed, May 04, 2022 at 01:20:39AM -0700, Michel Lespinasse wrote:
> (for context, this came up during a discussion of speculative page
> faults implementation details)
> 
> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> > Johannes (I think it was?) made the point to me that if we have another
> > task very slowly freeing memory, a task in this path can take advantage
> > of that other task's hard work and never go into reclaim.  So the
> > approach we should take is:
> >
> > p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> > pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
> >
> > if (failure) {
> >   rcu_read_unlock();
> >   do_reclaim();
> >   return FAULT_FLAG_RETRY;
> > }
> 
> I don't think this works. The problem with allocating page tables is
> not just that it may break an rcu-locked code section; you also need
> the code inserting the new page tables into the mm's page table tree
> to synchronize with any munmap() that may be concurrently running. RCU
> isn't sufficient here, and we would need a proper lock when wiring new
> page tables (current code relies on mmap lock for this).

Hmm, so what you're saying is that we could see:

CPU 0					CPU 1

rcu_read_lock()
p4d_alloc()
pud_alloc()
					mmap_write_lock
					pmd_free
					pud_free
					p4d_free
					mmap_write_unlock
pmd_alloc()

... and now CPU 0 has stored a newly allocated PMD into a PUD that will
be freed by RCU.

Good catch, but fortunately we don't have to solve it right now because
we decided on Monday to not pursue this approach.

> > ... but all this is now moot since the approach we agreed to yesterday
> > is:
> >
> > rcu_read_lock();
> > vma = vma_lookup();
> > if (down_read_trylock(&vma->sem)) {
> >         rcu_read_unlock();
> > } else {
> >         rcu_read_unlock();
> >         mmap_read_lock(mm);
> >         vma = vma_lookup();
> >         down_read(&vma->sem);
> > }
> >
> > ... and we then execute the page table allocation under the protection of
> > the vma->sem.
> >
> > At least, that's what I think we agreed to yesterday.
> 
> I don't remember discussing any of this yesterday. As I remember it,
> the discussion was about having one large RCU section vs several small
> ones linked by sequence count checks to verify the validity of the vma
> at the start of each RCU section.

That was indeed what we started out discussing, but Michal and others
wanted to explore locking the VMA as a first step.  This approach
doesn't suffer from the same problem as the write side is:

	mmap_write_lock()
	vma = vma_lookup();
	down_write(&vma->sem);
	... tear down page tables ...
	up_write(&vma->sem);
	rcu_free(vma);
	mmap_write_unlock();

So when tearing down the page tables, if the fault is on this VMA, it
will block until the fault has released the VMA read lock.  If it's on
another VMA, the page table teardown will not tear down any page tables
which are partially covered by any VMA.

Or have I missed something else?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-04 16:23         ` Johannes Weiner
@ 2022-05-24 20:37           ` Vlastimil Babka
  2022-05-25 13:12             ` Johannes Weiner
  0 siblings, 1 reply; 12+ messages in thread
From: Vlastimil Babka @ 2022-05-24 20:37 UTC (permalink / raw)
  To: Johannes Weiner, Suren Baghdasaryan
  Cc: Matthew Wilcox, Paul E. McKenney, Michal Hocko, Liam R. Howlett,
	Michel Lespinasse, linux-mm, LKML, David Hildenbrand,
	Davidlohr Bueso

On 5/4/22 18:23, Johannes Weiner wrote:
> On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
>> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
>>>
>>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
>>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
>>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
>>>>>> Hello!
>>>>>>
>>>>>> Just following up from off-list discussions yesterday.
>>>>>>
>>>>>> The requirements to allocate on an RCU-protected speculative fastpath
>>>>>> seem to be as follows:
>>>>>>
>>>>>> 1.        Never sleep.
>>>>>> 2.        Never reclaim.
>>>>>> 3.        Leave emergency pools alone.
>>>>>>
>>>>>> Any others?
>>>>>>
>>>>>> If those rules suffice, and if my understanding of the GFP flags is
>>>>>> correct (ha!!!), then the following GFP flags should cover this:
>>>>>>
>>>>>>   __GFP_NOMEMALLOC | __GFP_NOWARN
>>>>>
>>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
>>>>
>>>> Ah, good point on GFP_NOWAIT, thank you!
>>>
>>> Johannes (I think it was?) made the point to me that if we have another
>>> task very slowly freeing memory, a task in this path can take advantage
>>> of that other task's hard work and never go into reclaim.  So the
>>> approach we should take is:
> 
> Right, GFP_NOWAIT can starve out other allocations. It can clear out
> the freelists without the burden of having to do reclaim like
> everybody else wanting memory during a shortage. Including GFP_KERNEL.

FTR, I wonder if this is really true, given the suggested fallback. With
GFP_NOWAIT, you can either see memory (in all applicable zones) as

a) above low_watermark, just go ahead and allocate, as GFP_KERNEL would
b) between min and low watermark, wake up kswapd and allocate, as
GFP_KERNEL would
c) below min watermark, the most interesting. GFP_KERNEL fallbacks to
reclaim. If the GFP_NOWAIT path's fallback also includes reclaim, as
suggested in this thread, how is it really different from GFP_KERNEL?

So am I missing something or is GFP_NOWAIT fastpath with an immediate
fallback that includes reclaim (and not just a retry loop) fundamentally
not different from GFP_KERNEL, regardless of how often we attempt it?

> In smaller doses and/or for privileged purposes (e.g. single-argument
> kfree_rcu ;)), those allocations are fine. But because the context is
> page tables specifically, it would mean that userspace could trigger a
> large number of those and DOS other applications and the kernel.
> 
>>> p4d_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>> pud_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>> pmd_alloc(GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN);
>>>
>>> if (failure) {
>>>   rcu_read_unlock();
>>>   do_reclaim();
>>>   return FAULT_FLAG_RETRY;
>>> }
>>>
>>> ... but all this is now moot since the approach we agreed to yesterday
>>> is:
>>
>> I think the discussion was about the above approach and Johannes
>> suggested to fallback to the normal pagefault handling with mmap_lock
>> locked if PMD does not exist. Please correct me if I misunderstood
>> here.
> 
> Yeah. Either way works, as long as the task is held accountable.
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Memory allocation on speculative fastpaths
  2022-05-24 20:37           ` Vlastimil Babka
@ 2022-05-25 13:12             ` Johannes Weiner
  0 siblings, 0 replies; 12+ messages in thread
From: Johannes Weiner @ 2022-05-25 13:12 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Suren Baghdasaryan, Matthew Wilcox, Paul E. McKenney,
	Michal Hocko, Liam R. Howlett, Michel Lespinasse, linux-mm, LKML,
	David Hildenbrand, Davidlohr Bueso

On Tue, May 24, 2022 at 10:37:15PM +0200, Vlastimil Babka wrote:
> On 5/4/22 18:23, Johannes Weiner wrote:
> > On Tue, May 03, 2022 at 04:15:46PM -0700, Suren Baghdasaryan wrote:
> >> On Tue, May 3, 2022 at 11:28 AM Matthew Wilcox <willy@infradead.org> wrote:
> >>>
> >>> On Tue, May 03, 2022 at 09:39:05AM -0700, Paul E. McKenney wrote:
> >>>> On Tue, May 03, 2022 at 06:04:13PM +0200, Michal Hocko wrote:
> >>>>> On Tue 03-05-22 08:59:13, Paul E. McKenney wrote:
> >>>>>> Hello!
> >>>>>>
> >>>>>> Just following up from off-list discussions yesterday.
> >>>>>>
> >>>>>> The requirements to allocate on an RCU-protected speculative fastpath
> >>>>>> seem to be as follows:
> >>>>>>
> >>>>>> 1.        Never sleep.
> >>>>>> 2.        Never reclaim.
> >>>>>> 3.        Leave emergency pools alone.
> >>>>>>
> >>>>>> Any others?
> >>>>>>
> >>>>>> If those rules suffice, and if my understanding of the GFP flags is
> >>>>>> correct (ha!!!), then the following GFP flags should cover this:
> >>>>>>
> >>>>>>   __GFP_NOMEMALLOC | __GFP_NOWARN
> >>>>>
> >>>>> GFP_NOWAIT | __GFP_NOMEMALLOC | __GFP_NOWARN
> >>>>
> >>>> Ah, good point on GFP_NOWAIT, thank you!
> >>>
> >>> Johannes (I think it was?) made the point to me that if we have another
> >>> task very slowly freeing memory, a task in this path can take advantage
> >>> of that other task's hard work and never go into reclaim.  So the
> >>> approach we should take is:
> > 
> > Right, GFP_NOWAIT can starve out other allocations. It can clear out
> > the freelists without the burden of having to do reclaim like
> > everybody else wanting memory during a shortage. Including GFP_KERNEL.
> 
> FTR, I wonder if this is really true, given the suggested fallback.

                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

IIRC adding this fallback was the conclusion of the in-person
discussion. Above I just tried to summarize for the record the
original concern that led to it. I could have been more clear.

Your analysis is dead on, of course.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-05-25 13:12 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-03 15:59 Memory allocation on speculative fastpaths Paul E. McKenney
2022-05-03 16:04 ` Michal Hocko
2022-05-03 16:39   ` Paul E. McKenney
2022-05-03 18:28     ` Matthew Wilcox
2022-05-03 23:15       ` Suren Baghdasaryan
2022-05-03 23:45         ` Michal Hocko
2022-05-04  0:22           ` Matthew Wilcox
2022-05-04 16:23         ` Johannes Weiner
2022-05-24 20:37           ` Vlastimil Babka
2022-05-25 13:12             ` Johannes Weiner
2022-05-04  8:20       ` Michel Lespinasse
2022-05-04 16:52         ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.