On Tue, Mar 31 2020, Joel Fernandes wrote:

> On Tue, Mar 31, 2020 at 05:34:50PM +0200, Michal Hocko wrote:
>> On Tue 31-03-20 10:58:06, Joel Fernandes wrote:
>> [...]
>> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> > > index 4be763355c9fb..965deefffdd58 100644
>> > > --- a/kernel/rcu/tree.c
>> > > +++ b/kernel/rcu/tree.c
>> > > @@ -3149,7 +3149,7 @@ static inline struct rcu_head *attach_rcu_head_to_object(void *obj)
>> > >  
>> > >  	if (!ptr)
>> > >  		ptr = kmalloc(sizeof(unsigned long *) +
>> > > -				sizeof(struct rcu_head), GFP_ATOMIC | __GFP_NOWARN);
>> > > +				sizeof(struct rcu_head), GFP_MEMALLOC);
>> > 
>> > Just to add, the main requirements here are:
>> > 1. Allocation should be bounded in time.
>> > 2. Allocation should try hard (possibly tapping into reserves)
>> > 3. Sleeping is Ok but should not affect the time bound.
>> 
>> 
>> __GFP_ATOMIC | __GFP_HIGH is the way to get an additional access to
>> memory reserves regardless of the sleeping status.
>> 
>> Using __GFP_MEMALLOC is quite dangerous because it can deplete _all_ the
>> memory. What does prevent the above code path to do that?
>
> Can you suggest what prevents other users of GFP_MEMALLOC from doing that
> also? That's the whole point of having a reserve, in normal usage no one will
> use it, but some times you need to use it. Keep in mind this is not a common
> case in this code here, this is triggered only if earlier allocation attempts
> failed. Only *then* we try with GFP_MEMALLOC with promises to free additional
> memory soon.

I think that "soon" is the key point.  Users of __GFP_MEMALLOC certainly
must be working to free other memory, and that other memory needs to be
freed "soon".  In particular - sooner than the reserve is exhausted.  This
can require rate-limiting.  If one allocation can result in one page
being freed, that is good and it is probably OK to have 1000 allocations
resulting in 1000 pages being freed soon.  But 10 million allocations to
gain 10 million pages is not such a good thing and shouldn't be needed.
Once those first 1000 pages have been freed, you won't need
__GFP_MEMALLOC allocations any more, and you must be prepared to wait for
them.

So where does the rate-limiting happen in your proposal?  A GP can be
multiple milliseconds, which is time for lots of memory to be allocated
and for rcu-free queues to grow quite large.

You mention a possible fall-back of calling synchronize_rcu().  I think
that needs to be a fallback that happens well before __GFP_MEMALLOC is
exhausted.  You need to choose some maximum amount that you will
allocate, then use synchronize_rcu() (or probably the _expedited
version) after that.  The pool of reserves is certainly there for you
to use, but not for you to exhaust.
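
As a rough sketch of what such a maximum could look like (plain
userspace C, single-threaded, not kernel code): the names
reserve_limiter, reserve_try_charge(), reserve_uncharge() and the
budget of 1000 are all made up for illustration, and real code would
need atomics or a lock around the counter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cap on reserve-backed allocations that are still live. */
#define RESERVE_BUDGET 1000

struct reserve_limiter {
	size_t outstanding;	/* reserve-backed allocations not yet freed */
	size_t fallbacks;	/* times a caller was told to fall back */
};

/*
 * Ask permission to take one allocation from the reserves.
 * Returns false once the budget is used up; the caller is then
 * expected to take its own fallback (for example, waiting for a
 * grace period) instead of dipping further into the reserves.
 */
static bool reserve_try_charge(struct reserve_limiter *rl)
{
	if (rl->outstanding >= RESERVE_BUDGET) {
		rl->fallbacks++;
		return false;
	}
	rl->outstanding++;
	return true;
}

/* Record that one reserve-backed allocation has been freed. */
static void reserve_uncharge(struct reserve_limiter *rl)
{
	if (rl->outstanding > 0)
		rl->outstanding--;
}
```

A caller would check reserve_try_charge() before attempting the
reserve-backed allocation, and call reserve_uncharge() when that
object is finally freed, so the budget refills as earlier
allocations are released.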

If you have your own rate-limiting, then I think __GFP_MEMALLOC is
probably OK, and also you *don't* want the allocation to wait.  If memory
cannot be allocated immediately, you need to use your own fallback.

NeilBrown