On Tue, Mar 31 2020, Joel Fernandes wrote:

> On Tue, Mar 31, 2020 at 05:34:50PM +0200, Michal Hocko wrote:
>> On Tue 31-03-20 10:58:06, Joel Fernandes wrote:
>> [...]
>> > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
>> > > index 4be763355c9fb..965deefffdd58 100644
>> > > --- a/kernel/rcu/tree.c
>> > > +++ b/kernel/rcu/tree.c
>> > > @@ -3149,7 +3149,7 @@ static inline struct rcu_head *attach_rcu_head_to_object(void *obj)
>> > >
>> > >  	if (!ptr)
>> > >  		ptr = kmalloc(sizeof(unsigned long *) +
>> > > -				sizeof(struct rcu_head), GFP_ATOMIC | __GFP_NOWARN);
>> > > +				sizeof(struct rcu_head), GFP_MEMALLOC);
>> >
>> > Just to add, the main requirements here are:
>> > 1. Allocation should be bounded in time.
>> > 2. Allocation should try hard (possibly tapping into reserves)
>> > 3. Sleeping is Ok but should not affect the time bound.
>>
>>
>> __GFP_ATOMIC | __GFP_HIGH is the way to get an additional access to
>> memory reserves regarless of the sleeping status.
>>
>> Using __GFP_MEMALLOC is quite dangerous because it can deplete _all_ the
>> memory. What does prevent the above code path to do that?
>
> Can you suggest what prevents other users of GFP_MEMALLOC from doing that
> also? That's the whole point of having a reserve, in normal usage no one will
> use it, but some times you need to use it. Keep in mind this is not a common
> case in this code here, this is triggered only if earlier allocation attempts
> failed. Only *then* we try with GFP_MEMALLOC with promises to free additional
> memory soon.

I think that "soon" is the key point. Users of __GFP_MEMALLOC certainly
must be working to free other memory, that other memory needs to be
freed "soon". In particular - sooner than all the reserve is exhausted.
This can require rate-limiting. If one allocation can result in one
page being freed, that is good and it is probably OK to have 1000
allocations resulting in 1000 pages being freed soon.
But 10 million allocations to gain 10 million pages is not such a good
thing and shouldn't be needed. Once those first 1000 pages have been
freed, you won't need __GFP_MEMALLOC allocations any more, and you must
be prepared to wait for them.

So where does the rate-limiting happen in your proposal? A GP can be
multiple milliseconds, which is time for lots of memory to be allocated
and for rcu-free queues to grow quite large.

You mention a possible fall-back of calling synchronize_rcu(). I think
that needs to be a fallback that happens well before the __GFP_MEMALLOC
reserve is exhausted. You need to choose some maximum amount that you
will allocate, then use synchronize_rcu() (or probably the _expedited
version) after that. The pool of reserves is certainly there for you to
use, but not for you to exhaust.

If you have your own rate-limiting, then I think __GFP_MEMALLOC is
probably OK, and also you *don't* want the memalloc to wait. If memory
cannot be allocated immediately, you need to use your own fallback.

NeilBrown
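
The cap-then-fallback scheme described above can be sketched in
user-space C. This is only an illustration under stated assumptions,
not kernel code and not the proposed patch: RESERVE_BUDGET,
struct reserve_limiter, limited_alloc(), limited_free(),
sync_fallback() and simulate_burst() are all hypothetical names,
malloc() stands in for a non-sleeping __GFP_MEMALLOC allocation, and
the fallback is just a counter standing in for synchronize_rcu().

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cap on reserve-backed objects that are not yet freed. */
#define RESERVE_BUDGET 1000

struct reserve_limiter {
	size_t outstanding;	/* reserve-backed objects still held */
	size_t sync_fallbacks;	/* requests that had to wait instead */
};

/* Stand-in for a non-sleeping allocation from reserves; may return NULL. */
static void *reserve_alloc_nowait(size_t size)
{
	return malloc(size);
}

/*
 * Take one object from the reserves. Returns NULL when the budget is
 * spent or the allocator fails; the caller must then use its fallback.
 */
static void *limited_alloc(struct reserve_limiter *rl, size_t size)
{
	void *p;

	if (rl->outstanding >= RESERVE_BUDGET)
		return NULL;
	p = reserve_alloc_nowait(size);
	if (p)
		rl->outstanding++;
	return p;
}

/* Release a reserve-backed object and hand its slot back to the budget. */
static void limited_free(struct reserve_limiter *rl, void *p)
{
	assert(rl->outstanding > 0);
	free(p);
	rl->outstanding--;
}

/* Stand-in for the synchronize_rcu() path, which needs no allocation. */
static void sync_fallback(struct reserve_limiter *rl)
{
	rl->sync_fallbacks++;
}

/*
 * Issue n requests with no frees in between. Successful allocations are
 * stored in held[] (which must have room for RESERVE_BUDGET entries);
 * the rest take the fallback. Returns the number of allocations made.
 */
static size_t simulate_burst(struct reserve_limiter *rl, void **held, size_t n)
{
	size_t i, got = 0;

	for (i = 0; i < n; i++) {
		void *p = limited_alloc(rl, 16);

		if (p)
			held[got++] = p;
		else
			sync_fallback(rl);
	}
	return got;
}
```

With a burst larger than the budget, only the first RESERVE_BUDGET
requests touch the reserves and every later one takes the fallback
until something is freed. The cap is what keeps a long grace period
from turning into an unbounded drain on the reserves.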