Re: [PATCH 1/1] rcu/tree: Drop the lock before entering to page allocator

From: Uladzislau Rezki <urezki@gmail.com>
To: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Uladzislau Rezki <urezki@gmail.com>,
	Joel Fernandes <joel@joelfernandes.org>,
	LKML <linux-kernel@vger.kernel.org>, RCU <rcu@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	"Paul E . McKenney" <paulmck@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Theodore Y . Ts'o" <tytso@mit.edu>,
	Matthew Wilcox <willy@infradead.org>,
	Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com>
Subject: Re: [PATCH 1/1] rcu/tree: Drop the lock before entering to page allocator
Date: Thu, 16 Jul 2020 17:34:30 +0200	[thread overview]
Message-ID: <20200716153430.GA31261@pc636> (raw)
In-Reply-To: <20200716150414.iqpyby6nrww4zbyk@linutronix.de>

On Thu, Jul 16, 2020 at 05:04:14PM +0200, Sebastian Andrzej Siewior wrote:
> On 2020-07-16 16:47:28 [+0200], Uladzislau Rezki wrote:
> > On Thu, Jul 16, 2020 at 04:25:37PM +0200, Sebastian Andrzej Siewior wrote:
> > > On 2020-07-16 11:19:13 [+0200], Uladzislau Rezki wrote:
> > > > Sebastian, could you please confirm that if that patch that is in
> > > > question fixes it?
> > > > 
> > > > It would be appreciated!
> > > 
> > > So that preempt disable should in terms any warnings. However I don't
> > > think that it is strictly needed and from scheduling point of view you
> > > forbid a CPU migration which might be good otherwise.
> > >
> > Please elaborate your point regarding "i do not think it is strictly needed".
> > 
> > Actually i can rework the patch to remove even such preempt_enable/disable
> > to stay on the same CPU, but i do not see the point of doing it.
> > 
> > Do you see the point?
> 
> You disable preemption for what reason? It is not documented, it is not
> obvious - why is it required?
> 
I can document it. Will it work for you? Actually i can get rid of it
but there can be other side effects which also can be addressed but
i do not see any issues of doing just "preemtion off". Please have
a look at sources across the kernel how many times a memory is
requested in atomic context:

<snip>
preempt_disable() os any spinlock or raw_locks, etc..
__get_free_page(GFP_NOWAIT | __GFP_NOWARN);
or
kmalloc(GFP_ATOMIC);
<snip>

all those flags say to page allocator or SLAB that sleeping is not
allowed.

> > As for scheduling point of view. Well, there are many places when there
> > is a demand in memory or pages from atomic context. Also, getting a page
> > is not considered as a hot path in the kfree_rcu(). 
> 
> If you disable preemption than you assume that you wouldn't be atomic
> otherwise. You say that at this point it is not a hot path so if this is
> not *that* important why not allow preemption and allow the schedule to
>
If i disable preemption, it means that atomic section begins. Let me
explain why i disable preemption.

If i invoke a page allocator in full preemptable context, it can be
that we get a page but end up on another CPU. That another CPU does
not need it, because it has some free spots in its internal array
for pointer collecting. If we stay on the same CPU we eliminate it.

The question is what to do with that page. I see two solutions.

1) Queue it to the CPU2 page cache for further reuse or freeing.
2) Just proceed with newly allocated page thinking that previous
   "bulk arry" is partly populated, i.e. it can be that previous
   one has plenty of free spots.

Actually that is why i want to stay on the same CPU.

>
> place you somewhere else if the scheduler decides that it is a good idea.
> 
It is not a hot path, really. I do not consider it as critical, since the
page allocator will not do much work(we use restricted flags), on a high
level it is limited to just ask a page and return it. If no page, check
watermark, if low, wakeup kswapd and return NULL, that is it.

> > > Also if interrupts and everything is enabled then someone else might
> > > invoke kfree_rcu() from BH context for instance.
> > > 
> > And what? What is a problem here, please elaborate if you see any
> > issues.
> 
> That the kfree_rcu() caller from BH context will end up here as well,
> asking for a page.
> 
Please think about that CPU0 is somewhere in __get_free_page(), when it
is still there, there comes an interrupt that also calls kfree_rcu()
and end up somewhere in __get_free_page(). To prevent such internal
critical sections usually the code disables irqs and do some critical
things to prevent of breaking something. 

So, basically __get_free_page() can be interrupted and being invoked
one more time on the same CPU. It uses spin_lockirqsave() for such
scenarios.

Our internal lock is dropped.

--
Vlad Rezki