linux-mm.kvack.org archive mirror
From: nsaenzju@redhat.com
To: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	 Vlastimil Babka <vbabka@suse.cz>
Cc: akpm@linux-foundation.org, frederic@kernel.org,
	 linux-kernel@vger.kernel.org, linux-mm@kvack.org, cl@linux.com,
	 juri.lelli@redhat.com, mingo@redhat.com, mtosatti@redhat.com,
	nilal@redhat.com,  mgorman@suse.de, ppandit@redhat.com,
	williams@redhat.com, bigeasy@linutronix.de,
	 anna-maria@linutronix.de, linux-rt-users@vger.kernel.org
Subject: Re: [PATCH 0/6] mm: Remote LRU per-cpu pagevec cache/per-cpu page list drain support
Date: Mon, 27 Sep 2021 11:30:53 +0200	[thread overview]
Message-ID: <8ee79b78c7f7f8fa18ec237494f09492ef60081f.camel@redhat.com> (raw)
In-Reply-To: <87v92sgt3n.ffs@tglx>

Hi Thomas,

Thanks for the in-depth review.

On Thu, 2021-09-23 at 00:09 +0200, Thomas Gleixner wrote:
> On Wed, Sep 22 2021 at 13:28, Peter Zijlstra wrote:
> > On Tue, Sep 21, 2021 at 07:59:51PM +0200, Vlastimil Babka wrote:
> > 
> > > These days the pcplist protection is done by local_lock, which solved
> > > the RT concerns. Probably a stupid/infeasible idea, but maybe what you
> > > want to achieve could be more generally solved at the local_lock level?
> > > That on NOHZ_FULL CPUs, local_locks could have this mode where they
> > > could synchronize with remote cpus?
> > 
> > local_lock and spinlock have different rules, local_lock for example can
> > never cause an irq inversion, unlike a spinlock.
> 
> TBH, I really regret an attempt I made at some point in the RT
> development to abuse local locks for this kind of cross-CPU protection,
> because that led to yet another semantically ill-defined construct.
> 
> local locks are as the name says strictly local. IOW, they do not exist
> for remote access. Just look at the !RT mapping:
> 
>   local_lock() 		-> preempt_disable()
>   local_lock_irq()	-> local_irq_disable()
>   ...
> 
> The only thing local_lock addresses is the opaque nature of
> preempt_disable() and local*_disable/save() protected critical sections,
> which have per-CPU BKL semantics, i.e. undefined protection scope.
> 
> If you really want cross CPU protection then using a regular spinlock in
> a per CPU data structure is the right way to go.
> 
> That makes it a bit awkward vs. the code at hand, which already introduced
> local locks to replace the opaque preempt/local_irq disabled critical
> sections with scoped local locks, which in turn allows RT to substitute
> them with strict CPU-local spinlocks.
> 
> But for clarity sake we really have to look at two different cases now:
> 
>    1) Strict per CPU local protection
> 
>       That's what the code does today via local lock and this includes
>       RT where the locality is preserved via the local lock semantics
> 
>       I.e. for the !RT case the spinlock overhead is avoided 
> 
>    2) Global scoped per CPU protection
> 
>       That's what Nicolas is trying to achieve to be able to update
>       data structures cross CPU for the sake of avoiding smp function
>       calls or queuing work on remote CPUs for the NOHZ_FULL isolation
>       use case.
> 
> That said, I completely understand Andrew's concerns versus these
> distinctions and their test coverage.
> 
> In consequence the real interesting question here is whether any of
> these use cases are sensible to the extra overhead of #2.
> 
> IOW, does it really matter whether the !RT and !NOHZ_FULL case take an
> uncontended per CPU spinlock unconditionally instead of just disabling
> preemption / interrupts?
> 
> The same question arises vs. the remote work queuing. Does it really
> matter? I think a proper investigation is due to figure out whether
> delegated work is really superior vs. doing the same work locked from a
> remote CPU occasionally.
> 
> If you really think about it then the task which is initiating the work
> is already in a slow path. So what's the tradeoff to make this path a
> little bit slower due to remote access vs. scheduling work and thereby
> disturbing the remote CPU which has a performance impact as well and in
> the NOHZ_FULL case even a correctness impact? That's especially
> questionable for situations where the initiator has to wait for
> completion of the remote work.
> 
> The changelogs and the cover letter are conspicuously silent on these
> questions, which means this is just another example of 'scratch my itch'
> changes without proper justification.

Point taken, I'll get to it.

-- 
Nicolás Sáenz




Thread overview: 20+ messages
2021-09-21 16:13 [PATCH 0/6] mm: Remote LRU per-cpu pagevec cache/per-cpu page list drain support Nicolas Saenz Julienne
2021-09-21 16:13 ` [PATCH 1/6] mm/swap: Introduce lru_cpu_needs_drain() Nicolas Saenz Julienne
2021-09-21 16:13 ` [PATCH 2/6] mm/swap: Introduce alternative per-cpu LRU cache locking Nicolas Saenz Julienne
2021-09-21 22:03   ` Peter Zijlstra
2021-09-22  8:47     ` nsaenzju
2021-09-22  9:20       ` Sebastian Andrzej Siewior
2021-09-22  9:50         ` nsaenzju
2021-09-22 11:37       ` Peter Zijlstra
2021-09-22 11:43         ` nsaenzju
2021-09-21 16:13 ` [PATCH 3/6] mm/swap: Allow remote LRU cache draining Nicolas Saenz Julienne
2021-09-21 16:13 ` [PATCH 4/6] mm/page_alloc: Introduce alternative per-cpu list locking Nicolas Saenz Julienne
2021-09-21 16:13 ` [PATCH 5/6] mm/page_alloc: Allow remote per-cpu page list draining Nicolas Saenz Julienne
2021-09-21 16:13 ` [PATCH 6/6] sched/isolation: Enable 'remote_pcpu_cache_access' on NOHZ_FULL systems Nicolas Saenz Julienne
2021-09-21 17:51 ` [PATCH 0/6] mm: Remote LRU per-cpu pagevec cache/per-cpu page list drain support Andrew Morton
2021-09-21 17:59 ` Vlastimil Babka
2021-09-22 11:28   ` Peter Zijlstra
2021-09-22 22:09     ` Thomas Gleixner
2021-09-23  7:12       ` Vlastimil Babka
2021-09-23 10:36         ` Thomas Gleixner
2021-09-27  9:30       ` nsaenzju [this message]
