Linux-rt-users Archive on lore.kernel.org
 help / color / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Michal Hocko <mhocko@kernel.org>
Subject: Re: [PATCH 11/11] mm/page_alloc: Embed per_cpu_pages locking within the per-cpu structure
Date: Thu, 15 Apr 2021 16:29:58 +0100
Message-ID: <20210415152958.GL3697@techsingularity.net> (raw)
In-Reply-To: <de5d1dbb-fb56-9660-fadb-6318047305d4@suse.cz>

On Thu, Apr 15, 2021 at 04:53:46PM +0200, Vlastimil Babka wrote:
> On 4/14/21 3:39 PM, Mel Gorman wrote:
> > struct per_cpu_pages is protected by the pagesets lock but it can be
> > embedded within struct per_cpu_pages at a minor cost. This is possible
> > because per-cpu lookups are based on offsets. Paraphrasing an explanation
> > from Peter Ziljstra
> > 
> >   The whole thing relies on:
> > 
> >     &per_cpu_ptr(msblk->stream, cpu)->lock == per_cpu_ptr(&msblk->stream->lock, cpu)
> > 
> >   Which is true because the lhs:
> > 
> >     (local_lock_t *)((zone->per_cpu_pages + per_cpu_offset(cpu)) + offsetof(struct per_cpu_pages, lock))
> > 
> >   and the rhs:
> > 
> >     (local_lock_t *)((zone->per_cpu_pages + offsetof(struct per_cpu_pages, lock)) + per_cpu_offset(cpu))
> > 
> >   are identical, because addition is associative.
> > 
> > More details are included in mmzone.h. This embedding is not completely
> > free for three reasons.
> > 
> > 1. As local_lock does not return a per-cpu structure, the PCP has to
> >    be looked up twice -- first to acquire the lock and again to get the
> >    PCP pointer.
> > 
> > 2. For PREEMPT_RT and CONFIG_DEBUG_LOCK_ALLOC, local_lock is potentially
> >    a spinlock or has lock-specific tracking. In both cases, it becomes
> >    necessary to release/acquire different locks when freeing a list of
> >    pages in free_unref_page_list.
> 
> Looks like this pattern could benefit from a local_lock API helper that would do
> the right thing? It probably couldn't optimize much the CONFIG_PREEMPT_RT case
> which would need to be unlock/lock in any case, but CONFIG_DEBUG_LOCK_ALLOC
> could perhaps just keep the IRQ's disabled and just note the change of what's
> acquired?
> 

A helper could potentially be used but right now, there is only one
call-site that needs this type of care so it may be overkill. A helper
was proposed that can lookup and lock a per-cpu structure which is
generally useful but does not suit the case where different locks need
to be acquired.

> > 3. For most kernel configurations, local_lock_t is empty and no storage is
> >    required. By embedding the lock, the memory consumption on PREEMPT_RT
> >    and CONFIG_DEBUG_LOCK_ALLOC is higher.
> 
> But I wonder, is there really a benefit to this increased complexity? Before the
> patch we had "pagesets" - a local_lock that protects all zones' pcplists. Now
> each zone's pcplists have own local_lock. On !PREEMPT_RT we will never take the
> locks of multiple zones from the same CPU in parallel, because we use
> local_lock_irqsave(). Can that parallelism happen on PREEMPT_RT, because that
> could perhaps justify the change?
> 

I don't think PREEMPT_RT gets additional parallelism because it's still
a per-cpu structure that is being protected. The difference is whether
we are protecting the CPU-N index for all per_cpu_pages or just one.
The patch exists because it was asked why the lock was not embedded within
the structure it's protecting. I initially thought that was unsafe and
I was wrong as explained in the changelog. But now that I find it *can*
be done but it's a bit ugly so I put it at the end of the series so it
can be dropped if necessary.

-- 
Mel Gorman
SUSE Labs

      reply index

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-04-14 13:39 [PATCH 0/11 v3] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-04-14 13:39 ` [PATCH 01/11] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-04-14 13:39 ` [PATCH 02/11] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-04-14 13:39 ` [PATCH 03/11] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-04-14 13:39 ` [PATCH 04/11] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-04-14 16:20   ` Vlastimil Babka
2021-04-14 16:26     ` Vlastimil Babka
2021-04-15  9:34       ` Mel Gorman
2021-04-14 13:39 ` [PATCH 05/11] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-04-14 16:31   ` Vlastimil Babka
2021-04-14 13:39 ` [PATCH 06/11] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-04-14 17:10   ` Vlastimil Babka
2021-04-14 13:39 ` [PATCH 07/11] mm/page_alloc: Remove duplicate checks if migratetype should be isolated Mel Gorman
2021-04-14 17:21   ` Vlastimil Babka
2021-04-15  9:33     ` Mel Gorman
2021-04-15 11:24       ` Vlastimil Babka
2021-04-14 13:39 ` [PATCH 08/11] mm/page_alloc: Explicitly acquire the zone lock in __free_pages_ok Mel Gorman
2021-04-15 10:24   ` Vlastimil Babka
2021-04-14 13:39 ` [PATCH 09/11] mm/page_alloc: Avoid conflating IRQs disabled with zone->lock Mel Gorman
2021-04-15 12:25   ` Vlastimil Babka
2021-04-15 14:11     ` Mel Gorman
2021-04-14 13:39 ` [PATCH 10/11] mm/page_alloc: Update PGFREE outside the zone lock in __free_pages_ok Mel Gorman
2021-04-15 13:04   ` Vlastimil Babka
2021-04-14 13:39 ` [PATCH 11/11] mm/page_alloc: Embed per_cpu_pages locking within the per-cpu structure Mel Gorman
2021-04-15 14:53   ` Vlastimil Babka
2021-04-15 15:29     ` Mel Gorman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210415152958.GL3697@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git