Linux-rt-users Archive on
 help / color / Atom feed
From: Mel Gorman <>
To: Ingo Molnar <>,
	Peter Zijlstra <>,
	Thomas Gleixner <>
Cc: Linux-MM <>,
	Linux-RT-Users <>,
	LKML <>,
	Chuck Lever <>,
	Jesper Dangaard Brouer <>,
	Matthew Wilcox <>
Subject: Re: [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead
Date: Wed, 31 Mar 2021 09:52:13 +0100
Message-ID: <> (raw)
In-Reply-To: <>

Ingo, Thomas or Peter, is there any chance one of you could take a look
at patch "[PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to
local_lock" from this series? It's partially motivated by PREEMPT_RT. More
details below.

On Mon, Mar 29, 2021 at 01:06:42PM +0100, Mel Gorman wrote:
> This series requires patches in Andrew's tree so the series is also
> available at
> git:// mm-percpu-local_lock-v1r15
> The PCP (per-cpu page allocator in page_alloc.c) share locking requirements
> with vmstat which is inconvenient and causes some issues. Possibly because
> of that, the PCP list and vmstat share the same per-cpu space meaning that
> it's possible that vmstat updates dirty cache lines holding per-cpu lists
> across CPUs unless padding is used. The series splits that structure and
> separates the locking.

The bulk page allocation series that the local_lock work had an
additional fix so I've rebased this onto

git:// mm-percpu-local_lock-v1r16

> Second, PREEMPT_RT considers the following sequence to be unsafe
> as documented in Documentation/locking/locktypes.rst
>    local_irq_disable();
>    spin_lock(&lock);
> The pcp allocator has this sequence for rmqueue_pcplist (local_irq_save)
> -> __rmqueue_pcplist -> rmqueue_bulk (spin_lock). This series explicitly
> separates the locking requirements for the PCP list (local_lock) and stat
> updates (irqs disabled). Once that is done, the length of time IRQs are
> disabled can be reduced and in some cases, IRQ disabling can be replaced
> with preempt_disable.

It's this part I'm interested in even though it only partially addresses
the preempt-rt tree concerns. More legwork is needed for preempt-rt which
is outside the context of this series. At minimum, it involves

1. Split locking of pcp and buddy allocator instead of using spin_lock()
   when it's "known" that IRQs are disabled (not necessarily a valid
   assumption on PREEMPT_RT)
2. Split the zone lock into what protects the zone metadata and what
   protects the free lists

This looks straight-forward but it involves audit work and it may be
difficult to avoid regressing non-PREEMPT_RT kernels by disabling/enabling
IRQs when switching between the pcp allocator and the buddy allocator.

> After that, it was very obvious that zone_statistics in particular has way
> too much overhead and leaves IRQs disabled for longer than necessary. It
> has perfectly accurate counters requiring IRQs be disabled for parallel
> RMW sequences when inaccurate ones like vm_events would do. The series
> makes the NUMA statistics (NUMA_HIT and friends) inaccurate counters that
> only require preempt be disabled.
> Finally the bulk page allocator can then do all the stat updates in bulk
> with IRQs enabled which should improve the efficiency of the bulk page
> allocator. Technically, this could have been done without the local_lock
> and vmstat conversion work and the order simply reflects the timing of
> when different series were implemented.
> No performance data is included because despite the overhead of the
> stats, it's within the noise for most workloads but Jesper and Chuck may
> observe a significant different with the same tests used for the bulk
> page allocator. The series is more likely to be interesting to the RT
> folk in terms of slowing getting the PREEMPT tree into mainline.
>  drivers/base/node.c    |  18 +--
>  include/linux/mmzone.h |  29 +++--
>  include/linux/vmstat.h |  65 ++++++-----
>  mm/mempolicy.c         |   2 +-
>  mm/page_alloc.c        | 173 ++++++++++++++++------------
>  mm/vmstat.c            | 254 +++++++++++++++--------------------------
>  6 files changed, 254 insertions(+), 287 deletions(-)
> -- 
> 2.26.2

Mel Gorman

  parent reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 12:06 Mel Gorman
2021-03-29 12:06 ` [PATCH 1/6] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-03-29 12:06 ` [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-03-31  9:55   ` Thomas Gleixner
2021-03-31 11:01     ` Mel Gorman
2021-03-31 17:42       ` Thomas Gleixner
2021-03-31 17:46         ` Thomas Gleixner
2021-03-31 20:42         ` Mel Gorman
2021-03-29 12:06 ` [PATCH 3/6] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-03-29 12:06 ` [PATCH 4/6] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-03-29 12:06 ` [PATCH 5/6] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-03-29 12:06 ` [PATCH 6/6] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-03-30 18:51 ` [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Jesper Dangaard Brouer
2021-03-31  7:38   ` Mel Gorman
2021-03-31  8:17     ` Jesper Dangaard Brouer
2021-03-31  8:52 ` Mel Gorman [this message]
2021-03-31  9:51   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \ \ \ \ \ \ \ \ \ \ \ \

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users Archive on

Archives are clonable:
	git clone --mirror linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ \
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:

AGPL code for this site: git clone