Linux-rt-users Archive on lore.kernel.org
 help / color / Atom feed
From: Mel Gorman <mgorman@techsingularity.net>
To: Thomas Gleixner <tglx@linutronix.de>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux-RT-Users <linux-rt-users@vger.kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Chuck Lever <chuck.lever@oracle.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>,
	Matthew Wilcox <willy@infradead.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Subject: Re: [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock
Date: Wed, 31 Mar 2021 12:01:37 +0100
Message-ID: <20210331110137.GA3697@techsingularity.net> (raw)
In-Reply-To: <877dln640j.ffs@nanos.tec.linutronix.de>

On Wed, Mar 31, 2021 at 11:55:56AM +0200, Thomas Gleixner wrote:
> On Mon, Mar 29 2021 at 13:06, Mel Gorman wrote:
> > There is a lack of clarity of what exactly local_irq_save/local_irq_restore
> > protects in page_alloc.c . It conflates the protection of per-cpu page
> > allocation structures with per-cpu vmstat deltas.
> >
> > This patch protects the PCP structure using local_lock which
> > for most configurations is identical to IRQ enabling/disabling.
> > The scope of the lock is still wider than it should be but this is
> > decreased in later patches. The per-cpu vmstat deltas are protected by
> > preempt_disable/preempt_enable where necessary instead of relying on
> > IRQ disable/enable.
> 
> Yes, this goes into the right direction and I really appreciate the
> scoped protection for clarity sake.
> 

Thanks.

> >  #ifdef CONFIG_MEMORY_HOTREMOVE
> > diff --git a/mm/vmstat.c b/mm/vmstat.c
> > index 8a8f1a26b231..01b74ff73549 100644
> > --- a/mm/vmstat.c
> > +++ b/mm/vmstat.c
> > @@ -887,6 +887,7 @@ void cpu_vm_stats_fold(int cpu)
> >  
> >  		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
> >  
> > +		preempt_disable();
> 
> What's the reason for the preempt_disable() here? A comment would be
> appreciated.
> 

Very good question because it's protecting vm_stat_diff and
vm_numa_stat_diff in different contexts and not quite correctly at this
point of the series. By the end of the series vm_numa_stat_diff is a
simple counter and does not need special protection.

Right now, it's protecting against a read and clear of vm_stat_diff
in two contexts -- cpu_vm_stats_fold and drain_zonestats but it's only
defensive. cpu_vm_stats_fold is only called when a CPU is going dead and
drain_zonestats is called from memory hotplug context. The protection is
necessary only if a new drain_zonestats caller was added without taking
the RMW of vm_stat_diff into account which may never happen.

This whole problem with preemption could be avoided altogether if
this_cpu_xchg was used similar to what is done elsewhere in vmstat
so.... this?

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 64429ca4957f..9528304ce24d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8969,8 +8969,9 @@ void zone_pcp_reset(struct zone *zone)
 	struct per_cpu_zonestat *pzstats;
 
 	/*
-	 * No race with drain_pages. drain_zonestat disables preemption
-	 * and drain_pages relies on the pcp local_lock.
+	 * No race with drain_pages. drain_zonestat is only concerned with
+	 * vm_*_stat_diff which is updated with this_cpu_xchg and drain_pages
+	 * only cares about the PCP lists protected by local_lock.
 	 */
 	if (zone->per_cpu_pageset != &boot_pageset) {
 		for_each_online_cpu(cpu) {
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 01b74ff73549..34ff61a145d2 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -887,13 +887,11 @@ void cpu_vm_stats_fold(int cpu)
 
 		pzstats = per_cpu_ptr(zone->per_cpu_zonestats, cpu);
 
-		preempt_disable();
 		for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 			if (pzstats->vm_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_stat_diff[i];
-				pzstats->vm_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_stat[i]);
 				global_zone_diff[i] += v;
 			}
@@ -903,13 +901,11 @@ void cpu_vm_stats_fold(int cpu)
 			if (pzstats->vm_numa_stat_diff[i]) {
 				int v;
 
-				v = pzstats->vm_numa_stat_diff[i];
-				pzstats->vm_numa_stat_diff[i] = 0;
+				v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
 				atomic_long_add(v, &zone->vm_numa_stat[i]);
 				global_numa_diff[i] += v;
 			}
 #endif
-		preempt_enable();
 	}
 
 	for_each_online_pgdat(pgdat) {
@@ -943,10 +939,9 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 {
 	int i;
 
-	preempt_disable();
 	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
 		if (pzstats->vm_stat_diff[i]) {
-			int v = pzstats->vm_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_stat_diff[i], 0);
 			pzstats->vm_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_stat[i]);
 			atomic_long_add(v, &vm_zone_stat[i]);
@@ -955,14 +950,12 @@ void drain_zonestat(struct zone *zone, struct per_cpu_zonestat *pzstats)
 #ifdef CONFIG_NUMA
 	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
 		if (pzstats->vm_numa_stat_diff[i]) {
-			int v = pzstats->vm_numa_stat_diff[i];
+			int v = this_cpu_xchg(pzstats->vm_numa_stat_diff[i], 0);
 
-			pzstats->vm_numa_stat_diff[i] = 0;
 			atomic_long_add(v, &zone->vm_numa_stat[i]);
 			atomic_long_add(v, &vm_numa_stat[i]);
 		}
 #endif
-	preempt_enable();
 }
 #endif
 
 
-- 
Mel Gorman
SUSE Labs

  reply index

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-29 12:06 [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Mel Gorman
2021-03-29 12:06 ` [PATCH 1/6] mm/page_alloc: Split per cpu page lists and zone stats Mel Gorman
2021-03-29 12:06 ` [PATCH 2/6] mm/page_alloc: Convert per-cpu list protection to local_lock Mel Gorman
2021-03-31  9:55   ` Thomas Gleixner
2021-03-31 11:01     ` Mel Gorman [this message]
2021-03-31 17:42       ` Thomas Gleixner
2021-03-31 17:46         ` Thomas Gleixner
2021-03-31 20:42         ` Mel Gorman
2021-03-29 12:06 ` [PATCH 3/6] mm/vmstat: Convert NUMA statistics to basic NUMA counters Mel Gorman
2021-03-29 12:06 ` [PATCH 4/6] mm/vmstat: Inline NUMA event counter updates Mel Gorman
2021-03-29 12:06 ` [PATCH 5/6] mm/page_alloc: Batch the accounting updates in the bulk allocator Mel Gorman
2021-03-29 12:06 ` [PATCH 6/6] mm/page_alloc: Reduce duration that IRQs are disabled for VM counters Mel Gorman
2021-03-30 18:51 ` [RFC PATCH 0/6] Use local_lock for pcp protection and reduce stat overhead Jesper Dangaard Brouer
2021-03-31  7:38   ` Mel Gorman
2021-03-31  8:17     ` Jesper Dangaard Brouer
2021-03-31  8:52 ` Mel Gorman
2021-03-31  9:51   ` Thomas Gleixner

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210331110137.GA3697@techsingularity.net \
    --to=mgorman@techsingularity.net \
    --cc=bigeasy@linutronix.de \
    --cc=brouer@redhat.com \
    --cc=chuck.lever@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-rt-users@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-rt-users Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-rt-users/0 linux-rt-users/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-rt-users linux-rt-users/ https://lore.kernel.org/linux-rt-users \
		linux-rt-users@vger.kernel.org
	public-inbox-index linux-rt-users

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-rt-users


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git