From: Gilad Ben-Yossef <gilad@benyossef.com>
To: linux-kernel@vger.kernel.org
Cc: Gilad Ben-Yossef <gilad@benyossef.com>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	Chris Metcalf <cmetcalf@tilera.com>,
	Frederic Weisbecker <fweisbec@gmail.com>,
	Russell King <linux@arm.linux.org.uk>,
	linux-mm@kvack.org, Pekka Enberg <penberg@kernel.org>,
	Matt Mackall <mpm@selenic.com>,
	Sasha Levin <levinsasha928@gmail.com>,
	Rik van Riel <riel@redhat.com>, Andi Kleen <andi@firstfloor.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	linux-fsdevel@vger.kernel.org, Avi Kivity <avi@redhat.com>,
	Michal Nazarewicz <mina86@mina86.com>,
	Milton Miller <miltonm@bga.com>
Subject: [PATCH v8 7/8] mm: only IPI CPUs to drain local pages if they exist
Date: Sun,  5 Feb 2012 15:48:41 +0200
Message-ID: <1328449722-15959-6-git-send-email-gilad@benyossef.com>
In-Reply-To: <1328448800-15794-1-git-send-email-gilad@benyossef.com>

Calculate a cpumask of CPUs with per-cpu pages in any zone
and only send an IPI requesting CPUs to drain these pages
to the buddy allocator if they actually have pages when
asked to flush.
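
As a rough userspace illustration of the mask calculation described
above (not the kernel code itself), the sketch below scans a small
table of page counts and sets one bit per CPU that holds pages in any
zone. The names build_drain_mask, sample_counts and the SKETCH_*
constants are made up for this example; the table stands in for the
per-zone, per-CPU pcp.count values.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_CPUS  8	/* assumed CPU count, for illustration only */
#define SKETCH_NR_ZONES 3	/* assumed number of populated zones */

/* Hypothetical per-zone, per-CPU page counts (stand-in for pcp.count). */
const int sample_counts[SKETCH_NR_ZONES][SKETCH_NR_CPUS] = {
	{ 0, 0, 5, 0, 0, 0, 0, 0 },
	{ 0, 0, 0, 0, 0, 0, 0, 0 },
	{ 0, 0, 0, 0, 0, 0, 2, 0 },
};

/*
 * Return a bitmask with bit N set if CPU N holds pages in at least
 * one zone. Only CPUs whose bit is set would need a drain request.
 */
uint32_t build_drain_mask(const int counts[][SKETCH_NR_CPUS], int nr_zones)
{
	uint32_t mask = 0;

	assert(SKETCH_NR_CPUS <= 32);	/* mask is a single 32-bit word */

	for (int cpu = 0; cpu < SKETCH_NR_CPUS; cpu++) {
		for (int zone = 0; zone < nr_zones; zone++) {
			if (counts[zone][cpu]) {
				mask |= UINT32_C(1) << cpu;
				break;	/* one non-empty zone is enough */
			}
		}
	}
	return mask;
}
```

With the sample table, only CPUs 2 and 6 end up in the mask, so six
of the eight CPUs would be left undisturbed.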

This patch saves 85%+ of the IPIs asking to drain per-cpu
pages in case of severe memory pressure that leads to OOM.
In these cases multiple, possibly concurrent, allocation
requests end up in the direct reclaim code path. The per-cpu
pages are reclaimed on the first allocation failure, so for
most of the subsequent allocation attempts, until the memory
pressure is off (possibly via the OOM killer), there are no
per-cpu pages on most CPUs (and there can easily be hundreds
of them).

This also has the side effect of shortening the average
latency of direct reclaim by one or more orders of magnitude
since waiting for all the CPUs to ACK the IPI takes a
long time.

Tested by running "hackbench 400" on an 8 CPU x86 VM and
observing the difference between the number of direct
reclaim attempts that end up in drain_all_pages() and
those where more than 1/2 of the online CPUs had no per-cpu
pages in them, using the vmstat counters introduced
in the next patch in the series and using /proc/interrupts.

In the test scenario, this was seen to save around 3600 global
IPIs after triggering an OOM on a concurrent workload:

$ cat /proc/vmstat | tail -n 2
pcp_global_drain 0
pcp_global_ipi_saved 0

$ cat /proc/interrupts | grep CAL
CAL:          1          2          1          2
          2          2          2          2   Function call interrupts

$ hackbench 400
[OOM messages snipped]

$ cat /proc/vmstat | tail -n 2
pcp_global_drain 3647
pcp_global_ipi_saved 3642

$ cat /proc/interrupts | grep CAL
CAL:          6         13          6          3
          3          3         1 2          7   Function call interrupts
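
For reference, the share of drain requests that were counted as
having saved IPIs can be worked out from the two vmstat values
above. The helper below is only a sketch: saved_per_mille is a
made-up name, and the exact condition under which
pcp_global_ipi_saved is incremented is defined in the next patch and
is not reproduced here.

```c
#include <assert.h>

/*
 * Per-mille share of global drain requests that were counted as
 * having saved IPIs. Integer math only, rounded down.
 */
unsigned int saved_per_mille(unsigned long drains, unsigned long saved)
{
	assert(saved <= drains);	/* cannot save more than was requested */

	if (drains == 0)
		return 0;	/* nothing was drained, nothing to report */
	return (unsigned int)(saved * 1000UL / drains);
}
```

Calling saved_per_mille(3647, 3642) with the counters from this run
gives 998, i.e. all but 5 of the 3647 drain requests were counted as
saving IPIs.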

Please note that if the global drain is removed from the
direct reclaim path, as a patch from Mel Gorman currently
suggests, this should be replaced with an on_each_cpu_cond
invocation.
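
To illustrate the shape such a predicate-based variant might take,
here is a userspace imitation, not kernel code. Every sim_* name is
invented for this sketch, the loop stands in for sending IPIs, and
the real on_each_cpu_cond() signature should be taken from the
earlier patch in this series rather than from this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_CPUS  4	/* assumed CPU count, for illustration only */
#define SIM_NR_ZONES 2	/* assumed number of populated zones */

/* Hypothetical per-zone, per-CPU page counts (stand-in for pcp.count). */
static int sim_pcp_count[SIM_NR_ZONES][SIM_NR_CPUS] = {
	{ 0, 3, 0, 0 },
	{ 0, 0, 0, 7 },
};

static int sim_current_cpu;	/* stand-in for smp_processor_id() */
static int sim_ipis_sent;	/* how many CPUs were actually called */

/* Predicate: does this CPU hold per-cpu pages in any zone? */
static bool sim_has_pcps(int cpu, void *info)
{
	(void)info;
	for (int zone = 0; zone < SIM_NR_ZONES; zone++)
		if (sim_pcp_count[zone][cpu])
			return true;
	return false;
}

/* Stand-in for drain_local_pages(): empty the current CPU's lists. */
static void sim_drain_local_pages(void *arg)
{
	(void)arg;
	for (int zone = 0; zone < SIM_NR_ZONES; zone++)
		sim_pcp_count[zone][sim_current_cpu] = 0;
}

/* Imitation of a conditional cross-call: run func only where cond holds. */
static void sim_on_each_cpu_cond(bool (*cond)(int cpu, void *info),
				 void (*func)(void *info), void *info)
{
	assert(cond && func);

	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		if (!cond(cpu, info))
			continue;	/* this CPU is left undisturbed */
		sim_current_cpu = cpu;
		sim_ipis_sent++;
		func(info);
	}
}

/* Drain every CPU that has pages; return how many CPUs were called. */
int sim_drain_all_pages(void)
{
	sim_ipis_sent = 0;
	sim_on_each_cpu_cond(sim_has_pcps, sim_drain_local_pages, NULL);
	return sim_ipis_sent;
}
```

With the sample table, the first sim_drain_all_pages() call reaches
two CPUs and a second call reaches none, since nothing is left to
drain.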

Signed-off-by: Gilad Ben-Yossef <gilad@benyossef.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Mel Gorman <mel@csn.ul.ie>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Chris Metcalf <cmetcalf@tilera.com>
CC: Frederic Weisbecker <fweisbec@gmail.com>
CC: Russell King <linux@arm.linux.org.uk>
CC: linux-mm@kvack.org
CC: Pekka Enberg <penberg@kernel.org>
CC: Matt Mackall <mpm@selenic.com>
CC: Sasha Levin <levinsasha928@gmail.com>
CC: Rik van Riel <riel@redhat.com>
CC: Andi Kleen <andi@firstfloor.org>
CC: Andrew Morton <akpm@linux-foundation.org>
CC: Alexander Viro <viro@zeniv.linux.org.uk>
CC: linux-fsdevel@vger.kernel.org
CC: Avi Kivity <avi@redhat.com>
CC: Michal Nazarewicz <mina86@mina86.com>
CC: Milton Miller <miltonm@bga.com>
---
 mm/page_alloc.c |   39 +++++++++++++++++++++++++++++++++++++--
 1 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d2186ec..3ff5aff 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1161,11 +1161,46 @@ void drain_local_pages(void *arg)
 }
 
 /*
- * Spill all the per-cpu pages from all CPUs back into the buddy allocator
+ * Spill all the per-cpu pages from all CPUs back into the buddy allocator.
+ *
+ * Note that this code is protected against sending an IPI to an offline
+ * CPU but does not guarantee sending an IPI to newly hotplugged CPUs:
+ * on_each_cpu_mask() blocks hotplug and won't talk to offlined CPUs but
+ * nothing keeps CPUs from showing up after we populated the cpumask and
+ * before the call to on_each_cpu_mask().
  */
 void drain_all_pages(void)
 {
-	on_each_cpu(drain_local_pages, NULL, 1);
+	int cpu;
+	struct per_cpu_pageset *pcp;
+	struct zone *zone;
+
+	/* Allocate in the BSS so we won't require allocation in
+	 * the direct reclaim path for CONFIG_CPUMASK_OFFSTACK=y
+	 */
+	static cpumask_t cpus_with_pcps;
+
+	/*
+	 * We don't care about racing with a CPU hotplug event,
+	 * as the offline notification will cause the notified
+	 * cpu to drain that CPU's pcps, and on_each_cpu_mask()
+	 * disables preemption as part of its processing.
+	 */
+	for_each_online_cpu(cpu) {
+		bool has_pcps = false;
+		for_each_populated_zone(zone) {
+			pcp = per_cpu_ptr(zone->pageset, cpu);
+			if (pcp->pcp.count) {
+				has_pcps = true;
+				break;
+			}
+		}
+		if (has_pcps)
+			cpumask_set_cpu(cpu, &cpus_with_pcps);
+		else
+			cpumask_clear_cpu(cpu, &cpus_with_pcps);
+	}
+	on_each_cpu_mask(&cpus_with_pcps, drain_local_pages, NULL, 1);
 }
 
 #ifdef CONFIG_HIBERNATION
-- 
1.7.0.4


