From: Shakeel Butt <shakeelb@google.com>
To: Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Muchun Song <songmuchun@bytedance.com>
Cc: "Michal Koutný" <mkoutny@suse.com>,
	"Eric Dumazet" <edumazet@google.com>,
	"Soheil Hassas Yeganeh" <soheil@google.com>,
	"Feng Tang" <feng.tang@intel.com>,
	"Oliver Sang" <oliver.sang@intel.com>,
	"Andrew Morton" <akpm@linux-foundation.org>,
	lkp@lists.01.org, cgroups@vger.kernel.org, linux-mm@kvack.org,
	netdev@vger.kernel.org, linux-kernel@vger.kernel.org,
	"Shakeel Butt" <shakeelb@google.com>
Subject: [PATCH 2/3] mm: page_counter: rearrange struct page_counter fields
Date: Mon, 22 Aug 2022 00:17:36 +0000	[thread overview]
Message-ID: <20220822001737.4120417-3-shakeelb@google.com> (raw)
In-Reply-To: <20220822001737.4120417-1-shakeelb@google.com>

With memcg v2 enabled, memcg->memory.usage is a very hot member for
workloads that charge memory on multiple CPUs concurrently, network
intensive workloads in particular. In addition, there is false cache
sharing between memory.usage and memory.high on the charge path. This
patch moves usage into its own cacheline and moves all the read-mostly
fields into a separate cacheline.
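
As a rough illustration of the idea (a user-space sketch, not the
kernel code; the 64-byte cacheline and the field subset are
assumptions), isolating the hot counter looks like this:

 /* Sketch only: shows how alignment separates the hot and cold fields. */
 #include <stdalign.h>
 #include <stddef.h>
 #include <stdio.h>

 #define CACHELINE 64	/* assumed line size */

 struct before {			/* usage and limits share a cacheline */
 	long usage;
 	unsigned long min, low, high, max;
 };

 struct after {				/* usage alone; limits grouped together */
 	alignas(CACHELINE) long usage;
 	alignas(CACHELINE) unsigned long min;
 	unsigned long low, high, max;
 };

 int main(void)
 {
 	printf("before: usage@%zu min@%zu  (same line)\n",
 	       offsetof(struct before, usage), offsetof(struct before, min));
 	printf("after:  usage@%zu min@%zu  (separate lines)\n",
 	       offsetof(struct after, usage), offsetof(struct after, min));
 	return 0;
 }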

To evaluate the impact of this optimization, we ran the following
workload on a 72-CPU machine in a three-level cgroup hierarchy, with
min and low configured appropriately at the top level: memory.min equal
to the size of the netperf binary and memory.low double that.

 $ netserver -6
 # 36 instances of netperf with the following params
 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K
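
The cgroup hierarchy itself is not shown in the commands above; a
hypothetical sketch of that setup follows (the cgroup v2 paths, group
names, and netperf binary location are all assumptions, and the memory
controller is assumed to be enabled for the children):

 /* Hypothetical setup sketch: the top level gets memory.min/memory.low. */
 #include <stdio.h>
 #include <sys/stat.h>

 static int write_ull(const char *path, unsigned long long val)
 {
 	FILE *f = fopen(path, "w");

 	if (!f)
 		return -1;
 	fprintf(f, "%llu\n", val);
 	return fclose(f);
 }

 int main(void)
 {
 	struct stat st;

 	if (stat("/usr/bin/netperf", &st))	/* assumed binary location */
 		return 1;
 	mkdir("/sys/fs/cgroup/top", 0755);
 	mkdir("/sys/fs/cgroup/top/mid", 0755);
 	mkdir("/sys/fs/cgroup/top/mid/leaf", 0755);
 	/* memory.min = netperf binary size, memory.low = double that */
 	write_ull("/sys/fs/cgroup/top/memory.min",
 		  (unsigned long long)st.st_size);
 	write_ull("/sys/fs/cgroup/top/memory.low",
 		  2ULL * (unsigned long long)st.st_size);
 	return 0;
 }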

Results (average throughput of netperf):
Without (6.0-rc1)	10482.7 Mbps
With patch		12413.7 Mbps (18.4% improvement)

One side effect of this patch is an increase in the size of struct
mem_cgroup; however, the performance improvement makes the additional
size worth it. There are also opportunities to reduce the size of
struct mem_cgroup, such as deprecating the kmem and tcpmem page
counters and packing the fields better.

Signed-off-by: Shakeel Butt <shakeelb@google.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
---
 include/linux/page_counter.h | 34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 679591301994..8ce99bde645f 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -3,15 +3,27 @@
 #define _LINUX_PAGE_COUNTER_H
 
 #include <linux/atomic.h>
+#include <linux/cache.h>
 #include <linux/kernel.h>
 #include <asm/page.h>
 
+#if defined(CONFIG_SMP)
+struct pc_padding {
+	char x[0];
+} ____cacheline_internodealigned_in_smp;
+#define PC_PADDING(name)	struct pc_padding name
+#else
+#define PC_PADDING(name)
+#endif
+
 struct page_counter {
+	/*
+	 * Make sure 'usage' does not share a cacheline with any other field;
+	 * memcg->memory.usage is a hot member of struct mem_cgroup.
+	 */
+	PC_PADDING(_pad1_);
 	atomic_long_t usage;
-	unsigned long min;
-	unsigned long low;
-	unsigned long high;
-	unsigned long max;
+	PC_PADDING(_pad2_);
 
 	/* effective memory.min and memory.min usage tracking */
 	unsigned long emin;
@@ -23,16 +35,16 @@ struct page_counter {
 	atomic_long_t low_usage;
 	atomic_long_t children_low_usage;
 
-	/* legacy */
 	unsigned long watermark;
 	unsigned long failcnt;
 
-	/*
-	 * 'parent' is placed here to be far from 'usage' to reduce
-	 * cache false sharing, as 'usage' is written mostly while
-	 * parent is frequently read for cgroup's hierarchical
-	 * counting nature.
-	 */
+	/* Keep all the read-mostly fields in a separate cacheline. */
+	PC_PADDING(_pad3_);
+
+	unsigned long min;
+	unsigned long low;
+	unsigned long high;
+	unsigned long max;
 	struct page_counter *parent;
 };
 
-- 
2.37.1.595.g718a3a8f04-goog
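
For reference, the PC_PADDING trick above relies on a zero-size,
cacheline-aligned type: it occupies no bytes itself but forces the next
field onto a fresh cacheline boundary, and on !CONFIG_SMP it expands to
nothing. A user-space approximation (GNU C zero-length arrays and a
64-byte line are assumptions):

 #include <stddef.h>
 #include <stdio.h>

 #define CACHELINE 64	/* assumed line size */

 struct pad {				/* zero bytes, but cacheline-aligned */
 	char x[0];
 } __attribute__((aligned(CACHELINE)));

 struct demo {
 	struct pad _pad1_;
 	long usage;			/* lands at offset 0, own cacheline */
 	struct pad _pad2_;
 	unsigned long min;		/* pushed to offset 64 */
 };

 int main(void)
 {
 	/* Prints: pad=0 usage@0 min@64 demo=128 on gcc/clang. */
 	printf("pad=%zu usage@%zu min@%zu demo=%zu\n",
 	       sizeof(struct pad), offsetof(struct demo, usage),
 	       offsetof(struct demo, min), sizeof(struct demo));
 	return 0;
 }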


Thread overview: 86+ messages

2022-08-22  0:17 [PATCH 0/3] memcg: optimizatize charge codepath Shakeel Butt
2022-08-22  0:17 ` [PATCH 1/3] mm: page_counter: remove unneeded atomic ops for low/min Shakeel Butt
2022-08-22  0:20   ` Soheil Hassas Yeganeh
2022-08-22  2:39   ` Feng Tang
2022-08-22  9:55   ` Michal Hocko
2022-08-22 10:18     ` Michal Hocko
2022-08-22 14:55       ` Shakeel Butt
2022-08-22 15:20         ` Michal Hocko
2022-08-22 16:06           ` Shakeel Butt
2022-08-23  9:42           ` Michal Hocko
2022-08-22 18:23   ` Roman Gushchin
2022-08-22  0:17 ` [PATCH 2/3] mm: page_counter: rearrange struct page_counter fields Shakeel Butt [this message]
2022-08-22  0:24   ` Soheil Hassas Yeganeh
2022-08-22  4:55     ` Shakeel Butt
2022-08-22 13:06       ` Soheil Hassas Yeganeh
2022-08-22  2:10   ` Feng Tang
2022-08-22  4:59     ` Shakeel Butt
2022-08-22 10:23   ` Michal Hocko
2022-08-22 15:06     ` Shakeel Butt
2022-08-22 15:15       ` Michal Hocko
2022-08-22 16:04         ` Shakeel Butt
2022-08-22 18:27           ` Roman Gushchin
2022-08-22  0:17 ` [PATCH 3/3] memcg: increase MEMCG_CHARGE_BATCH to 64 Shakeel Butt
2022-08-22  0:24   ` Soheil Hassas Yeganeh
2022-08-22  2:30   ` Feng Tang
2022-08-22 10:47   ` Michal Hocko
2022-08-22 15:09     ` Shakeel Butt
2022-08-22 15:22       ` Michal Hocko
2022-08-22 16:07         ` Shakeel Butt
2022-08-22 18:37   ` Roman Gushchin
2022-08-22 19:34     ` Michal Hocko
2022-08-23  2:22       ` Roman Gushchin
2022-08-23  4:49         ` Michal Hocko