linux-kernel.vger.kernel.org archive mirror
* [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event
@ 2021-03-11 11:54 Namhyung Kim
  2021-03-11 11:54 ` [PATCH 2/2] perf core: Allocate perf_event in the target node memory Namhyung Kim
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Namhyung Kim @ 2021-03-11 11:54 UTC
  To: Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Jiri Olsa, Ingo Molnar, Mark Rutland, Alexander Shishkin, LKML,
	Stephane Eranian, Andi Kleen, Ian Rogers, David Rientjes,
	Namhyung Kim

From: Namhyung Kim <namhyung@google.com>

The kernel can allocate a lot of struct perf_event instances when
profiling. For example, 256 cpus x 8 events x 20 cgroups = 40K instances
of the struct would be allocated on a large system.

The size of struct perf_event in my setup is 1152 bytes. As it's
allocated by kmalloc, the actual allocation size is rounded up to 2K.

That leaves 896 bytes (~43%) of waste per instance, for a total of
~35MB with 40K instances. We can create a dedicated kmem_cache to
avoid this unnecessary memory consumption.

With this change, I see the following (note this machine has 112 cpus).

  # grep perf_event /proc/slabinfo
  perf_event    224    784   1152    7    2 : tunables   24   12    8 : slabdata    112    112      0

The sixth column is pages-per-slab, which is 2, and the fifth column is
obj-per-slab, which is 7.  So a slab can actually use 1152 x 7 = 8064
bytes of its 8K, and the wasted memory is (8192 - 8064) / 7 = ~18 bytes
per instance.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 5206097d4d3d..10f2548211d0 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -402,6 +402,7 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 static cpumask_var_t perf_online_mask;
+static struct kmem_cache *perf_event_cache;
 
 /*
  * perf event paranoia level:
@@ -4591,7 +4592,7 @@ static void free_event_rcu(struct rcu_head *head)
 	if (event->ns)
 		put_pid_ns(event->ns);
 	perf_event_free_filter(event);
-	kfree(event);
+	kmem_cache_free(perf_event_cache, event);
 }
 
 static void ring_buffer_attach(struct perf_event *event,
@@ -11251,7 +11252,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 			return ERR_PTR(-EINVAL);
 	}
 
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	event = kmem_cache_zalloc(perf_event_cache, GFP_KERNEL);
 	if (!event)
 		return ERR_PTR(-ENOMEM);
 
@@ -11455,7 +11456,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		put_pid_ns(event->ns);
 	if (event->hw.target)
 		put_task_struct(event->hw.target);
-	kfree(event);
+	kmem_cache_free(perf_event_cache, event);
 
 	return ERR_PTR(err);
 }
@@ -13087,6 +13088,8 @@ void __init perf_event_init(void)
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
 
+	perf_event_cache = KMEM_CACHE(perf_event, SLAB_PANIC);
+
 	/*
 	 * Build time assertion that we keep the data_head at the intended
 	 * location.  IOW, validation we got the __reserved[] size right.
-- 
2.31.0.rc2.261.g7f71774620-goog



* [PATCH 2/2] perf core: Allocate perf_event in the target node memory
  2021-03-11 11:54 [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event Namhyung Kim
@ 2021-03-11 11:54 ` Namhyung Kim
  2021-03-17 12:38   ` [tip: perf/core] " tip-bot2 for Namhyung Kim
  2021-03-11 12:29 ` [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event Peter Zijlstra
  2021-03-17 12:38 ` [tip: perf/core] " tip-bot2 for Namhyung Kim
  2 siblings, 1 reply; 5+ messages in thread
From: Namhyung Kim @ 2021-03-11 11:54 UTC
  To: Arnaldo Carvalho de Melo, Peter Zijlstra
  Cc: Jiri Olsa, Ingo Molnar, Mark Rutland, Alexander Shishkin, LKML,
	Stephane Eranian, Andi Kleen, Ian Rogers, David Rientjes

For cpu events, it would be better to allocate them in the corresponding
node's memory, as they would mostly be accessed by the target cpu.
Although perf tools sets the cpu affinity before calling perf_event_open,
there are places where it doesn't (notably perf record), and we should
consider other external users too.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/events/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 10f2548211d0..519faf0b7b54 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11246,13 +11246,16 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
 	long err = -EINVAL;
+	int node;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
 		if (!task || cpu != -1)
 			return ERR_PTR(-EINVAL);
 	}
 
-	event = kmem_cache_zalloc(perf_event_cache, GFP_KERNEL);
+	node = (cpu >= 0) ? cpu_to_node(cpu) : -1;
+	event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO,
+				      node);
 	if (!event)
 		return ERR_PTR(-ENOMEM);
 
-- 
2.31.0.rc2.261.g7f71774620-goog



* Re: [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event
  2021-03-11 11:54 [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event Namhyung Kim
  2021-03-11 11:54 ` [PATCH 2/2] perf core: Allocate perf_event in the target node memory Namhyung Kim
@ 2021-03-11 12:29 ` Peter Zijlstra
  2021-03-17 12:38 ` [tip: perf/core] " tip-bot2 for Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: Peter Zijlstra @ 2021-03-11 12:29 UTC
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, David Rientjes, Namhyung Kim

On Thu, Mar 11, 2021 at 08:54:12PM +0900, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung@google.com>
> 
> The kernel can allocate a lot of struct perf_event instances when
> profiling. For example, 256 cpus x 8 events x 20 cgroups = 40K instances
> of the struct would be allocated on a large system.
> 
> The size of struct perf_event in my setup is 1152 bytes. As it's
> allocated by kmalloc, the actual allocation size is rounded up to 2K.
> 
> That leaves 896 bytes (~43%) of waste per instance, for a total of
> ~35MB with 40K instances. We can create a dedicated kmem_cache to
> avoid this unnecessary memory consumption.
> 
> With this change, I see the following (note this machine has 112 cpus).
> 
>   # grep perf_event /proc/slabinfo
>   perf_event    224    784   1152    7    2 : tunables   24   12    8 : slabdata    112    112      0
> 
> The sixth column is pages-per-slab, which is 2, and the fifth column is
> obj-per-slab, which is 7.  So a slab can actually use 1152 x 7 = 8064
> bytes of its 8K, and the wasted memory is (8192 - 8064) / 7 = ~18 bytes
> per instance.
> 
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>

Thanks for both!


* [tip: perf/core] perf core: Add a kmem_cache for struct perf_event
  2021-03-11 11:54 [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event Namhyung Kim
  2021-03-11 11:54 ` [PATCH 2/2] perf core: Allocate perf_event in the target node memory Namhyung Kim
  2021-03-11 12:29 ` [PATCH 1/2] perf core: Add a kmem_cache for struct perf_event Peter Zijlstra
@ 2021-03-17 12:38 ` tip-bot2 for Namhyung Kim
  2 siblings, 0 replies; 5+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2021-03-17 12:38 UTC
  To: linux-tip-commits; +Cc: Namhyung Kim, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     bdacfaf26da166dd56c62f23f27a4b3e71f2d89e
Gitweb:        https://git.kernel.org/tip/bdacfaf26da166dd56c62f23f27a4b3e71f2d89e
Author:        Namhyung Kim <namhyung@google.com>
AuthorDate:    Thu, 11 Mar 2021 20:54:12 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 16 Mar 2021 21:44:42 +01:00

perf core: Add a kmem_cache for struct perf_event

The kernel can allocate a lot of struct perf_event instances when
profiling. For example, 256 cpus x 8 events x 20 cgroups = 40K instances
of the struct would be allocated on a large system.

The size of struct perf_event in my setup is 1152 bytes. As it's
allocated by kmalloc, the actual allocation size is rounded up to 2K.

That leaves 896 bytes (~43%) of waste per instance, for a total of
~35MB with 40K instances. We can create a dedicated kmem_cache to
avoid this unnecessary memory consumption.

With this change, I see the following (note this machine has 112 cpus).

  # grep perf_event /proc/slabinfo
  perf_event    224    784   1152    7    2 : tunables   24   12    8 : slabdata    112    112      0

The sixth column is pages-per-slab, which is 2, and the fifth column is
obj-per-slab, which is 7.  So a slab can actually use 1152 x 7 = 8064
bytes of its 8K, and the wasted memory is (8192 - 8064) / 7 = ~18 bytes
per instance.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311115413.444407-1-namhyung@kernel.org
---
 kernel/events/core.c |  9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 03db40f..f526ddb 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -405,6 +405,7 @@ static LIST_HEAD(pmus);
 static DEFINE_MUTEX(pmus_lock);
 static struct srcu_struct pmus_srcu;
 static cpumask_var_t perf_online_mask;
+static struct kmem_cache *perf_event_cache;
 
 /*
  * perf event paranoia level:
@@ -4611,7 +4612,7 @@ static void free_event_rcu(struct rcu_head *head)
 	if (event->ns)
 		put_pid_ns(event->ns);
 	perf_event_free_filter(event);
-	kfree(event);
+	kmem_cache_free(perf_event_cache, event);
 }
 
 static void ring_buffer_attach(struct perf_event *event,
@@ -11293,7 +11294,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 			return ERR_PTR(-EINVAL);
 	}
 
-	event = kzalloc(sizeof(*event), GFP_KERNEL);
+	event = kmem_cache_zalloc(perf_event_cache, GFP_KERNEL);
 	if (!event)
 		return ERR_PTR(-ENOMEM);
 
@@ -11497,7 +11498,7 @@ err_ns:
 		put_pid_ns(event->ns);
 	if (event->hw.target)
 		put_task_struct(event->hw.target);
-	kfree(event);
+	kmem_cache_free(perf_event_cache, event);
 
 	return ERR_PTR(err);
 }
@@ -13130,6 +13131,8 @@ void __init perf_event_init(void)
 	ret = init_hw_breakpoint();
 	WARN(ret, "hw_breakpoint initialization failed with: %d", ret);
 
+	perf_event_cache = KMEM_CACHE(perf_event, SLAB_PANIC);
+
 	/*
 	 * Build time assertion that we keep the data_head at the intended
 	 * location.  IOW, validation we got the __reserved[] size right.


* [tip: perf/core] perf core: Allocate perf_event in the target node memory
  2021-03-11 11:54 ` [PATCH 2/2] perf core: Allocate perf_event in the target node memory Namhyung Kim
@ 2021-03-17 12:38   ` tip-bot2 for Namhyung Kim
  0 siblings, 0 replies; 5+ messages in thread
From: tip-bot2 for Namhyung Kim @ 2021-03-17 12:38 UTC
  To: linux-tip-commits; +Cc: Namhyung Kim, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     ff65338e78418e5970a7aabbabb94c46f2bb821d
Gitweb:        https://git.kernel.org/tip/ff65338e78418e5970a7aabbabb94c46f2bb821d
Author:        Namhyung Kim <namhyung@kernel.org>
AuthorDate:    Thu, 11 Mar 2021 20:54:13 +09:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Tue, 16 Mar 2021 21:44:43 +01:00

perf core: Allocate perf_event in the target node memory

For cpu events, it would be better to allocate them in the corresponding
node's memory, as they would mostly be accessed by the target cpu.
Although perf tools sets the cpu affinity before calling perf_event_open,
there are places where it doesn't (notably perf record), and we should
consider other external users too.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20210311115413.444407-2-namhyung@kernel.org
---
 kernel/events/core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index f526ddb..6182cb1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -11288,13 +11288,16 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
 	long err = -EINVAL;
+	int node;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
 		if (!task || cpu != -1)
 			return ERR_PTR(-EINVAL);
 	}
 
-	event = kmem_cache_zalloc(perf_event_cache, GFP_KERNEL);
+	node = (cpu >= 0) ? cpu_to_node(cpu) : -1;
+	event = kmem_cache_alloc_node(perf_event_cache, GFP_KERNEL | __GFP_ZERO,
+				      node);
 	if (!event)
 		return ERR_PTR(-ENOMEM);
 

