* [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator
@ 2017-06-19 23:28 ` Dennis Zhou
  0 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-19 23:28 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter
  Cc: linux-mm, linux-kernel, kernel-team, Dennis Zhou

There is limited visibility into the percpu memory allocator, making it hard to
understand usage patterns. Without concrete numbers, we are left to conjecture
about the correctness of percpu memory usage. Additionally, there is no
mechanism to review the correctness/efficiency of the current implementation.

This patchset addresses the following:
- Adds basic statistics to reason about the number of allocations over the
  lifetime, allocation sizes, and fragmentation.
- Adds tracepoints to enable better debug capabilities as well as the ability
  to review allocation requests and corresponding decisions.

This patchset contains the following four patches:
0001-percpu-add-missing-lockdep_assert_held-to-func-pcpu_.patch
0002-percpu-migrate-percpu-data-structures-to-internal-he.patch
0003-percpu-expose-statistics-about-percpu-memory-via-deb.patch
0004-percpu-add-tracepoint-support-for-percpu-memory.patch

0001 adds a missing lockdep_assert_held for pcpu_lock to improve consistency
and safety. 0002 prepares for the following patches by moving data structure
definitions into an internal header and exposing previously static variables.
0003 adds percpu statistics via debugfs. 0004 adds tracepoints to key percpu
events: chunk creation/deletion and area allocation/free/failure.

This patchset is on top of linus#master 1132d5e.

diffstat below:

  percpu: add missing lockdep_assert_held to func pcpu_free_area
  percpu: migrate percpu data structures to internal header
  percpu: expose statistics about percpu memory via debugfs
  percpu: add tracepoint support for percpu memory

 include/trace/events/percpu.h | 125 ++++++++++++++++++++++++
 mm/Kconfig                    |   8 ++
 mm/Makefile                   |   1 +
 mm/percpu-internal.h          | 164 +++++++++++++++++++++++++++++++
 mm/percpu-km.c                |   6 ++
 mm/percpu-stats.c             | 222 ++++++++++++++++++++++++++++++++++++++++++
 mm/percpu-vm.c                |   7 ++
 mm/percpu.c                   |  53 +++++-----
 8 files changed, 563 insertions(+), 23 deletions(-)
 create mode 100644 include/trace/events/percpu.h
 create mode 100644 mm/percpu-internal.h
 create mode 100644 mm/percpu-stats.c

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] percpu: add missing lockdep_assert_held to func pcpu_free_area
  2017-06-19 23:28 ` Dennis Zhou
@ 2017-06-19 23:28   ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-19 23:28 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter
  Cc: linux-mm, linux-kernel, kernel-team, Dennis Zhou

Add a missing lockdep_assert_held for pcpu_lock to improve consistency
and safety throughout mm/percpu.c.

Signed-off-by: Dennis Zhou <dennisz@fb.com>
---
 mm/percpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/percpu.c b/mm/percpu.c
index e0aa8ae..f94a5eb 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -672,6 +672,8 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme,
 	int to_free = 0;
 	int *p;
 
+	lockdep_assert_held(&pcpu_lock);
+
 	freeme |= 1;	/* we are searching for <given offset, in use> pair */
 
 	i = 0;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] percpu: migrate percpu data structures to internal header
  2017-06-19 23:28 ` Dennis Zhou
@ 2017-06-19 23:28   ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-19 23:28 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter
  Cc: linux-mm, linux-kernel, kernel-team, Dennis Zhou

Migrates the pcpu_chunk definition and a few percpu static variables from
mm/percpu.c to an internal header file. These will be used with debugfs
to expose statistics about percpu memory, improving visibility into
allocations and fragmentation.

Signed-off-by: Dennis Zhou <dennisz@fb.com>
---
 mm/percpu-internal.h | 33 +++++++++++++++++++++++++++++++++
 mm/percpu.c          | 30 +++++++-----------------------
 2 files changed, 40 insertions(+), 23 deletions(-)
 create mode 100644 mm/percpu-internal.h

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
new file mode 100644
index 0000000..8b6cb2a
--- /dev/null
+++ b/mm/percpu-internal.h
@@ -0,0 +1,33 @@
+#ifndef _MM_PERCPU_INTERNAL_H
+#define _MM_PERCPU_INTERNAL_H
+
+#include <linux/types.h>
+#include <linux/percpu.h>
+
+struct pcpu_chunk {
+	struct list_head	list;		/* linked to pcpu_slot lists */
+	int			free_size;	/* free bytes in the chunk */
+	int			contig_hint;	/* max contiguous size hint */
+	void			*base_addr;	/* base address of this chunk */
+
+	int			map_used;	/* # of map entries used before the sentry */
+	int			map_alloc;	/* # of map entries allocated */
+	int			*map;		/* allocation map */
+	struct list_head	map_extend_list;/* on pcpu_map_extend_chunks */
+
+	void			*data;		/* chunk data */
+	int			first_free;	/* no free below this */
+	bool			immutable;	/* no [de]population allowed */
+	int			nr_populated;	/* # of populated pages */
+	unsigned long		populated[];	/* populated bitmap */
+};
+
+extern spinlock_t pcpu_lock;
+
+extern struct list_head *pcpu_slot __read_mostly;
+extern int pcpu_nr_slots __read_mostly;
+
+extern struct pcpu_chunk *pcpu_first_chunk;
+extern struct pcpu_chunk *pcpu_reserved_chunk;
+
+#endif
diff --git a/mm/percpu.c b/mm/percpu.c
index f94a5eb..5cf7d73 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -76,6 +76,8 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
+#include "percpu-internal.h"
+
 #define PCPU_SLOT_BASE_SHIFT		5	/* 1-31 shares the same slot */
 #define PCPU_DFL_MAP_ALLOC		16	/* start a map with 16 ents */
 #define PCPU_ATOMIC_MAP_MARGIN_LOW	32
@@ -103,29 +105,11 @@
 #define __pcpu_ptr_to_addr(ptr)		(void __force *)(ptr)
 #endif	/* CONFIG_SMP */
 
-struct pcpu_chunk {
-	struct list_head	list;		/* linked to pcpu_slot lists */
-	int			free_size;	/* free bytes in the chunk */
-	int			contig_hint;	/* max contiguous size hint */
-	void			*base_addr;	/* base address of this chunk */
-
-	int			map_used;	/* # of map entries used before the sentry */
-	int			map_alloc;	/* # of map entries allocated */
-	int			*map;		/* allocation map */
-	struct list_head	map_extend_list;/* on pcpu_map_extend_chunks */
-
-	void			*data;		/* chunk data */
-	int			first_free;	/* no free below this */
-	bool			immutable;	/* no [de]population allowed */
-	int			nr_populated;	/* # of populated pages */
-	unsigned long		populated[];	/* populated bitmap */
-};
-
 static int pcpu_unit_pages __read_mostly;
 static int pcpu_unit_size __read_mostly;
 static int pcpu_nr_units __read_mostly;
 static int pcpu_atom_size __read_mostly;
-static int pcpu_nr_slots __read_mostly;
+int pcpu_nr_slots __read_mostly;
 static size_t pcpu_chunk_struct_size __read_mostly;
 
 /* cpus with the lowest and highest unit addresses */
@@ -149,7 +133,7 @@ static const size_t *pcpu_group_sizes __read_mostly;
  * chunks, this one can be allocated and mapped in several different
  * ways and thus often doesn't live in the vmalloc area.
  */
-static struct pcpu_chunk *pcpu_first_chunk;
+struct pcpu_chunk *pcpu_first_chunk;
 
 /*
  * Optional reserved chunk.  This chunk reserves part of the first
@@ -158,13 +142,13 @@ static struct pcpu_chunk *pcpu_first_chunk;
  * area doesn't exist, the following variables contain NULL and 0
  * respectively.
  */
-static struct pcpu_chunk *pcpu_reserved_chunk;
+struct pcpu_chunk *pcpu_reserved_chunk;
 static int pcpu_reserved_chunk_limit;
 
-static DEFINE_SPINLOCK(pcpu_lock);	/* all internal data structures */
+DEFINE_SPINLOCK(pcpu_lock);	/* all internal data structures */
 static DEFINE_MUTEX(pcpu_alloc_mutex);	/* chunk create/destroy, [de]pop, map ext */
 
-static struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
+struct list_head *pcpu_slot __read_mostly; /* chunk list slots */
 
 /* chunks which need their map areas extended, protected by pcpu_lock */
 static LIST_HEAD(pcpu_map_extend_chunks);
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs
  2017-06-19 23:28 ` Dennis Zhou
@ 2017-06-19 23:28   ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-19 23:28 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter
  Cc: linux-mm, linux-kernel, kernel-team, Dennis Zhou

There is limited visibility into the use of percpu memory, leaving us
unable to reason about the correctness of parameters and overall usage.
These counters and statistics help answer basic questions about percpu
memory, such as the number of allocations over the lifetime, allocation
sizes, and fragmentation.

New Config: PERCPU_STATS

Signed-off-by: Dennis Zhou <dennisz@fb.com>
---
 mm/Kconfig           |   8 ++
 mm/Makefile          |   1 +
 mm/percpu-internal.h | 131 ++++++++++++++++++++++++++++++
 mm/percpu-km.c       |   4 +
 mm/percpu-stats.c    | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/percpu-vm.c       |   5 ++
 mm/percpu.c          |   9 +++
 7 files changed, 380 insertions(+)
 create mode 100644 mm/percpu-stats.c

diff --git a/mm/Kconfig b/mm/Kconfig
index beb7a45..8fae426 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -706,3 +706,11 @@ config ARCH_USES_HIGH_VMA_FLAGS
 	bool
 config ARCH_HAS_PKEYS
 	bool
+
+config PERCPU_STATS
+	bool "Collect percpu memory statistics"
+	default n
+	help
+	  This feature collects and exposes statistics via debugfs. The
+	  information includes global and per chunk statistics, which can
+	  be used to help understand percpu memory usage.
diff --git a/mm/Makefile b/mm/Makefile
index 026f6a8..411bd24 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -103,3 +103,4 @@ obj-$(CONFIG_IDLE_PAGE_TRACKING) += page_idle.o
 obj-$(CONFIG_FRAME_VECTOR) += frame_vector.o
 obj-$(CONFIG_DEBUG_PAGE_REF) += debug_page_ref.o
 obj-$(CONFIG_HARDENED_USERCOPY) += usercopy.o
+obj-$(CONFIG_PERCPU_STATS) += percpu-stats.o
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 8b6cb2a..5509593 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -5,6 +5,11 @@
 #include <linux/percpu.h>
 
 struct pcpu_chunk {
+#ifdef CONFIG_PERCPU_STATS
+	int			nr_alloc;	/* # of allocations */
+	size_t			max_alloc_size; /* largest allocation size */
+#endif
+
 	struct list_head	list;		/* linked to pcpu_slot lists */
 	int			free_size;	/* free bytes in the chunk */
 	int			contig_hint;	/* max contiguous size hint */
@@ -18,6 +23,11 @@ struct pcpu_chunk {
 	void			*data;		/* chunk data */
 	int			first_free;	/* no free below this */
 	bool			immutable;	/* no [de]population allowed */
+	bool			has_reserved;	/* Indicates if chunk has reserved space
+						   at the beginning. Reserved chunk will
+						   contain reservation for static chunk.
+						   Dynamic chunk will contain reservation
+						   for static and reserved chunks. */
 	int			nr_populated;	/* # of populated pages */
 	unsigned long		populated[];	/* populated bitmap */
 };
@@ -30,4 +40,125 @@ extern int pcpu_nr_slots __read_mostly;
 extern struct pcpu_chunk *pcpu_first_chunk;
 extern struct pcpu_chunk *pcpu_reserved_chunk;
 
+#ifdef CONFIG_PERCPU_STATS
+
+#include <linux/spinlock.h>
+
+struct percpu_stats {
+	u64 nr_alloc;		/* lifetime # of allocations */
+	u64 nr_dealloc;		/* lifetime # of deallocations */
+	u64 nr_cur_alloc;	/* current # of allocations */
+	u64 nr_max_alloc;	/* max # of live allocations */
+	u32 nr_chunks;		/* current # of live chunks */
+	u32 nr_max_chunks;	/* max # of live chunks */
+	size_t min_alloc_size;	/* min allocation size */
+	size_t max_alloc_size;	/* max allocation size */
+};
+
+extern struct percpu_stats pcpu_stats;
+extern struct pcpu_alloc_info pcpu_stats_ai;
+
+/*
+ * For debug purposes. We don't care about the flexible array.
+ */
+static inline void pcpu_stats_save_ai(const struct pcpu_alloc_info *ai)
+{
+	memcpy(&pcpu_stats_ai, ai, sizeof(struct pcpu_alloc_info));
+
+	/* initialize min_alloc_size to unit_size */
+	pcpu_stats.min_alloc_size = pcpu_stats_ai.unit_size;
+}
+
+/*
+ * pcpu_stats_area_alloc - increment area allocation stats
+ * @chunk: the location of the area being allocated
+ * @size: size of area to allocate in bytes
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static inline void pcpu_stats_area_alloc(struct pcpu_chunk *chunk, size_t size)
+{
+	lockdep_assert_held(&pcpu_lock);
+
+	pcpu_stats.nr_alloc++;
+	pcpu_stats.nr_cur_alloc++;
+	pcpu_stats.nr_max_alloc =
+		max(pcpu_stats.nr_max_alloc, pcpu_stats.nr_cur_alloc);
+	pcpu_stats.min_alloc_size =
+		min(pcpu_stats.min_alloc_size, size);
+	pcpu_stats.max_alloc_size =
+		max(pcpu_stats.max_alloc_size, size);
+
+	chunk->nr_alloc++;
+	chunk->max_alloc_size = max(chunk->max_alloc_size, size);
+}
+
+/*
+ * pcpu_stats_area_dealloc - decrement allocation stats
+ * @chunk: the location of the area being deallocated
+ *
+ * CONTEXT:
+ * pcpu_lock.
+ */
+static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
+{
+	lockdep_assert_held(&pcpu_lock);
+
+	pcpu_stats.nr_dealloc++;
+	pcpu_stats.nr_cur_alloc--;
+
+	chunk->nr_alloc--;
+}
+
+/*
+ * pcpu_stats_chunk_alloc - increment chunk stats
+ */
+static inline void pcpu_stats_chunk_alloc(void)
+{
+	spin_lock_irq(&pcpu_lock);
+
+	pcpu_stats.nr_chunks++;
+	pcpu_stats.nr_max_chunks =
+		max(pcpu_stats.nr_max_chunks, pcpu_stats.nr_chunks);
+
+	spin_unlock_irq(&pcpu_lock);
+}
+
+/*
+ * pcpu_stats_chunk_dealloc - decrement chunk stats
+ */
+static inline void pcpu_stats_chunk_dealloc(void)
+{
+	spin_lock_irq(&pcpu_lock);
+
+	pcpu_stats.nr_chunks--;
+
+	spin_unlock_irq(&pcpu_lock);
+}
+
+#else
+
+static inline void pcpu_stats_save_ai(const struct pcpu_alloc_info *ai)
+{
+}
+
+static inline void pcpu_stats_area_alloc(struct pcpu_chunk *chunk, size_t size)
+{
+}
+
+static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
+{
+}
+
+static inline void pcpu_stats_chunk_alloc(void)
+{
+}
+
+static inline void pcpu_stats_chunk_dealloc(void)
+{
+}
+
+#endif /* !CONFIG_PERCPU_STATS */
+
 #endif
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index d66911f..3bbfa0c 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -72,6 +72,8 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
 	pcpu_chunk_populated(chunk, 0, nr_pages);
 	spin_unlock_irq(&pcpu_lock);
 
+	pcpu_stats_chunk_alloc();
+
 	return chunk;
 }
 
@@ -79,6 +81,8 @@ static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
 {
 	const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT;
 
+	pcpu_stats_chunk_dealloc();
+
 	if (chunk && chunk->data)
 		__free_pages(chunk->data, order_base_2(nr_pages));
 	pcpu_free_chunk(chunk);
diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c
new file mode 100644
index 0000000..03524a5
--- /dev/null
+++ b/mm/percpu-stats.c
@@ -0,0 +1,222 @@
+/*
+ * mm/percpu-stats.c
+ *
+ * Copyright (C) 2017		Facebook Inc.
+ * Copyright (C) 2017		Dennis Zhou <dennisz@fb.com>
+ *
+ * This file is released under the GPLv2.
+ *
+ * Prints statistics about the percpu allocator and backing chunks.
+ */
+#include <linux/debugfs.h>
+#include <linux/list.h>
+#include <linux/percpu.h>
+#include <linux/seq_file.h>
+#include <linux/sort.h>
+#include <linux/vmalloc.h>
+
+#include "percpu-internal.h"
+
+#define P(X, Y) \
+	seq_printf(m, "  %-24s: %8lld\n", X, (long long int)Y)
+
+struct percpu_stats pcpu_stats;
+struct pcpu_alloc_info pcpu_stats_ai;
+
+static int cmpint(const void *a, const void *b)
+{
+	return *(int *)a - *(int *)b;
+}
+
+/*
+ * Iterates over all chunks to find the max # of map entries used.
+ */
+static int find_max_map_used(void)
+{
+	struct pcpu_chunk *chunk;
+	int slot, max_map_used;
+
+	max_map_used = 0;
+	for (slot = 0; slot < pcpu_nr_slots; slot++)
+		list_for_each_entry(chunk, &pcpu_slot[slot], list)
+			max_map_used = max(max_map_used, chunk->map_used);
+
+	return max_map_used;
+}
+
+/*
+ * Prints out chunk state. Fragmentation is considered between
+ * the beginning of the chunk to the last allocation.
+ */
+static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk,
+			    void *buffer)
+{
+	int i, s_index, last_alloc, alloc_sign, as_len;
+	int *alloc_sizes, *p;
+	/* statistics */
+	int sum_frag = 0, max_frag = 0;
+	int cur_min_alloc = 0, cur_med_alloc = 0, cur_max_alloc = 0;
+
+	alloc_sizes = buffer;
+	s_index = chunk->has_reserved ? 1 : 0;
+
+	/* find last allocation */
+	last_alloc = -1;
+	for (i = chunk->map_used - 1; i >= s_index; i--) {
+		if (chunk->map[i] & 1) {
+			last_alloc = i;
+			break;
+		}
+	}
+
+	/* if the chunk is not empty - ignoring reserve */
+	if (last_alloc >= s_index) {
+		as_len = last_alloc + 1 - s_index;
+
+		/*
+		 * Iterate through chunk map computing size info.
+		 * The first bit is overloaded to be a used flag.
+		 * negative = free space, positive = allocated
+		 */
+		for (i = 0, p = chunk->map + s_index; i < as_len; i++, p++) {
+			alloc_sign = (*p & 1) ? 1 : -1;
+			alloc_sizes[i] = alloc_sign *
+				((p[1] & ~1) - (p[0] & ~1));
+		}
+
+		sort(alloc_sizes, as_len, sizeof(chunk->map[0]), cmpint, NULL);
+
+		/* Iterate through the unallocated fragments. */
+		for (i = 0, p = alloc_sizes; *p < 0 && i < as_len; i++, p++) {
+			sum_frag -= *p;
+			max_frag = max(max_frag, -1 * (*p));
+		}
+
+		cur_min_alloc = alloc_sizes[i];
+		cur_med_alloc = alloc_sizes[(i + as_len - 1) / 2];
+		cur_max_alloc = alloc_sizes[as_len - 1];
+	}
+
+	P("nr_alloc", chunk->nr_alloc);
+	P("max_alloc_size", chunk->max_alloc_size);
+	P("free_size", chunk->free_size);
+	P("contig_hint", chunk->contig_hint);
+	P("sum_frag", sum_frag);
+	P("max_frag", max_frag);
+	P("cur_min_alloc", cur_min_alloc);
+	P("cur_med_alloc", cur_med_alloc);
+	P("cur_max_alloc", cur_max_alloc);
+	seq_putc(m, '\n');
+}
+
+static int percpu_stats_show(struct seq_file *m, void *v)
+{
+	struct pcpu_chunk *chunk;
+	int slot, max_map_used;
+	void *buffer;
+
+alloc_buffer:
+	spin_lock_irq(&pcpu_lock);
+	max_map_used = find_max_map_used();
+	spin_unlock_irq(&pcpu_lock);
+
+	buffer = vmalloc(max_map_used * sizeof(pcpu_first_chunk->map[0]));
+	if (!buffer)
+		return -ENOMEM;
+
+	spin_lock_irq(&pcpu_lock);
+
+	/* if the buffer allocated earlier is too small */
+	if (max_map_used < find_max_map_used()) {
+		spin_unlock_irq(&pcpu_lock);
+		vfree(buffer);
+		goto alloc_buffer;
+	}
+
+#define PL(X) \
+	seq_printf(m, "  %-24s: %8lld\n", #X, (long long int)pcpu_stats_ai.X)
+
+	seq_printf(m,
+			"Percpu Memory Statistics\n"
+			"Allocation Info:\n"
+			"----------------------------------------\n");
+	PL(unit_size);
+	PL(static_size);
+	PL(reserved_size);
+	PL(dyn_size);
+	PL(atom_size);
+	PL(alloc_size);
+	seq_putc(m, '\n');
+
+#undef PL
+
+#define PU(X) \
+	seq_printf(m, "  %-18s: %14llu\n", #X, (unsigned long long)pcpu_stats.X)
+
+	seq_printf(m,
+			"Global Stats:\n"
+			"----------------------------------------\n");
+	PU(nr_alloc);
+	PU(nr_dealloc);
+	PU(nr_cur_alloc);
+	PU(nr_max_alloc);
+	PU(nr_chunks);
+	PU(nr_max_chunks);
+	PU(min_alloc_size);
+	PU(max_alloc_size);
+	seq_putc(m, '\n');
+
+#undef PU
+
+	seq_printf(m,
+			"Per Chunk Stats:\n"
+			"----------------------------------------\n");
+
+	if (pcpu_reserved_chunk) {
+		seq_puts(m, "Chunk: <- Reserved Chunk\n");
+		chunk_map_stats(m, pcpu_reserved_chunk, buffer);
+	}
+
+	for (slot = 0; slot < pcpu_nr_slots; slot++) {
+		list_for_each_entry(chunk, &pcpu_slot[slot], list) {
+			if (chunk == pcpu_first_chunk) {
+				seq_puts(m, "Chunk: <- First Chunk\n");
+				chunk_map_stats(m, chunk, buffer);
+
+
+			} else {
+				seq_puts(m, "Chunk:\n");
+				chunk_map_stats(m, chunk, buffer);
+			}
+
+		}
+	}
+
+	spin_unlock_irq(&pcpu_lock);
+
+	vfree(buffer);
+
+	return 0;
+}
+
+static int percpu_stats_open(struct inode *inode, struct file *filp)
+{
+	return single_open(filp, percpu_stats_show, NULL);
+}
+
+static const struct file_operations percpu_stats_fops = {
+	.open		= percpu_stats_open,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+	.release	= single_release,
+};
+
+static int __init init_percpu_stats_debugfs(void)
+{
+	debugfs_create_file("percpu_stats", 0444, NULL, NULL,
+			&percpu_stats_fops);
+
+	return 0;
+}
+
+late_initcall(init_percpu_stats_debugfs);
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 9ac6394..5915a22 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -343,11 +343,16 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
 
 	chunk->data = vms;
 	chunk->base_addr = vms[0]->addr - pcpu_group_offsets[0];
+
+	pcpu_stats_chunk_alloc();
+
 	return chunk;
 }
 
 static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
 {
+	pcpu_stats_chunk_dealloc();
+
 	if (chunk && chunk->data)
 		pcpu_free_vm_areas(chunk->data, pcpu_nr_groups);
 	pcpu_free_chunk(chunk);
diff --git a/mm/percpu.c b/mm/percpu.c
index 5cf7d73..25b4ba5 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -657,6 +657,7 @@ static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme,
 	int *p;
 
 	lockdep_assert_held(&pcpu_lock);
+	pcpu_stats_area_dealloc(chunk);
 
 	freeme |= 1;	/* we are searching for <given offset, in use> pair */
 
@@ -721,6 +722,7 @@ static struct pcpu_chunk *pcpu_alloc_chunk(void)
 	chunk->map[0] = 0;
 	chunk->map[1] = pcpu_unit_size | 1;
 	chunk->map_used = 1;
+	chunk->has_reserved = false;
 
 	INIT_LIST_HEAD(&chunk->list);
 	INIT_LIST_HEAD(&chunk->map_extend_list);
@@ -970,6 +972,7 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 	goto restart;
 
 area_found:
+	pcpu_stats_area_alloc(chunk, size);
 	spin_unlock_irqrestore(&pcpu_lock, flags);
 
 	/* populate if not all pages are already there */
@@ -1642,6 +1645,8 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 	pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
 		BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long);
 
+	pcpu_stats_save_ai(ai);
+
	/*
	 * Allocate chunk slots.  The additional last slot is for
	 * empty chunks.
@@ -1685,6 +1690,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 	if (schunk->free_size)
 		schunk->map[++schunk->map_used] = ai->static_size + schunk->free_size;
 	schunk->map[schunk->map_used] |= 1;
+	schunk->has_reserved = true;
 
 	/* init dynamic chunk if necessary */
 	if (dyn_size) {
@@ -1703,6 +1709,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 		dchunk->map[1] = pcpu_reserved_chunk_limit;
 		dchunk->map[2] = (pcpu_reserved_chunk_limit + dchunk->free_size) | 1;
 		dchunk->map_used = 2;
+		dchunk->has_reserved = true;
 	}
 
 	/* link the first chunk in */
@@ -1711,6 +1718,8 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 		pcpu_count_occupied_pages(pcpu_first_chunk, 1);
 	pcpu_chunk_relocate(pcpu_first_chunk, -1);
 
+	pcpu_stats_chunk_alloc();
+
 	/* we're done */
 	pcpu_base_addr = base_addr;
 	return 0;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] percpu: add tracepoint support for percpu memory
  2017-06-19 23:28 ` Dennis Zhou
@ 2017-06-19 23:28   ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-19 23:28 UTC (permalink / raw)
  To: Tejun Heo, Christoph Lameter
  Cc: linux-mm, linux-kernel, kernel-team, Dennis Zhou

Add support for tracepoints to the following events: chunk allocation,
chunk free, area allocation, area free, and area allocation failure.
This should let us replay percpu memory requests and evaluate
corresponding decisions.

Signed-off-by: Dennis Zhou <dennisz@fb.com>
---
 include/trace/events/percpu.h | 125 ++++++++++++++++++++++++++++++++++++++++++
 mm/percpu-km.c                |   2 +
 mm/percpu-vm.c                |   2 +
 mm/percpu.c                   |  12 ++++
 4 files changed, 141 insertions(+)
 create mode 100644 include/trace/events/percpu.h

diff --git a/include/trace/events/percpu.h b/include/trace/events/percpu.h
new file mode 100644
index 0000000..ad34b1b
--- /dev/null
+++ b/include/trace/events/percpu.h
@@ -0,0 +1,125 @@
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM percpu
+
+#if !defined(_TRACE_PERCPU_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_PERCPU_H
+
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(percpu_alloc_percpu,
+
+	TP_PROTO(bool reserved, bool is_atomic, size_t size,
+		 size_t align, void *base_addr, int off, void __percpu *ptr),
+
+	TP_ARGS(reserved, is_atomic, size, align, base_addr, off, ptr),
+
+	TP_STRUCT__entry(
+		__field(	bool,			reserved	)
+		__field(	bool,			is_atomic	)
+		__field(	size_t,			size		)
+		__field(	size_t,			align		)
+		__field(	void *,			base_addr	)
+		__field(	int,			off		)
+		__field(	void __percpu *,	ptr		)
+	),
+
+	TP_fast_assign(
+		__entry->reserved	= reserved;
+		__entry->is_atomic	= is_atomic;
+		__entry->size		= size;
+		__entry->align		= align;
+		__entry->base_addr	= base_addr;
+		__entry->off		= off;
+		__entry->ptr		= ptr;
+	),
+
+	TP_printk("reserved=%d is_atomic=%d size=%zu align=%zu base_addr=%p off=%d ptr=%p",
+		  __entry->reserved, __entry->is_atomic,
+		  __entry->size, __entry->align,
+		  __entry->base_addr, __entry->off, __entry->ptr)
+);
+
+TRACE_EVENT(percpu_free_percpu,
+
+	TP_PROTO(void *base_addr, int off, void __percpu *ptr),
+
+	TP_ARGS(base_addr, off, ptr),
+
+	TP_STRUCT__entry(
+		__field(	void *,			base_addr	)
+		__field(	int,			off		)
+		__field(	void __percpu *,	ptr		)
+	),
+
+	TP_fast_assign(
+		__entry->base_addr	= base_addr;
+		__entry->off		= off;
+		__entry->ptr		= ptr;
+	),
+
+	TP_printk("base_addr=%p off=%d ptr=%p",
+		__entry->base_addr, __entry->off, __entry->ptr)
+);
+
+TRACE_EVENT(percpu_alloc_percpu_fail,
+
+	TP_PROTO(bool reserved, bool is_atomic, size_t size, size_t align),
+
+	TP_ARGS(reserved, is_atomic, size, align),
+
+	TP_STRUCT__entry(
+		__field(	bool,	reserved	)
+		__field(	bool,	is_atomic	)
+		__field(	size_t,	size		)
+		__field(	size_t, align		)
+	),
+
+	TP_fast_assign(
+		__entry->reserved	= reserved;
+		__entry->is_atomic	= is_atomic;
+		__entry->size		= size;
+		__entry->align		= align;
+	),
+
+	TP_printk("reserved=%d is_atomic=%d size=%zu align=%zu",
+		  __entry->reserved, __entry->is_atomic,
+		  __entry->size, __entry->align)
+);
+
+TRACE_EVENT(percpu_create_chunk,
+
+	TP_PROTO(void *base_addr),
+
+	TP_ARGS(base_addr),
+
+	TP_STRUCT__entry(
+		__field(	void *, base_addr	)
+	),
+
+	TP_fast_assign(
+		__entry->base_addr	= base_addr;
+	),
+
+	TP_printk("base_addr=%p", __entry->base_addr)
+);
+
+TRACE_EVENT(percpu_destroy_chunk,
+
+	TP_PROTO(void *base_addr),
+
+	TP_ARGS(base_addr),
+
+	TP_STRUCT__entry(
+		__field(	void *,	base_addr	)
+	),
+
+	TP_fast_assign(
+		__entry->base_addr	= base_addr;
+	),
+
+	TP_printk("base_addr=%p", __entry->base_addr)
+);
+
+#endif /* _TRACE_PERCPU_H */
+
+#include <trace/define_trace.h>
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index 3bbfa0c..2b79e43 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -73,6 +73,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
 	spin_unlock_irq(&pcpu_lock);
 
 	pcpu_stats_chunk_alloc();
+	trace_percpu_create_chunk(chunk->base_addr);
 
 	return chunk;
 }
@@ -82,6 +83,7 @@ static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
 	const int nr_pages = pcpu_group_sizes[0] >> PAGE_SHIFT;
 
 	pcpu_stats_chunk_dealloc();
+	trace_percpu_destroy_chunk(chunk->base_addr);
 
 	if (chunk && chunk->data)
 		__free_pages(chunk->data, order_base_2(nr_pages));
diff --git a/mm/percpu-vm.c b/mm/percpu-vm.c
index 5915a22..7ad9d94 100644
--- a/mm/percpu-vm.c
+++ b/mm/percpu-vm.c
@@ -345,6 +345,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
 	chunk->base_addr = vms[0]->addr - pcpu_group_offsets[0];
 
 	pcpu_stats_chunk_alloc();
+	trace_percpu_create_chunk(chunk->base_addr);
 
 	return chunk;
 }
@@ -352,6 +353,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
 static void pcpu_destroy_chunk(struct pcpu_chunk *chunk)
 {
 	pcpu_stats_chunk_dealloc();
+	trace_percpu_destroy_chunk(chunk->base_addr);
 
 	if (chunk && chunk->data)
 		pcpu_free_vm_areas(chunk->data, pcpu_nr_groups);
diff --git a/mm/percpu.c b/mm/percpu.c
index 25b4ba5..7a1707a 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -76,6 +76,9 @@
 #include <asm/tlbflush.h>
 #include <asm/io.h>
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/percpu.h>
+
 #include "percpu-internal.h"
 
 #define PCPU_SLOT_BASE_SHIFT		5	/* 1-31 shares the same slot */
@@ -1015,11 +1018,17 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
 
 	ptr = __addr_to_pcpu_ptr(chunk->base_addr + off);
 	kmemleak_alloc_percpu(ptr, size, gfp);
+
+	trace_percpu_alloc_percpu(reserved, is_atomic, size, align,
+			chunk->base_addr, off, ptr);
+
 	return ptr;
 
 fail_unlock:
 	spin_unlock_irqrestore(&pcpu_lock, flags);
 fail:
+	trace_percpu_alloc_percpu_fail(reserved, is_atomic, size, align);
+
 	if (!is_atomic && warn_limit) {
 		pr_warn("allocation failed, size=%zu align=%zu atomic=%d, %s\n",
 			size, align, is_atomic, err);
@@ -1269,6 +1278,8 @@ void free_percpu(void __percpu *ptr)
 			}
 	}
 
+	trace_percpu_free_percpu(chunk->base_addr, off, ptr);
+
 	spin_unlock_irqrestore(&pcpu_lock, flags);
 }
 EXPORT_SYMBOL_GPL(free_percpu);
@@ -1719,6 +1730,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
 	pcpu_chunk_relocate(pcpu_first_chunk, -1);
 
 	pcpu_stats_chunk_alloc();
+	trace_percpu_create_chunk(base_addr);
 
 	/* we're done */
 	pcpu_base_addr = base_addr;
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator
  2017-06-19 23:28 ` Dennis Zhou
@ 2017-06-20 17:45   ` Tejun Heo
  -1 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2017-06-20 17:45 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Christoph Lameter, linux-mm, linux-kernel, kernel-team

On Mon, Jun 19, 2017 at 07:28:28PM -0400, Dennis Zhou wrote:
> There is limited visibility into the percpu memory allocator making it hard to
> understand usage patterns. Without these concrete numbers, we are left to
> conjecture about the correctness of percpu memory patterns and usage.
> Additionally, there is no mechanism to review the correctness/efficiency of the
> current implementation.
> 
> This patchset addresses the following:
> - Adds basic statistics to reason about the number of allocations over the
>   lifetime, allocation sizes, and fragmentation.
> - Adds tracepoints to enable better debug capabilities as well as the ability
>   to review allocation requests and corresponding decisions.
> 
> This patchset contains the following four patches:
> 0001-percpu-add-missing-lockdep_assert_held-to-func-pcpu_.patch
> 0002-percpu-migrate-percpu-data-structures-to-internal-he.patch
> 0003-percpu-expose-statistics-about-percpu-memory-via-deb.patch
> 0004-percpu-add-tracepoint-support-for-percpu-memory.patch

Applied to percpu/for-4.13.  I had to update 0002 because of the
recent __ro_after_init changes.  Can you please see whether I made any
mistakes while updating it?

 git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu.git for-4.13

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator
  2017-06-20 17:45   ` Tejun Heo
  (?)
@ 2017-06-20 19:12   ` Dennis Zhou
  2017-06-20 19:32       ` Tejun Heo
  -1 siblings, 1 reply; 25+ messages in thread
From: Dennis Zhou @ 2017-06-20 19:12 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Christoph Lameter, linux-mm, linux-kernel, Kernel Team

On 6/20/17, 1:45 PM, "Tejun Heo" <htejun@gmail.com on behalf of tj@kernel.org> wrote:
> Applied to percpu/for-4.13.  I had to update 0002 because of the
> recent __ro_after_init changes.  Can you please see whether I made any
> mistakes while updating it?

There is a tagging mismatch in 0002. Can you please change or remove the
__read_mostly annotation in mm/percpu-internal.h?

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator
  2017-06-20 19:12   ` Dennis Zhou
@ 2017-06-20 19:32       ` Tejun Heo
  0 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2017-06-20 19:32 UTC (permalink / raw)
  To: Dennis Zhou; +Cc: Christoph Lameter, linux-mm, linux-kernel, Kernel Team

On Tue, Jun 20, 2017 at 07:12:49PM +0000, Dennis Zhou wrote:
> On 6/20/17, 1:45 PM, "Tejun Heo" <htejun@gmail.com on behalf of tj@kernel.org> wrote:
> > Applied to percpu/for-4.13.  I had to update 0002 because of the
> > recent __ro_after_init changes.  Can you please see whether I made any
> > mistakes while updating it?
> 
> There is a tagging mismatch in 0002. Can you please change or remove the
> __read_mostly annotation in mm/percpu-internal.h?

Fixed.  Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] percpu: add tracepoint support for percpu memory
  2017-06-19 23:28   ` Dennis Zhou
@ 2017-06-21 16:18     ` Levin, Alexander (Sasha Levin)
  -1 siblings, 0 replies; 25+ messages in thread
From: Levin, Alexander (Sasha Levin) @ 2017-06-21 16:18 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Tejun Heo, Christoph Lameter, linux-mm, linux-kernel, kernel-team

On Mon, Jun 19, 2017 at 07:28:32PM -0400, Dennis Zhou wrote:
>Add support for tracepoints to the following events: chunk allocation,
>chunk free, area allocation, area free, and area allocation failure.
>This should let us replay percpu memory requests and evaluate
>corresponding decisions.

This patch breaks boot for me:

[    0.000000] DEBUG_LOCKS_WARN_ON(unlikely(early_boot_irqs_disabled))
[    0.000000] ------------[ cut here ]------------
[    0.000000] WARNING: CPU: 0 PID: 0 at kernel/locking/lockdep.c:2741 trace_hardirqs_on_caller.cold.58+0x47/0x4e
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.12.0-rc6-next-20170621+ #155
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[    0.000000] task: ffffffffb7831180 task.stack: ffffffffb7800000
[    0.000000] RIP: 0010:trace_hardirqs_on_caller.cold.58+0x47/0x4e
[    0.000000] RSP: 0000:ffffffffb78079d0 EFLAGS: 00010086 ORIG_RAX: 0000000000000000
[    0.000000] RAX: 0000000000000037 RBX: 0000000000000003 RCX: 0000000000000000
[    0.000000] RDX: 0000000000000000 RSI: 0000000000000001 RDI: 1ffffffff6f00ef6
[    0.000000] RBP: ffffffffb78079e0 R08: 0000000000000000 R09: ffffffffb7831180
[    0.000000] R10: 0000000000000000 R11: ffffffffb24e96ce R12: ffffffffb6b39b87
[    0.000000] R13: 00000000001f0001 R14: ffffffffb85603a0 R15: 0000000000002000
[    0.000000] FS:  0000000000000000(0000) GS:ffffffffb81be000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: ffff88007fbff000 CR3: 000000006b828000 CR4: 00000000000406b0
[    0.000000] Call Trace:
[    0.000000]  trace_hardirqs_on+0xd/0x10
[    0.000000]  _raw_spin_unlock_irq+0x27/0x50
[    0.000000]  pcpu_setup_first_chunk+0x19c2/0x1c27
[    0.000000]  ? pcpu_free_alloc_info+0x4b/0x4b
[    0.000000]  ? vprintk_emit+0x403/0x480
[    0.000000]  ? __down_trylock_console_sem+0xb7/0xc0
[    0.000000]  ? __down_trylock_console_sem+0x6e/0xc0
[    0.000000]  ? vprintk_emit+0x362/0x480
[    0.000000]  ? vprintk_default+0x28/0x30
[    0.000000]  ? printk+0xb2/0xdd
[    0.000000]  ? snapshot_ioctl.cold.1+0x19/0x19
[    0.000000]  ? __alloc_bootmem_node_nopanic+0x88/0x96
[    0.000000]  pcpu_embed_first_chunk+0x7b0/0x8ef
[    0.000000]  ? pcpup_populate_pte+0xb/0xb
[    0.000000]  setup_per_cpu_areas+0x105/0x6d9
[    0.000000]  ? find_last_bit+0xa6/0xd0
[    0.000000]  start_kernel+0x25e/0x78f
[    0.000000]  ? thread_stack_cache_init+0xb/0xb
[    0.000000]  ? early_idt_handler_common+0x3b/0x52
[    0.000000]  ? early_idt_handler_array+0x120/0x120
[    0.000000]  ? early_idt_handler_array+0x120/0x120
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x143/0x166
[    0.000000]  secondary_startup_64+0x9f/0x9f
[    0.000000] Code: c6 a0 49 c6 b6 48 c7 c7 e0 49 c6 b6 e8 43 34 00 00 0f ff e9 ed 71 ce ff 48 c7 c6 c0 79 c6 b6 48 c7 c7 e0 49 c6 b6 e8 29 34 00 00 <0f> ff e9 d3 71 ce ff 48 c7 c6 20 7c c6 b6 48 c7 c7 e0 49 c6 b6 
[    0.000000] random: print_oops_end_marker+0x30/0x50 get_random_bytes called with crng_init=0
[    0.000000] ---[ end trace f68728a0d3053b52 ]---
[    0.000000] BUG: unable to handle kernel paging request at 00000000ffffffff
[    0.000000] IP: native_write_msr+0x6/0x30
[    0.000000] PGD 0 
[    0.000000] P4D 0 
[    0.000000] 
[    0.000000] Oops: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC KASAN
[    0.000000] Modules linked in:
[    0.000000] CPU: 0 PID: 0 Comm: swapper Tainted: G        W       4.12.0-rc6-next-20170621+ #155
[    0.000000] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-1ubuntu1 04/01/2014
[    0.000000] task: ffffffffb7831180 task.stack: ffffffffb7800000
[    0.000000] RIP: 0010:native_write_msr+0x6/0x30
[    0.000000] RSP: 0000:ffffffffb7807dc8 EFLAGS: 00010202
[    0.000000] RAX: 000000003ea15d43 RBX: ffff88003ea15d40 RCX: 000000004b564d02
[    0.000000] RDX: 0000000000000000 RSI: 000000003ea15d43 RDI: 000000004b564d02
[    0.000000] RBP: ffffffffb7807df0 R08: 0000000000000040 R09: 0000000000000000
[    0.000000] R10: 0000000000007100 R11: 000000007ffd6f00 R12: 0000000000000000
[    0.000000] R13: 1ffffffff6f00fc3 R14: ffffffffb7807eb8 R15: dffffc0000000000
[    0.000000] FS:  0000000000000000(0000) GS:ffff88003ea00000(0000) knlGS:0000000000000000
[    0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.000000] CR2: 00000000ffffffff CR3: 000000006b828000 CR4: 00000000000406b0
[    0.000000] Call Trace:
[    0.000000]  ? kvm_guest_cpu_init+0x155/0x220
[    0.000000]  kvm_smp_prepare_boot_cpu+0x9/0x10
[    0.000000]  start_kernel+0x28c/0x78f
[    0.000000]  ? thread_stack_cache_init+0xb/0xb
[    0.000000]  ? early_idt_handler_common+0x3b/0x52
[    0.000000]  ? early_idt_handler_array+0x120/0x120
[    0.000000]  ? early_idt_handler_array+0x120/0x120
[    0.000000]  x86_64_start_reservations+0x24/0x26
[    0.000000]  x86_64_start_kernel+0x143/0x166
[    0.000000]  secondary_startup_64+0x9f/0x9f
[    0.000000] Code: c3 0f 21 c8 5d c3 0f 21 d0 5d c3 0f 21 d8 5d c3 0f 21 f0 5d c3 0f 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 89 f9 89 f0 0f 30 <0f> 1f 44 00 00 c3 48 89 d6 55 89 c2 48 c1 e6 20 48 89 e5 48 09 
[    0.000000] RIP: native_write_msr+0x6/0x30 RSP: ffffffffb7807dc8
[    0.000000] CR2: 00000000ffffffff
[    0.000000] ---[ end trace f68728a0d3053b53 ]---
[    0.000000] Kernel panic - not syncing: Fatal exception
[    0.000000] ---[ end Kernel panic - not syncing: Fatal exception

-- 

Thanks,
Sasha

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/1] percpu: fix early calls for spinlock in pcpu_stats
  2017-06-21 16:18     ` Levin, Alexander (Sasha Levin)
@ 2017-06-21 17:52       ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-06-21 17:52 UTC (permalink / raw)
  To: Levin, Alexander (Sasha Levin), Tejun Heo
  Cc: Christoph Lameter, linux-mm, linux-kernel, kernel-team

From 2c06e795162cb306c9707ec51d3e1deadb37f573 Mon Sep 17 00:00:00 2001
From: Dennis Zhou <dennisz@fb.com>
Date: Wed, 21 Jun 2017 10:17:09 -0700

Commit 30a5b5367ef9 ("percpu: expose statistics about percpu memory via
debugfs") introduces percpu memory statistics. pcpu_stats_chunk_alloc
takes the spinlock and disables/enables irqs on creation of a chunk. Irqs
are not yet enabled when the first chunk is initialized, so kernels fail
to boot with kernel debugging enabled. Fix this by changing the _irq
variants to _irqsave and _irqrestore.

Fixes: 30a5b5367ef9 ("percpu: expose statistics about percpu memory via debugfs")
Signed-off-by: Dennis Zhou <dennisz@fb.com>
Reported-by: Alexander Levin <alexander.levin@verizon.com>
---

Hi Sasha,

The root cause is in 0003 of that series, where I prematurely enabled
irqs; the problem is addressed here. With this patch I am able to boot
with debug options enabled.

Thanks,
Dennis

 mm/percpu-internal.h | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index d030fce..cd2442e 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -116,13 +116,14 @@ static inline void pcpu_stats_area_dealloc(struct pcpu_chunk *chunk)
  */
 static inline void pcpu_stats_chunk_alloc(void)
 {
-	spin_lock_irq(&pcpu_lock);
+	unsigned long flags;
+	spin_lock_irqsave(&pcpu_lock, flags);
 
 	pcpu_stats.nr_chunks++;
 	pcpu_stats.nr_max_chunks =
 		max(pcpu_stats.nr_max_chunks, pcpu_stats.nr_chunks);
 
-	spin_unlock_irq(&pcpu_lock);
+	spin_unlock_irqrestore(&pcpu_lock, flags);
 }
 
 /*
@@ -130,11 +131,12 @@ static inline void pcpu_stats_chunk_alloc(void)
  */
 static inline void pcpu_stats_chunk_dealloc(void)
 {
-	spin_lock_irq(&pcpu_lock);
+	unsigned long flags;
+	spin_lock_irqsave(&pcpu_lock, flags);
 
 	pcpu_stats.nr_chunks--;
 
-	spin_unlock_irq(&pcpu_lock);
+	spin_unlock_irqrestore(&pcpu_lock, flags);
 }
 
 #else
-- 
2.9.3

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/1] percpu: fix early calls for spinlock in pcpu_stats
  2017-06-21 17:52       ` Dennis Zhou
@ 2017-06-21 17:54         ` Tejun Heo
  -1 siblings, 0 replies; 25+ messages in thread
From: Tejun Heo @ 2017-06-21 17:54 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Levin, Alexander (Sasha Levin),
	Christoph Lameter, linux-mm, linux-kernel, kernel-team

On Wed, Jun 21, 2017 at 01:52:46PM -0400, Dennis Zhou wrote:
> From 2c06e795162cb306c9707ec51d3e1deadb37f573 Mon Sep 17 00:00:00 2001
> From: Dennis Zhou <dennisz@fb.com>
> Date: Wed, 21 Jun 2017 10:17:09 -0700
> 
> Commit 30a5b5367ef9 ("percpu: expose statistics about percpu memory via
> debugfs") introduces percpu memory statistics. pcpu_stats_chunk_alloc
> takes the spin lock and disables/enables irqs on creation of a chunk. Irqs
> are not enabled when the first chunk is initialized and thus kernels are
> failing to boot with kernel debugging enabled. Fixed by changing _irq to
> _irqsave and _irqrestore.
> 
> Fixes: 30a5b5367ef9 ("percpu: expose statistics about percpu memory via debugfs")
> Signed-off-by: Dennis Zhou <dennisz@fb.com>
> Reported-by: Alexander Levin <alexander.levin@verizon.com>

Applied to percpu/for-4.13.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs
  2017-06-19 23:28   ` Dennis Zhou
@ 2017-07-07  8:16     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 25+ messages in thread
From: Geert Uytterhoeven @ 2017-07-07  8:16 UTC (permalink / raw)
  To: Dennis Zhou
  Cc: Tejun Heo, Christoph Lameter, Linux MM, linux-kernel, kernel-team

Hi Dennis,

On Tue, Jun 20, 2017 at 1:28 AM, Dennis Zhou <dennisz@fb.com> wrote:
> There is limited visibility into the use of percpu memory leaving us
> unable to reason about correctness of parameters and overall use of
> percpu memory. These counters and statistics aim to help understand
> basic statistics about percpu memory such as number of allocations over
> the lifetime, allocation sizes, and fragmentation.
>
> New Config: PERCPU_STATS
>
> Signed-off-by: Dennis Zhou <dennisz@fb.com>
> ---
>  mm/Kconfig           |   8 ++
>  mm/Makefile          |   1 +
>  mm/percpu-internal.h | 131 ++++++++++++++++++++++++++++++
>  mm/percpu-km.c       |   4 +
>  mm/percpu-stats.c    | 222 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/percpu-vm.c       |   5 ++
>  mm/percpu.c          |   9 +++
>  7 files changed, 380 insertions(+)
>  create mode 100644 mm/percpu-stats.c
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index beb7a45..8fae426 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -706,3 +706,11 @@ config ARCH_USES_HIGH_VMA_FLAGS
>         bool
>  config ARCH_HAS_PKEYS
>         bool
> +
> +config PERCPU_STATS
> +       bool "Collect percpu memory statistics"
> +       default n
> +       help
> +         This feature collects and exposes statistics via debugfs. The
> +         information includes global and per chunk statistics, which can
> +         be used to help understand percpu memory usage.

Just wondering: does this option make sense to enable on !SMP?

If not, you may want to make it depend on SMP.
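
For reference, the suggested dependency would be a one-line addition to the
new entry (a sketch against the quoted hunk, not a tested patch):

```kconfig
config PERCPU_STATS
	bool "Collect percpu memory statistics"
	depends on SMP
	default n
	help
	  This feature collects and exposes statistics via debugfs. The
	  information includes global and per chunk statistics, which can
	  be used to help understand percpu memory usage.
```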

Thanks!

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs
  2017-07-07  8:16     ` Geert Uytterhoeven
@ 2017-07-08 20:33       ` Dennis Zhou
  -1 siblings, 0 replies; 25+ messages in thread
From: Dennis Zhou @ 2017-07-08 20:33 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: Tejun Heo, Christoph Lameter, Linux MM, linux-kernel, kernel-team

On Fri, Jul 07, 2017 at 10:16:01AM +0200, Geert Uytterhoeven wrote:
> Hi Dennis,
> 
> On Tue, Jun 20, 2017 at 1:28 AM, Dennis Zhou <dennisz@fb.com> wrote:
> 
> Just wondering: does this option make sense to enable on !SMP?
> 
> If not, you may want to make it depend on SMP.
> 
> Thanks!
> 
> Gr{oetje,eeting}s,
> 
>                         Geert

Hi Geert,

The percpu allocator is still used on UP configs, so it would still
provide useful data.

Thanks,
Dennis

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-07-08 20:33 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-06-19 23:28 [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator Dennis Zhou
2017-06-19 23:28 ` Dennis Zhou
2017-06-19 23:28 ` [PATCH 1/4] percpu: add missing lockdep_assert_held to func pcpu_free_area Dennis Zhou
2017-06-19 23:28   ` Dennis Zhou
2017-06-19 23:28 ` [PATCH 2/4] percpu: migrate percpu data structures to internal header Dennis Zhou
2017-06-19 23:28   ` Dennis Zhou
2017-06-19 23:28 ` [PATCH 3/4] percpu: expose statistics about percpu memory via debugfs Dennis Zhou
2017-06-19 23:28   ` Dennis Zhou
2017-07-07  8:16   ` Geert Uytterhoeven
2017-07-07  8:16     ` Geert Uytterhoeven
2017-07-08 20:33     ` Dennis Zhou
2017-07-08 20:33       ` Dennis Zhou
2017-06-19 23:28 ` [PATCH 4/4] percpu: add tracepoint support for percpu memory Dennis Zhou
2017-06-19 23:28   ` Dennis Zhou
2017-06-21 16:18   ` Levin, Alexander (Sasha Levin)
2017-06-21 16:18     ` Levin, Alexander (Sasha Levin)
2017-06-21 17:52     ` [PATCH 1/1] percpu: fix early calls for spinlock in pcpu_stats Dennis Zhou
2017-06-21 17:52       ` Dennis Zhou
2017-06-21 17:54       ` Tejun Heo
2017-06-21 17:54         ` Tejun Heo
2017-06-20 17:45 ` [PATCH 0/4] percpu: add basic stats and tracepoints to percpu allocator Tejun Heo
2017-06-20 17:45   ` Tejun Heo
2017-06-20 19:12   ` Dennis Zhou
2017-06-20 19:32     ` Tejun Heo
2017-06-20 19:32       ` Tejun Heo
