From: Andrew Morton <akpm@linux-foundation.org>
To: mm-commits@vger.kernel.org,vbabka@suse.cz,mhocko@suse.com,glider@google.com,elver@google.com,andreyknvl@gmail.com,osalvador@suse.de,akpm@linux-foundation.org
Subject: + lib-stackdepot-move-stack_record-struct-definition-into-the-header.patch added to mm-unstable branch
Date: Mon, 12 Feb 2024 15:28:36 -0800 [thread overview]
Message-ID: <20240212232836.DA5ABC433F1@smtp.kernel.org> (raw)
The patch titled
Subject: lib/stackdepot: move stack_record struct definition into the header
has been added to the -mm mm-unstable branch. Its filename is
lib-stackdepot-move-stack_record-struct-definition-into-the-header.patch
This patch will shortly appear at
https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/lib-stackdepot-move-stack_record-struct-definition-into-the-header.patch
This patch will later appear in the mm-unstable branch at
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Before you just go and hit "reply", please:
a) Consider who else should be cc'ed
b) Prefer to cc a suitable mailing list as well
c) Ideally: find the original patch on the mailing list and do a
reply-to-all to that, adding suitable additional cc's
*** Remember to use Documentation/process/submit-checklist.rst when testing your code ***
The -mm tree is included into linux-next via the mm-everything
branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
and is updated there every 2-3 working days
------------------------------------------------------
From: Oscar Salvador <osalvador@suse.de>
Subject: lib/stackdepot: move stack_record struct definition into the header
Date: Mon, 12 Feb 2024 23:30:25 +0100
Patch eries "page_owner: print stacks and their outstanding allocations",
v8.
page_owner is a great debug functionality tool that lets us know about all
pages that have been allocated/freed and their specific stacktrace. This
comes in very handy when debugging memory leaks, since with some scripting
we can see the outstanding allocations, which might point to a memory
leak.
In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship, becoming most of the time a
daunting and slow process when we have tons of allocation/free operations.
This patchset aims to ease that by adding a new functionality into
page_owner. This functionality creates a new directory called
'page_owner_stacks' under 'sys/kernel//debug' with a read-only file called
'show_stacks', which prints out all the stacks followed by their
outstanding number of allocations (being that the times the stacktrace has
allocated but not freed yet). This gives us a clear and a quick overview
of stacks <-> allocated/free.
We take advantage of the new refcount_f field that stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner (free
operation) call.
Unfortunately, we cannot use the new stackdepot api
STACK_DEPOT_FLAG_{GET,PUT} because it does not fulfill page_owner needs,
meaning we would have to special case things, at which point makes more
sense for page_owner to do its own {dec,inc}rementing of the stacks. E.g:
Using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0, such stack gets
evicted, so page_owner would lose information.
This patch also creates a new file called 'set_threshold' within
'page_owner_stacks' directory, and by writing a value to it, the stacks
which refcount is below such value will be filtered out.
A PoC can be found below:
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt
# head -40 page_owner_full_stacks.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
page_cache_ra_unbounded+0x96/0x180
filemap_get_pages+0xfd/0x590
filemap_read+0xcc/0x330
blkdev_read_iter+0xb8/0x150
vfs_read+0x285/0x320
ksys_read+0xa5/0xe0
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 521
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
__filemap_get_folio+0x14a/0x490
ext4_write_begin+0xbd/0x4b0 [ext4]
generic_perform_write+0xc1/0x1e0
ext4_buffered_write_iter+0x68/0xe0 [ext4]
ext4_file_write_iter+0x70/0x740 [ext4]
vfs_write+0x33d/0x420
ksys_write+0xa5/0xe0
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 4609
...
...
# echo 5000 > /sys/kernel/debug/page_owner_stacks/set_threshold
# cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt
# head -40 page_owner_full_stacks_5000.txt
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
alloc_pages_mpol+0x91/0x1f0
folio_alloc+0x14/0x50
filemap_alloc_folio+0xb2/0x100
__filemap_get_folio+0x14a/0x490
ext4_write_begin+0xbd/0x4b0 [ext4]
generic_perform_write+0xc1/0x1e0
ext4_buffered_write_iter+0x68/0xe0 [ext4]
ext4_file_write_iter+0x70/0x740 [ext4]
vfs_write+0x33d/0x420
ksys_pwrite64+0x75/0x90
do_syscall_64+0x80/0x160
entry_SYSCALL_64_after_hwframe+0x6e/0x76
stack_count: 6781
prep_new_page+0xa9/0x120
get_page_from_freelist+0x801/0x2210
__alloc_pages+0x18b/0x350
pcpu_populate_chunk+0xec/0x350
pcpu_balance_workfn+0x2d1/0x4a0
process_scheduled_works+0x84/0x380
worker_thread+0x12a/0x2a0
kthread+0xe3/0x110
ret_from_fork+0x30/0x50
ret_from_fork_asm+0x1b/0x30
stack_count: 8641
This patch (of 5):
In order to move the heavy lifting into page_owner code, this one needs to
have access to the stack_record structure, which right now sits in
lib/stackdepot.c. Move it to the stackdepot.h header so page_owner can
access stack_record's struct fields.
Link: https://lkml.kernel.org/r/20240212223029.30769-1-osalvador@suse.de
Link: https://lkml.kernel.org/r/20240212223029.30769-2-osalvador@suse.de
Signed-off-by: Oscar Salvador <osalvador@suse.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Andrey Konovalov <andreyknvl@gmail.com>
Cc: Marco Elver <elver@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
include/linux/stackdepot.h | 44 +++++++++++++++++++++++++++++++++++
lib/stackdepot.c | 43 ----------------------------------
2 files changed, 44 insertions(+), 43 deletions(-)
--- a/include/linux/stackdepot.h~lib-stackdepot-move-stack_record-struct-definition-into-the-header
+++ a/include/linux/stackdepot.h
@@ -30,6 +30,50 @@ typedef u32 depot_stack_handle_t;
*/
#define STACK_DEPOT_EXTRA_BITS 5
+#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
+
+#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
+#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
+#define DEPOT_STACK_ALIGN 4
+#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
+#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
+ STACK_DEPOT_EXTRA_BITS)
+
+/* Compact structure that stores a reference to a stack. */
+union handle_parts {
+ depot_stack_handle_t handle;
+ struct {
+ u32 pool_index : DEPOT_POOL_INDEX_BITS;
+ u32 offset : DEPOT_OFFSET_BITS;
+ u32 extra : STACK_DEPOT_EXTRA_BITS;
+ };
+};
+
+struct stack_record {
+ struct list_head hash_list; /* Links in the hash table */
+ u32 hash; /* Hash in hash table */
+ u32 size; /* Number of stored frames */
+ union handle_parts handle; /* Constant after initialization */
+ refcount_t count;
+ union {
+ unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES]; /* Frames */
+ struct {
+ /*
+ * An important invariant of the implementation is to
+ * only place a stack record onto the freelist iff its
+ * refcount is zero. Because stack records with a zero
+ * refcount are never considered as valid, it is safe to
+ * union @entries and freelist management state below.
+ * Conversely, as soon as an entry is off the freelist
+ * and its refcount becomes non-zero, the below must not
+ * be accessed until being placed back on the freelist.
+ */
+ struct list_head free_list; /* Links in the freelist */
+ unsigned long rcu_state; /* RCU cookie */
+ };
+ };
+};
+
typedef u32 depot_flags_t;
/*
--- a/lib/stackdepot.c~lib-stackdepot-move-stack_record-struct-definition-into-the-header
+++ a/lib/stackdepot.c
@@ -36,54 +36,11 @@
#include <linux/memblock.h>
#include <linux/kasan-enabled.h>
-#define DEPOT_HANDLE_BITS (sizeof(depot_stack_handle_t) * 8)
-
-#define DEPOT_POOL_ORDER 2 /* Pool size order, 4 pages */
-#define DEPOT_POOL_SIZE (1LL << (PAGE_SHIFT + DEPOT_POOL_ORDER))
-#define DEPOT_STACK_ALIGN 4
-#define DEPOT_OFFSET_BITS (DEPOT_POOL_ORDER + PAGE_SHIFT - DEPOT_STACK_ALIGN)
-#define DEPOT_POOL_INDEX_BITS (DEPOT_HANDLE_BITS - DEPOT_OFFSET_BITS - \
- STACK_DEPOT_EXTRA_BITS)
#define DEPOT_POOLS_CAP 8192
#define DEPOT_MAX_POOLS \
(((1LL << (DEPOT_POOL_INDEX_BITS)) < DEPOT_POOLS_CAP) ? \
(1LL << (DEPOT_POOL_INDEX_BITS)) : DEPOT_POOLS_CAP)
-/* Compact structure that stores a reference to a stack. */
-union handle_parts {
- depot_stack_handle_t handle;
- struct {
- u32 pool_index : DEPOT_POOL_INDEX_BITS;
- u32 offset : DEPOT_OFFSET_BITS;
- u32 extra : STACK_DEPOT_EXTRA_BITS;
- };
-};
-
-struct stack_record {
- struct list_head hash_list; /* Links in the hash table */
- u32 hash; /* Hash in hash table */
- u32 size; /* Number of stored frames */
- union handle_parts handle; /* Constant after initialization */
- refcount_t count;
- union {
- unsigned long entries[CONFIG_STACKDEPOT_MAX_FRAMES]; /* Frames */
- struct {
- /*
- * An important invariant of the implementation is to
- * only place a stack record onto the freelist iff its
- * refcount is zero. Because stack records with a zero
- * refcount are never considered as valid, it is safe to
- * union @entries and freelist management state below.
- * Conversely, as soon as an entry is off the freelist
- * and its refcount becomes non-zero, the below must not
- * be accessed until being placed back on the freelist.
- */
- struct list_head free_list; /* Links in the freelist */
- unsigned long rcu_state; /* RCU cookie */
- };
- };
-};
-
static bool stack_depot_disabled;
static bool __stack_depot_early_init_requested __initdata = IS_ENABLED(CONFIG_STACKDEPOT_ALWAYS_INIT);
static bool __stack_depot_early_init_passed __initdata;
_
Patches currently in -mm which might be from osalvador@suse.de are
lib-stackdepot-move-stack_record-struct-definition-into-the-header.patch
mmpage_owner-implement-the-tracking-of-the-stacks-count.patch
mmpage_owner-display-all-stacks-and-their-count.patch
mmpage_owner-filter-out-stacks-by-a-threshold.patch
mmpage_owner-update-documentation-regarding-page_owner_stacks.patch
next reply other threads:[~2024-02-12 23:28 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-02-12 23:28 Andrew Morton [this message]
2024-02-14 18:32 + lib-stackdepot-move-stack_record-struct-definition-into-the-header.patch added to mm-unstable branch Andrew Morton
2024-02-15 23:34 Andrew Morton
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20240212232836.DA5ABC433F1@smtp.kernel.org \
--to=akpm@linux-foundation.org \
--cc=andreyknvl@gmail.com \
--cc=elver@google.com \
--cc=glider@google.com \
--cc=mhocko@suse.com \
--cc=mm-commits@vger.kernel.org \
--cc=osalvador@suse.de \
--cc=vbabka@suse.cz \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.