From: Oscar Salvador <osalvador@suse.de>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	Marco Elver <elver@google.com>,
	Andrey Konovalov <andreyknvl@gmail.com>,
	Alexander Potapenko <glider@google.com>,
	Oscar Salvador <osalvador@suse.de>
Subject: [PATCH v9 0/7] page_owner: print stacks and their outstanding allocations
Date: Wed, 14 Feb 2024 18:01:50 +0100
Message-ID: <20240214170157.17530-1-osalvador@suse.de>

Changes v8 -> v9
     - Fix handle-0 for the very first stack_record entry
     - Collect Acked-by and Reviewed-by from Marco and Vlastimil
     - Addressed feedback from Marco and Vlastimil
     - stack_print() no longer allocates a memory buffer and instead prints
       directly using seq_printf() (suggested by Vlastimil)
     - Added two static struct stack objects for dummy_handle and failure_handle
     - add_stack_record_to_list() now filters out the gfp_mask the same way
       stackdepot does, for consistency
     - Rename set_threshold to count_threshold
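Why a 0 handle is a problem can be sketched in a few lines. Assuming, purely for illustration, a 32-bit handle that packs a pool index and an offset (the bit widths, type and function names below are hypothetical, not lib/stackdepot's), the very first record in pool 0 at offset 0 encodes to 0, which is indistinguishable from "no stack". Biasing the stored pool index by one is one way to keep 0 reserved:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical handle layout; the widths are illustrative only. */
#define POOL_INDEX_BITS 16
#define OFFSET_BITS     16

typedef uint32_t depot_handle_t; /* 0 means "no stack" */

/* Naive encoding: pool 0, offset 0 collides with the "no stack" value. */
static depot_handle_t encode_naive(uint32_t pool, uint32_t offset)
{
	return (pool << OFFSET_BITS) | offset;
}

/* Biased encoding: store pool + 1 so every valid record gets a
 * non-zero handle. */
static depot_handle_t encode_biased(uint32_t pool, uint32_t offset)
{
	assert(pool + 1 < (1u << POOL_INDEX_BITS));
	assert(offset < (1u << OFFSET_BITS));
	return ((pool + 1) << OFFSET_BITS) | offset;
}

/* Undo the bias when decoding. */
static void decode_biased(depot_handle_t h, uint32_t *pool, uint32_t *offset)
{
	assert(h != 0); /* 0 is reserved for "no stack" */
	*pool = (h >> OFFSET_BITS) - 1;
	*offset = h & ((1u << OFFSET_BITS) - 1);
}
```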

Changes v7 -> v8
     - Rebased on top of -next
     - page_owner maintains its own stack_records list now
     - Kill auxiliary stackdepot function to traverse buckets
     - page_owner_stacks is now a directory with 'show_stacks'
       and 'set_threshold'
     - Update Documentation/mm/page_owner.rst
     - Addressed feedback from Marco

Changes v6 -> v7:
     - Rebased on top of Andrey Konovalov's libstackdepot patchset
     - Reformulated the changelogs

Changes v5 -> v6:
     - Rebase on top of v6.7-rc1
     - Move stack_record struct to the header
     - Addressed feedback from Vlastimil
       (some code tweaks and changelogs suggestions)

Changes v4 -> v5:
     - Addressed feedback from Alexander Potapenko

Changes v3 -> v4:
     - Rebase (a long time has passed)
     - Use a boolean instead of an enum for the action
       (suggested by Alexander Potapenko)
     - (I left some feedback unaddressed because it has been a long time,
        and I would like to discuss it here instead of revamping
        an old thread)

Changes v2 -> v3:
     - Replace interface in favor of seq operations
       (suggested by Vlastimil)
     - Use debugfs interface to store/read values (suggested by Ammar)


page_owner is a great debugging tool that lets us know
about all pages that have been allocated/freed and their specific
stacktraces.
This comes in very handy when debugging memory leaks since, with
some scripting, we can see the outstanding allocations, which might point
to a memory leak.

In my experience, that is one of the most useful cases, but it can get
really tedious to screen through all pages and try to reconstruct the
stack <-> allocated/freed relationship; most of the time it becomes a
daunting and slow process when there are tons of allocation/free operations.

This patchset aims to ease that by adding new functionality to
page_owner.
This functionality creates a new directory called 'page_owner_stacks'
under '/sys/kernel/debug' with a read-only file called 'show_stacks',
which prints out all the stacks followed by their outstanding number
of allocations (that is, the number of times the stacktrace has allocated
but not yet freed).
This gives us a clear and quick overview of stacks <-> allocated/freed.
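To make the output format concrete, here is a userspace sketch of what printing the stacks amounts to: walk a list of tracked stacks and, for each one, print its frames followed by its outstanding count. The structure, the helper names and the exact whitespace are assumptions chosen to approximate the show_stacks layout; the real implementation uses seq_file operations and prints with seq_printf().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a tracked stack: symbolized frames plus the
 * number of allocations from it that have not been freed yet. */
struct stack_entry {
	const char **frames;
	int nr_frames;
	int count;
	struct stack_entry *next;
};

/* Append formatted text at buf[*len], truncating instead of overflowing. */
static void emit(char *buf, size_t size, size_t *len, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (*len + 1 >= size)
		return; /* no room left beyond the terminating NUL */
	va_start(ap, fmt);
	n = vsnprintf(buf + *len, size - *len, fmt, ap);
	va_end(ap);
	if (n < 0)
		return;
	if ((size_t)n >= size - *len)
		*len = size - 1; /* output was truncated */
	else
		*len += (size_t)n;
}

/* Walk the list and print each stack: one indented line per frame, then
 * its outstanding count, then a blank separator line. */
static size_t show_stacks(char *buf, size_t size, const struct stack_entry *head)
{
	size_t len = 0;

	assert(size > 0);
	buf[0] = '\0';
	for (const struct stack_entry *s = head; s; s = s->next) {
		for (int i = 0; i < s->nr_frames; i++)
			emit(buf, size, &len, "  %s\n", s->frames[i]);
		emit(buf, size, &len, " stack_count: %d\n\n", s->count);
	}
	return len;
}
```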

We take advantage of the new refcount_t field that the stack_record struct
gained, and increment/decrement the stack refcount on every
__set_page_owner() (alloc operation) and __reset_page_owner() (free operation)
call.
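The counting scheme can be modelled in userspace with a short sketch. The struct and function names are hypothetical, C11 atomics stand in for the kernel's refcount_t, and the real __set_page_owner()/__reset_page_owner() do considerably more than this. The point it shows is that each page remembers which stack allocated it, so the free path can decrement that stack's count:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct stack_record: only the counter matters
 * here, and a C11 atomic plays the role of the kernel's refcount_t. */
struct stack_rec {
	atomic_int count; /* allocations from this stack not yet freed */
};

/* Hypothetical per-page bookkeeping: remember which stack allocated it. */
struct page_info {
	struct stack_rec *alloc_stack;
};

/* Allocation path (the role played by __set_page_owner()). */
static void on_alloc(struct page_info *page, struct stack_rec *stack)
{
	page->alloc_stack = stack;
	atomic_fetch_add_explicit(&stack->count, 1, memory_order_relaxed);
}

/* Free path (the role played by __reset_page_owner()): the count of the
 * stack that allocated the page goes down, not that of the freeing stack. */
static void on_free(struct page_info *page)
{
	int old;

	assert(page->alloc_stack != NULL); /* freeing an untracked page */
	old = atomic_fetch_sub_explicit(&page->alloc_stack->count, 1,
					memory_order_relaxed);
	assert(old > 0); /* more frees than allocations would be a bug */
	page->alloc_stack = NULL;
}
```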

Unfortunately, we cannot use the new stackdepot API
STACK_DEPOT_FLAG_GET because it does not fulfill page_owner's needs,
meaning we would have to special-case things, at which point it
makes more sense for page_owner to do its own {dec,inc}rementing
of the stacks.
E.g., when using STACK_DEPOT_FLAG_PUT, once the refcount reaches 0,
the stack gets evicted, so page_owner would lose that information.
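The difference in behaviour can be illustrated with a toy model. Both functions and the struct are hypothetical: put_evicting() stands for the evict-on-zero semantics described above, and dec_keeping() for a counter that page_owner maintains itself, where the record survives a count of 0 and can be counted again by a later allocation from the same stack:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical record: `evicted` models the depot reclaiming the slot. */
struct rec {
	int refs;
	bool evicted;
};

/* Evict-on-zero semantics: dropping the last reference throws the
 * stack trace away. */
static void put_evicting(struct rec *r)
{
	assert(!r->evicted && r->refs > 0);
	if (--r->refs == 0)
		r->evicted = true; /* stack trace is gone */
}

/* Self-maintained counter: the record is never evicted, so a stack
 * whose outstanding count drops to 0 is still known. */
static void dec_keeping(struct rec *r)
{
	assert(r->refs > 0);
	r->refs--;
}
```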

This patchset also creates a new file called 'count_threshold' within
the 'page_owner_stacks' directory; by writing a value to it, the stacks
whose refcount is below that value will be filtered out.
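The filtering rule is simple enough to state as code. The following is a hypothetical sketch, not the series' implementation: a stack is kept when its count is at least the threshold, so a threshold of 0 keeps every stack.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of one stack: its top frame and outstanding count. */
struct stack_summary {
	const char *top_frame;
	unsigned long count;
};

/* Keep only the stacks whose count is not below `threshold`.
 * Order is preserved; returns the number kept.
 * `out` must have room for `n` entries. */
static size_t filter_by_threshold(const struct stack_summary *in, size_t n,
				  unsigned long threshold,
				  struct stack_summary *out)
{
	size_t kept = 0;

	assert(in != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (in[i].count >= threshold)
			out[kept++] = in[i];
	return kept;
}
```

With a threshold of 5000, the counts 521, 4609, 6781 and 8641 that appear in the PoC output reduce to 6781 and 8641, consistent with the filtered output there.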

A PoC can be found below:

 # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks.txt
 # head -40 page_owner_full_stacks.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  page_cache_ra_unbounded+0x96/0x180
  filemap_get_pages+0xfd/0x590
  filemap_read+0xcc/0x330
  blkdev_read_iter+0xb8/0x150
  vfs_read+0x285/0x320
  ksys_read+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 521



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_write+0xa5/0xe0
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 4609
...
...

 # echo 5000 > /sys/kernel/debug/page_owner_stacks/count_threshold
 # cat /sys/kernel/debug/page_owner_stacks/show_stacks > page_owner_full_stacks_5000.txt
 # head -40 page_owner_full_stacks_5000.txt 
  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  alloc_pages_mpol+0x91/0x1f0
  folio_alloc+0x14/0x50
  filemap_alloc_folio+0xb2/0x100
  __filemap_get_folio+0x14a/0x490
  ext4_write_begin+0xbd/0x4b0 [ext4]
  generic_perform_write+0xc1/0x1e0
  ext4_buffered_write_iter+0x68/0xe0 [ext4]
  ext4_file_write_iter+0x70/0x740 [ext4]
  vfs_write+0x33d/0x420
  ksys_pwrite64+0x75/0x90
  do_syscall_64+0x80/0x160
  entry_SYSCALL_64_after_hwframe+0x6e/0x76
 stack_count: 6781



  prep_new_page+0xa9/0x120
  get_page_from_freelist+0x801/0x2210
  __alloc_pages+0x18b/0x350
  pcpu_populate_chunk+0xec/0x350
  pcpu_balance_workfn+0x2d1/0x4a0
  process_scheduled_works+0x84/0x380
  worker_thread+0x12a/0x2a0
  kthread+0xe3/0x110
  ret_from_fork+0x30/0x50
  ret_from_fork_asm+0x1b/0x30
 stack_count: 8641
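The show_stacks output is also easy to post-process. As an illustration, the hypothetical helper below (not part of this series) scans text in the format shown above and reports the number of stacks, the sum of their stack_count values and the largest count. Reading the debugfs file into a buffer (e.g. with fread()) is left out:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Aggregate figures extracted from show_stacks-style text. */
struct stacks_summary {
	unsigned long nr_stacks;
	unsigned long total;
	unsigned long max;
};

/* Find every "stack_count:" marker and parse the number after it. */
static struct stacks_summary summarize_show_stacks(const char *text)
{
	struct stacks_summary sum = { 0, 0, 0 };
	const char *key = "stack_count:";
	const char *p = text;

	assert(text != NULL);
	while ((p = strstr(p, key)) != NULL) {
		unsigned long count;

		p += strlen(key);
		if (sscanf(p, "%lu", &count) != 1)
			continue; /* malformed line: skip it */
		sum.nr_stacks++;
		sum.total += count;
		if (count > sum.max)
			sum.max = count;
	}
	return sum;
}
```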

Oscar Salvador (7):
  lib/stackdepot: Fix first entry having a 0-handle
  lib/stackdepot: Move stack_record struct definition into the header
  mm,page_owner: Maintain own list of stack_records structs
  mm,page_owner: Implement the tracking of the stacks count
  mm,page_owner: Display all stacks and their count
  mm,page_owner: Filter out stacks by a threshold
  mm,page_owner: Update Documentation regarding page_owner_stacks

 Documentation/mm/page_owner.rst |  45 +++++++
 include/linux/stackdepot.h      |  58 +++++++++
 lib/stackdepot.c                |  65 +++--------
 mm/page_owner.c                 | 200 +++++++++++++++++++++++++++++++-
 4 files changed, 318 insertions(+), 50 deletions(-)

-- 
2.43.0


