linux-mm.kvack.org archive mirror
* [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
@ 2020-05-25 14:19 Konstantin Khlebnikov
  2020-05-25 14:56 ` Kirill A. Shutemov
                   ` (2 more replies)
  0 siblings, 3 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2020-05-25 14:19 UTC
  To: linux-mm, linux-kernel, Andrew Morton; +Cc: Vlastimil Babka, Kirill A. Shutemov

The 'page-types' tool can list pages mapped by a process or pages in the
file cache, but it shows only the limited amount of state exported via procfs.

Let's use the existing dump_page() helper to reach the remaining information:
writing a pfn into /sys/kernel/debug/dump_page dumps its state into the kernel log.

# echo 0x37c43c > /sys/kernel/debug/dump_page
# dmesg | tail -6
 page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
 0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
 flags: 0x200000000020014(uptodate|lru|mappedtodisk)
 raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
 raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
 page dumped because: debugfs request
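
As a rough illustration of where such a pfn can come from, the sketch below
decodes one 64-bit /proc/<pid>/pagemap entry (bit 63 = present, bits 0-54 =
PFN, as described in Documentation/admin-guide/mm/pagemap.rst) and formats the
result the way the echo example above writes it. The helper names
pagemap_offset(), pagemap_entry_to_pfn() and format_pfn() are made up for this
example and are not part of the patch or of page-types. Note also that the
kernel reports the PFN field as zero to readers without CAP_SYS_ADMIN.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Layout of one 64-bit pagemap entry, per pagemap.rst. */
#define PM_PFN_MASK ((UINT64_C(1) << 55) - 1)	/* bits 0-54: PFN if present */
#define PM_PRESENT  (UINT64_C(1) << 63)		/* bit 63: page present in RAM */

/* Byte offset in the pagemap file of the entry that describes vaddr. */
static uint64_t pagemap_offset(uint64_t vaddr, uint64_t page_size)
{
	assert(page_size != 0);
	return vaddr / page_size * sizeof(uint64_t);
}

/*
 * Extract the PFN from a pagemap entry (hypothetical helper).
 * Returns 0 on success, -1 if the page is not present in RAM.
 */
static int pagemap_entry_to_pfn(uint64_t entry, uint64_t *pfn)
{
	if (!(entry & PM_PRESENT))
		return -1;
	*pfn = entry & PM_PFN_MASK;
	return 0;
}

/*
 * Format a PFN as "0x<hex>\n", matching the echo example above.
 * Returns the number of characters written (snprintf semantics).
 */
static int format_pfn(char *buf, size_t len, uint64_t pfn)
{
	return snprintf(buf, len, "0x%llx\n", (unsigned long long)pfn);
}
```

A caller would pread() 8 bytes at pagemap_offset(vaddr, page_size) from
/proc/<pid>/pagemap, pass them to pagemap_entry_to_pfn(), and write the
formatted string to /sys/kernel/debug/dump_page.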

With CONFIG_PAGE_OWNER=y it also shows the stacks for the last page alloc and free:

 page:ffffea0018fff480 refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f9f28f62
 anon flags: 0x100000000080034(uptodate|lru|active|swapbacked)
 raw: 0100000000080034 ffffea00184140c8 ffffea0018517d88 ffff8886076ba161
 raw: 00000007f9f28f62 0000000000000000 0000000100000000 ffff888bfc79f000
 page dumped because: debugfs request
 page->mem_cgroup:ffff888bfc79f000
 page_owner tracks the page as allocated
 page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
  prep_new_page+0x139/0x1a0
  get_page_from_freelist+0xde9/0x14e0
  __alloc_pages_nodemask+0x18b/0x360
  alloc_pages_vma+0x7c/0x270
  __handle_mm_fault+0xd40/0x12b0
  handle_mm_fault+0xe7/0x1e0
  do_page_fault+0x2d5/0x610
  page_fault+0x2f/0x40
 page last free stack trace:
  free_pcp_prepare+0x11e/0x1c0
  free_unref_page_list+0x71/0x180
  release_pages+0x31e/0x480
  tlb_flush_mmu+0x44/0x150
  tlb_finish_mmu+0x3d/0x70
  exit_mmap+0xdd/0x1a0
  mmput+0x70/0x140
  do_exit+0x33f/0xc40
  do_group_exit+0x3a/0xa0
  __x64_sys_exit_group+0x14/0x20
  do_syscall_64+0x48/0x130
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/admin-guide/mm/pagemap.rst |    3 +++
 Documentation/vm/page_owner.rst          |   10 ++++++++++
 mm/debug.c                               |   27 +++++++++++++++++++++++++++
 3 files changed, 40 insertions(+)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index 340a5aee9b80..663ad5490d72 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -205,3 +205,6 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
 always 12 at most architectures). Since Linux 3.11 their meaning changes
 after first clear of soft-dirty bits. Since Linux 4.2 they are used for
 flags unconditionally.
+
+Page state could be dumped into kernel log by writing pfn in text form
+into /sys/kernel/debug/dump_page.
diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst
index 0ed5ab8c7ab4..d4d4dc64c19d 100644
--- a/Documentation/vm/page_owner.rst
+++ b/Documentation/vm/page_owner.rst
@@ -88,3 +88,13 @@ Usage
 
    See the result about who allocated each page
    in the ``sorted_page_owner.txt``.
+
+Notes
+=====
+
+To lookup pages in file cache or mapped in process you could use interface
+pagemap documented in Documentation/admin-guide/mm/pagemap.rst or tool
+page-types in the tools/vm directory.
+
+Page state could be dumped into kernel log by writing pfn in text form
+into /sys/kernel/debug/dump_page.
diff --git a/mm/debug.c b/mm/debug.c
index 2189357f0987..5803f2b63d95 100644
--- a/mm/debug.c
+++ b/mm/debug.c
@@ -14,6 +14,7 @@
 #include <linux/migrate.h>
 #include <linux/page_owner.h>
 #include <linux/ctype.h>
+#include <linux/debugfs.h>
 
 #include "internal.h"
 
@@ -147,6 +148,32 @@ void dump_page(struct page *page, const char *reason)
 }
 EXPORT_SYMBOL(dump_page);
 
+#ifdef CONFIG_DEBUG_FS
+static int dump_page_set(void *data, u64 pfn)
+{
+	struct page *page;
+
+	if (!capable(CAP_SYS_ADMIN))
+		return -EPERM;
+
+	page = pfn_to_online_page(pfn);
+	if (!page)
+		return -ENXIO;
+
+	dump_page(page, "debugfs request");
+	return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(dump_page_fops, NULL, dump_page_set, "%llx\n");
+
+static int __init dump_page_debugfs(void)
+{
+	debugfs_create_file_unsafe("dump_page", 0200, NULL, NULL,
+				   &dump_page_fops);
+	return 0;
+}
+late_initcall(dump_page_debugfs);
+#endif /* CONFIG_DEBUG_FS */
+
 #ifdef CONFIG_DEBUG_VM
 
 void dump_vma(const struct vm_area_struct *vma)




* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 14:19 [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn Konstantin Khlebnikov
@ 2020-05-25 14:56 ` Kirill A. Shutemov
  2020-05-25 15:33 ` Matthew Wilcox
  2020-05-25 15:35 ` Vlastimil Babka
  2 siblings, 0 replies; 7+ messages in thread
From: Kirill A. Shutemov @ 2020-05-25 14:56 UTC
  To: Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Kirill A. Shutemov

On Mon, May 25, 2020 at 05:19:11PM +0300, Konstantin Khlebnikov wrote:
> Tool 'page-types' could list pages mapped by process or file cache pages,
> but it shows only limited amount of state exported via procfs.
> 
> Let's employ existing helper dump_page() to reach remaining information:
> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.
> 
> # echo 0x37c43c > /sys/kernel/debug/dump_page
> # dmesg | tail -6
>  page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>  0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>  flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>  raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>  raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>  page dumped because: debugfs request
> 
> With CONFIG_PAGE_OWNER=y shows also stacks for last page alloc and free:
> 
>  page:ffffea0018fff480 refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f9f28f62
>  anon flags: 0x100000000080034(uptodate|lru|active|swapbacked)
>  raw: 0100000000080034 ffffea00184140c8 ffffea0018517d88 ffff8886076ba161
>  raw: 00000007f9f28f62 0000000000000000 0000000100000000 ffff888bfc79f000
>  page dumped because: debugfs request
>  page->mem_cgroup:ffff888bfc79f000
>  page_owner tracks the page as allocated
>  page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
>   prep_new_page+0x139/0x1a0
>   get_page_from_freelist+0xde9/0x14e0
>   __alloc_pages_nodemask+0x18b/0x360
>   alloc_pages_vma+0x7c/0x270
>   __handle_mm_fault+0xd40/0x12b0
>   handle_mm_fault+0xe7/0x1e0
>   do_page_fault+0x2d5/0x610
>   page_fault+0x2f/0x40
>  page last free stack trace:
>   free_pcp_prepare+0x11e/0x1c0
>   free_unref_page_list+0x71/0x180
>   release_pages+0x31e/0x480
>   tlb_flush_mmu+0x44/0x150
>   tlb_finish_mmu+0x3d/0x70
>   exit_mmap+0xdd/0x1a0
>   mmput+0x70/0x140
>   do_exit+0x33f/0xc40
>   do_group_exit+0x3a/0xa0
>   __x64_sys_exit_group+0x14/0x20
>   do_syscall_64+0x48/0x130
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Looks useful to me:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>

-- 
 Kirill A. Shutemov



* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 14:19 [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn Konstantin Khlebnikov
  2020-05-25 14:56 ` Kirill A. Shutemov
@ 2020-05-25 15:33 ` Matthew Wilcox
  2020-05-25 16:03   ` Konstantin Khlebnikov
  2020-05-25 15:35 ` Vlastimil Babka
  2 siblings, 1 reply; 7+ messages in thread
From: Matthew Wilcox @ 2020-05-25 15:33 UTC
  To: Konstantin Khlebnikov
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Kirill A. Shutemov

On Mon, May 25, 2020 at 05:19:11PM +0300, Konstantin Khlebnikov wrote:
> Tool 'page-types' could list pages mapped by process or file cache pages,
> but it shows only limited amount of state exported via procfs.
> 
> Let's employ existing helper dump_page() to reach remaining information:
> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.
> 
> # echo 0x37c43c > /sys/kernel/debug/dump_page
> # dmesg | tail -6
>  page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>  0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>  flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>  raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>  raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>  page dumped because: debugfs request

This makes me deeply uncomfortable.  We're using %px, and %lx
(for the 'raw' lines) so we actually get to see kernel addresses.
We've rationalised this in the past as being acceptable because you're
already in an "assert triggered" kind of situation.  Now you're adding
a way for any process with CAP_SYS_ADMIN to get kernel addresses dumped
into the syslog.

I think we need a different function for this, or we need to re-audit
dump_page() for exposing kernel pointers, and not expose the raw data
in struct page.




* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 14:19 [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn Konstantin Khlebnikov
  2020-05-25 14:56 ` Kirill A. Shutemov
  2020-05-25 15:33 ` Matthew Wilcox
@ 2020-05-25 15:35 ` Vlastimil Babka
  2020-05-25 15:58   ` Konstantin Khlebnikov
  2 siblings, 1 reply; 7+ messages in thread
From: Vlastimil Babka @ 2020-05-25 15:35 UTC
  To: Konstantin Khlebnikov, linux-mm, linux-kernel, Andrew Morton
  Cc: Kirill A. Shutemov, Joonsoo Kim

On 5/25/20 4:19 PM, Konstantin Khlebnikov wrote:
> Tool 'page-types' could list pages mapped by process or file cache pages,
> but it shows only limited amount of state exported via procfs.
> 
> Let's employ existing helper dump_page() to reach remaining information:
> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.

Yeah, that's indeed useful; however, I'm less sure whether the kernel log is the
proper way to extract the data. For example, IIRC, with the page_owner file you
can "seek to pfn" to dump it, although that makes it somewhat harder to use.

Or we could write pfn to one file and read the dump from another one? But that's
not atomic.

Perhaps if we could do something like "cat /sys/kernel/debug/dump_page/<pfn>"
without all the pfns being actually listed in the dump_page directory with "ls"?
Is that possible?

> # echo 0x37c43c > /sys/kernel/debug/dump_page
> # dmesg | tail -6
>  page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>  0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>  flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>  raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>  raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>  page dumped because: debugfs request
> 
> With CONFIG_PAGE_OWNER=y shows also stacks for last page alloc and free:
> 
>  page:ffffea0018fff480 refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f9f28f62
>  anon flags: 0x100000000080034(uptodate|lru|active|swapbacked)
>  raw: 0100000000080034 ffffea00184140c8 ffffea0018517d88 ffff8886076ba161
>  raw: 00000007f9f28f62 0000000000000000 0000000100000000 ffff888bfc79f000
>  page dumped because: debugfs request
>  page->mem_cgroup:ffff888bfc79f000
>  page_owner tracks the page as allocated
>  page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
>   prep_new_page+0x139/0x1a0
>   get_page_from_freelist+0xde9/0x14e0
>   __alloc_pages_nodemask+0x18b/0x360
>   alloc_pages_vma+0x7c/0x270
>   __handle_mm_fault+0xd40/0x12b0
>   handle_mm_fault+0xe7/0x1e0
>   do_page_fault+0x2d5/0x610
>   page_fault+0x2f/0x40
>  page last free stack trace:
>   free_pcp_prepare+0x11e/0x1c0
>   free_unref_page_list+0x71/0x180
>   release_pages+0x31e/0x480
>   tlb_flush_mmu+0x44/0x150
>   tlb_finish_mmu+0x3d/0x70
>   exit_mmap+0xdd/0x1a0
>   mmput+0x70/0x140
>   do_exit+0x33f/0xc40
>   do_group_exit+0x3a/0xa0
>   __x64_sys_exit_group+0x14/0x20
>   do_syscall_64+0x48/0x130
>   entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> ---
>  Documentation/admin-guide/mm/pagemap.rst |    3 +++
>  Documentation/vm/page_owner.rst          |   10 ++++++++++
>  mm/debug.c                               |   27 +++++++++++++++++++++++++++
>  3 files changed, 40 insertions(+)
> 
> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
> index 340a5aee9b80..663ad5490d72 100644
> --- a/Documentation/admin-guide/mm/pagemap.rst
> +++ b/Documentation/admin-guide/mm/pagemap.rst
> @@ -205,3 +205,6 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
>  always 12 at most architectures). Since Linux 3.11 their meaning changes
>  after first clear of soft-dirty bits. Since Linux 4.2 they are used for
>  flags unconditionally.
> +
> +Page state could be dumped into kernel log by writing pfn in text form
> +into /sys/kernel/debug/dump_page.
> diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst
> index 0ed5ab8c7ab4..d4d4dc64c19d 100644
> --- a/Documentation/vm/page_owner.rst
> +++ b/Documentation/vm/page_owner.rst
> @@ -88,3 +88,13 @@ Usage
>  
>     See the result about who allocated each page
>     in the ``sorted_page_owner.txt``.
> +
> +Notes
> +=====
> +
> +To lookup pages in file cache or mapped in process you could use interface
> +pagemap documented in Documentation/admin-guide/mm/pagemap.rst or tool
> +page-types in the tools/vm directory.
> +
> +Page state could be dumped into kernel log by writing pfn in text form
> +into /sys/kernel/debug/dump_page.
> diff --git a/mm/debug.c b/mm/debug.c
> index 2189357f0987..5803f2b63d95 100644
> --- a/mm/debug.c
> +++ b/mm/debug.c
> @@ -14,6 +14,7 @@
>  #include <linux/migrate.h>
>  #include <linux/page_owner.h>
>  #include <linux/ctype.h>
> +#include <linux/debugfs.h>
>  
>  #include "internal.h"
>  
> @@ -147,6 +148,32 @@ void dump_page(struct page *page, const char *reason)
>  }
>  EXPORT_SYMBOL(dump_page);
>  
> +#ifdef CONFIG_DEBUG_FS
> +static int dump_page_set(void *data, u64 pfn)
> +{
> +	struct page *page;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	page = pfn_to_online_page(pfn);
> +	if (!page)
> +		return -ENXIO;
> +
> +	dump_page(page, "debugfs request");
> +	return 0;
> +}
> +DEFINE_DEBUGFS_ATTRIBUTE(dump_page_fops, NULL, dump_page_set, "%llx\n");
> +
> +static int __init dump_page_debugfs(void)
> +{
> +	debugfs_create_file_unsafe("dump_page", 0200, NULL, NULL,
> +				   &dump_page_fops);
> +	return 0;
> +}
> +late_initcall(dump_page_debugfs);
> +#endif /* CONFIG_DEBUG_FS */
> +
>  #ifdef CONFIG_DEBUG_VM
>  
>  void dump_vma(const struct vm_area_struct *vma)
> 




* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 15:35 ` Vlastimil Babka
@ 2020-05-25 15:58   ` Konstantin Khlebnikov
  0 siblings, 0 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2020-05-25 15:58 UTC
  To: Vlastimil Babka, linux-mm, linux-kernel, Andrew Morton
  Cc: Kirill A. Shutemov, Joonsoo Kim

On 25/05/2020 18.35, Vlastimil Babka wrote:
> On 5/25/20 4:19 PM, Konstantin Khlebnikov wrote:
>> Tool 'page-types' could list pages mapped by process or file cache pages,
>> but it shows only limited amount of state exported via procfs.
>>
>> Let's employ existing helper dump_page() to reach remaining information:
>> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.
> 
> Yeah that's indeed useful, however I'm less sure if kernel log is the proper way
> to extract the data. For example IIRC with the page_owner file can "seek to pfn"
> to dump it, although that makes it somewhat harder to use.
> 
> Or we could write pfn to one file and read the dump from another one? But that's
> not atomic.
> 
> Perhaps if we could do something like "cat /sys/kernel/debug/dump_page/<pfn>"
> without all the pfns being actually listed in the dump_page directory with "ls"?
> Is that possible?

Too much code for me. =)

This could be a kind of ftrace tracer which iterates over pages and dumps them,
but anyway that looks ridiculously overengineered.

This one hack connects the existing 'pagemap' with the existing 'dump_page', so it is almost free.

For complicated cases there is gdb and the special tool drgn: https://github.com/osandov/drgn

Writing a script which parses all that stuff from the kernel log isn't a big deal either.
I have one with a 100+ line regexp for all kinds of kernel splats.
I will publish it when I find time for polishing.
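
For illustration only, parsing the first line of a dump could look roughly
like the sketch below. It is not the script mentioned above: the struct
page_dump_head type and the parse_page_dump_head() helper are hypothetical
names, and the field layout is assumed from the sample output quoted in this
thread, which may differ between kernel versions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Fields of the "page:... refcount:... mapcount:..." line (assumed layout). */
struct page_dump_head {
	unsigned long long page;	/* struct page address */
	int refcount;
	int mapcount;
	unsigned long long mapping;
	unsigned long long index;
};

/*
 * Parse one kernel log line. Anything before "page:" (for example a dmesg
 * timestamp) is skipped. Returns 0 if all five fields were found, -1 otherwise.
 */
static int parse_page_dump_head(const char *line, struct page_dump_head *h)
{
	const char *p;
	int n;

	assert(line != NULL && h != NULL);

	p = strstr(line, "page:");
	if (!p)
		return -1;

	/* %llx also accepts an optional "0x" prefix, as seen in index:0x30 */
	n = sscanf(p, "page:%llx refcount:%d mapcount:%d mapping:%llx index:%llx",
		   &h->page, &h->refcount, &h->mapcount, &h->mapping, &h->index);
	return n == 5 ? 0 : -1;
}
```

Lines such as "page dumped because: ..." contain no "page:" token and are
rejected, so the helper can be run over every line of the log.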

> 
>> # echo 0x37c43c > /sys/kernel/debug/dump_page
>> # dmesg | tail -6
>>   page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>>   0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>>   flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>>   raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>>   raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>>   page dumped because: debugfs request
>>
>> With CONFIG_PAGE_OWNER=y shows also stacks for last page alloc and free:
>>
>>   page:ffffea0018fff480 refcount:1 mapcount:1 mapping:0000000000000000 index:0x7f9f28f62
>>   anon flags: 0x100000000080034(uptodate|lru|active|swapbacked)
>>   raw: 0100000000080034 ffffea00184140c8 ffffea0018517d88 ffff8886076ba161
>>   raw: 00000007f9f28f62 0000000000000000 0000000100000000 ffff888bfc79f000
>>   page dumped because: debugfs request
>>   page->mem_cgroup:ffff888bfc79f000
>>   page_owner tracks the page as allocated
>>   page last allocated via order 0, migratetype Movable, gfp_mask 0x100dca(GFP_HIGHUSER_MOVABLE|__GFP_ZERO)
>>    prep_new_page+0x139/0x1a0
>>    get_page_from_freelist+0xde9/0x14e0
>>    __alloc_pages_nodemask+0x18b/0x360
>>    alloc_pages_vma+0x7c/0x270
>>    __handle_mm_fault+0xd40/0x12b0
>>    handle_mm_fault+0xe7/0x1e0
>>    do_page_fault+0x2d5/0x610
>>    page_fault+0x2f/0x40
>>   page last free stack trace:
>>    free_pcp_prepare+0x11e/0x1c0
>>    free_unref_page_list+0x71/0x180
>>    release_pages+0x31e/0x480
>>    tlb_flush_mmu+0x44/0x150
>>    tlb_finish_mmu+0x3d/0x70
>>    exit_mmap+0xdd/0x1a0
>>    mmput+0x70/0x140
>>    do_exit+0x33f/0xc40
>>    do_group_exit+0x3a/0xa0
>>    __x64_sys_exit_group+0x14/0x20
>>    do_syscall_64+0x48/0x130
>>    entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>> ---
>>   Documentation/admin-guide/mm/pagemap.rst |    3 +++
>>   Documentation/vm/page_owner.rst          |   10 ++++++++++
>>   mm/debug.c                               |   27 +++++++++++++++++++++++++++
>>   3 files changed, 40 insertions(+)
>>
>> diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
>> index 340a5aee9b80..663ad5490d72 100644
>> --- a/Documentation/admin-guide/mm/pagemap.rst
>> +++ b/Documentation/admin-guide/mm/pagemap.rst
>> @@ -205,3 +205,6 @@ Before Linux 3.11 pagemap bits 55-60 were used for "page-shift" (which is
>>   always 12 at most architectures). Since Linux 3.11 their meaning changes
>>   after first clear of soft-dirty bits. Since Linux 4.2 they are used for
>>   flags unconditionally.
>> +
>> +Page state could be dumped into kernel log by writing pfn in text form
>> +into /sys/kernel/debug/dump_page.
>> diff --git a/Documentation/vm/page_owner.rst b/Documentation/vm/page_owner.rst
>> index 0ed5ab8c7ab4..d4d4dc64c19d 100644
>> --- a/Documentation/vm/page_owner.rst
>> +++ b/Documentation/vm/page_owner.rst
>> @@ -88,3 +88,13 @@ Usage
>>   
>>      See the result about who allocated each page
>>      in the ``sorted_page_owner.txt``.
>> +
>> +Notes
>> +=====
>> +
>> +To lookup pages in file cache or mapped in process you could use interface
>> +pagemap documented in Documentation/admin-guide/mm/pagemap.rst or tool
>> +page-types in the tools/vm directory.
>> +
>> +Page state could be dumped into kernel log by writing pfn in text form
>> +into /sys/kernel/debug/dump_page.
>> diff --git a/mm/debug.c b/mm/debug.c
>> index 2189357f0987..5803f2b63d95 100644
>> --- a/mm/debug.c
>> +++ b/mm/debug.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/migrate.h>
>>   #include <linux/page_owner.h>
>>   #include <linux/ctype.h>
>> +#include <linux/debugfs.h>
>>   
>>   #include "internal.h"
>>   
>> @@ -147,6 +148,32 @@ void dump_page(struct page *page, const char *reason)
>>   }
>>   EXPORT_SYMBOL(dump_page);
>>   
>> +#ifdef CONFIG_DEBUG_FS
>> +static int dump_page_set(void *data, u64 pfn)
>> +{
>> +	struct page *page;
>> +
>> +	if (!capable(CAP_SYS_ADMIN))
>> +		return -EPERM;
>> +
>> +	page = pfn_to_online_page(pfn);
>> +	if (!page)
>> +		return -ENXIO;
>> +
>> +	dump_page(page, "debugfs request");
>> +	return 0;
>> +}
>> +DEFINE_DEBUGFS_ATTRIBUTE(dump_page_fops, NULL, dump_page_set, "%llx\n");
>> +
>> +static int __init dump_page_debugfs(void)
>> +{
>> +	debugfs_create_file_unsafe("dump_page", 0200, NULL, NULL,
>> +				   &dump_page_fops);
>> +	return 0;
>> +}
>> +late_initcall(dump_page_debugfs);
>> +#endif /* CONFIG_DEBUG_FS */
>> +
>>   #ifdef CONFIG_DEBUG_VM
>>   
>>   void dump_vma(const struct vm_area_struct *vma)
>>
> 



* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 15:33 ` Matthew Wilcox
@ 2020-05-25 16:03   ` Konstantin Khlebnikov
  2020-05-25 16:05     ` Konstantin Khlebnikov
  0 siblings, 1 reply; 7+ messages in thread
From: Konstantin Khlebnikov @ 2020-05-25 16:03 UTC
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Kirill A. Shutemov



On 25/05/2020 18.33, Matthew Wilcox wrote:
> On Mon, May 25, 2020 at 05:19:11PM +0300, Konstantin Khlebnikov wrote:
>> Tool 'page-types' could list pages mapped by process or file cache pages,
>> but it shows only limited amount of state exported via procfs.
>>
>> Let's employ existing helper dump_page() to reach remaining information:
>> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.
>>
>> # echo 0x37c43c > /sys/kernel/debug/dump_page
>> # dmesg | tail -6
>>   page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>>   0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>>   flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>>   raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>>   raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>>   page dumped because: debugfs request
> 
> This makes me deeply uncomfortable.  We're using %px, and %lx
> (for the 'raw' lines) so we actually get to see kernel addresses.
> We've rationalised this in the past as being acceptable because you're
> already in an "assert triggered" kind of situation.  Now you're adding
> a way for any process with CAP_SYS_ADMIN to get kernel addresses dumped
> into the syslog.
> 
> I think we need a different function for this, or we need to re-audit
> dump_page() for exposing kernel pointers, and not expose the raw data
> in struct page.
> 

It's better to add a switch for disabling the paranoia when bad things are happening.
I.e. keep everything safe by default (or whatever the sysctl/config sets) and
flip the switch when needed.



* Re: [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn
  2020-05-25 16:03   ` Konstantin Khlebnikov
@ 2020-05-25 16:05     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 7+ messages in thread
From: Konstantin Khlebnikov @ 2020-05-25 16:05 UTC
  To: Matthew Wilcox
  Cc: linux-mm, linux-kernel, Andrew Morton, Vlastimil Babka,
	Kirill A. Shutemov


On 25/05/2020 19.03, Konstantin Khlebnikov wrote:
> 
> 
> On 25/05/2020 18.33, Matthew Wilcox wrote:
>> On Mon, May 25, 2020 at 05:19:11PM +0300, Konstantin Khlebnikov wrote:
>>> Tool 'page-types' could list pages mapped by process or file cache pages,
>>> but it shows only limited amount of state exported via procfs.
>>>
>>> Let's employ existing helper dump_page() to reach remaining information:
>>> writing pfn into /sys/kernel/debug/dump_page dumps state into kernel log.
>>>
>>> # echo 0x37c43c > /sys/kernel/debug/dump_page
>>> # dmesg | tail -6
>>>   page:ffffcb0b0df10f00 refcount:1 mapcount:0 mapping:000000007755d3d9 index:0x30
>>>   0xffffffffae4239e0 name:"libGeoIP.so.1.6.9"
>>>   flags: 0x200000000020014(uptodate|lru|mappedtodisk)
>>>   raw: 0200000000020014 ffffcb0b187fd288 ffffcb0b189e6248 ffff9528a04afe10
>>>   raw: 0000000000000030 0000000000000000 00000001ffffffff 0000000000000000
>>>   page dumped because: debugfs request
>>
>> This makes me deeply uncomfortable.  We're using %px, and %lx
>> (for the 'raw' lines) so we actually get to see kernel addresses.
>> We've rationalised this in the past as being acceptable because you're
>> already in an "assert triggered" kind of situation.  Now you're adding
>> a way for any process with CAP_SYS_ADMIN to get kernel addresses dumped
>> into the syslog.
>>
>> I think we need a different function for this, or we need to re-audit
>> dump_page() for exposing kernel pointers, and not expose the raw data
>> in struct page.
>>
> 
> It's better to add switch for disabling paranoia if bad things happening.
> I.e. keep everything safe by default (or whatever sysctl/config set) and
> flip the switch when needed.

Also, I'm OK with sealing this interface if the kernel is in a mode of serious paranoia.



Thread overview: 7+ messages
2020-05-25 14:19 [PATCH] mm: dump_page: add debugfs file for dumping page state by pfn Konstantin Khlebnikov
2020-05-25 14:56 ` Kirill A. Shutemov
2020-05-25 15:33 ` Matthew Wilcox
2020-05-25 16:03   ` Konstantin Khlebnikov
2020-05-25 16:05     ` Konstantin Khlebnikov
2020-05-25 15:35 ` Vlastimil Babka
2020-05-25 15:58   ` Konstantin Khlebnikov
