linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/2] Report on physically contiguous memory in smaps
@ 2023-06-13 16:09 Ryan Roberts
  2023-06-13 16:09 ` [PATCH v1 1/2] mm: /proc/pid/smaps: Report large folio mappings Ryan Roberts
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Ryan Roberts @ 2023-06-13 16:09 UTC (permalink / raw)
  To: Jonathan Corbet, Andrew Morton, Matthew Wilcox (Oracle), Yu Zhao
  Cc: Ryan Roberts, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	linux-arm-kernel

Hi All,

I thought I would try my luck with this pair of patches...

This series adds new entries to /proc/pid/smaps[_rollup] to report on physically
contiguous runs of memory. The first patch reports on the sizes of the runs by
binning into power-of-2 blocks and reporting how much memory is in which bin.
The second patch reports on how much of the memory is contpte-mapped in the page
table (this is a hint that arm64 supports to tell the HW that a range of ptes
map physically contiguous memory).
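
As a purely illustrative sketch (the helper name below is hypothetical, not
from the patches), the binning works out to a greedy power-of-2 decomposition
of each physically contiguous run:

```python
# Illustrative sketch of the binning described above: a contiguous run of
# nr_pages same-size pages is decomposed greedily into power-of-2 bins,
# largest first, so each bin reports how many pages landed in runs of
# that order.

def bin_run(nr_pages):
    """Return {order: pages} for a run of nr_pages contiguous pages."""
    bins = {}
    while nr_pages > 0:
        order = nr_pages.bit_length() - 1   # equivalent of ilog2(nr_pages)
        chunk = 1 << order
        bins[order] = bins.get(order, 0) + chunk
        nr_pages -= chunk
    return bins

# A 13-page run contributes 8 pages to the order-3 bin, 4 pages to
# order-2 and 1 page to order-0 (with 4K pages: the 32K, 16K and 4K bins).
```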

With filesystems now supporting large folios in the page cache, this provides a
useful way to see what sizes are actually getting mapped. And with the prospect
of large folios for anonymous memory and contpte mapping for conformant large
folios on the horizon, this reporting will become useful to aid application
performance optimization.

Perhaps I should really be submitting these patches as part of my large anon
folios and contpte sets (which I plan to post soon), but given this touches
the user ABI, I thought it was sensible to post it early and separately to get
feedback.

It would specifically be good to get feedback on:

  - The exact set of new fields depends on the system it's being run on. Does
    this cause problems for compat? (Specifically, the bins are determined
    based on PAGE_SIZE and PMD_SIZE.)
  - The ContPTEMapped field is effectively arm64-specific. What is the preferred
    way to handle arch-specific values if not here?

The patches are based on mm-unstable (dd69ce3382a2). Some minor conflicts will
need to be resolved if rebasing to Linus's tree. I have a branch at [1]. I've
tested on Ampere Altra (arm64) only.

[1] https://gitlab.arm.com/linux-arm/linux-rr/-/tree/features/granule_perf/folio_smap-lkml_v1

Thanks,
Ryan

Ryan Roberts (2):
  mm: /proc/pid/smaps: Report large folio mappings
  mm: /proc/pid/smaps: Report contpte mappings

 Documentation/filesystems/proc.rst |  31 +++++++
 fs/proc/task_mmu.c                 | 134 ++++++++++++++++++++++++++++-
 2 files changed, 161 insertions(+), 4 deletions(-)

--
2.25.1



* [PATCH v1 1/2] mm: /proc/pid/smaps: Report large folio mappings
  2023-06-13 16:09 [PATCH v1 0/2] Report on physically contiguous memory in smaps Ryan Roberts
@ 2023-06-13 16:09 ` Ryan Roberts
  2023-06-13 16:09 ` [PATCH v1 2/2] mm: /proc/pid/smaps: Report contpte mappings Ryan Roberts
  2023-06-13 18:44 ` [PATCH v1 0/2] Report on physically contiguous memory in smaps Yu Zhao
  2 siblings, 0 replies; 5+ messages in thread
From: Ryan Roberts @ 2023-06-13 16:09 UTC (permalink / raw)
  To: Jonathan Corbet, Andrew Morton, Matthew Wilcox (Oracle), Yu Zhao
  Cc: Ryan Roberts, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	linux-arm-kernel

With the addition of large folios for page cache pages, it is useful to
see which orders of folios are being mapped into a process.
Additionally, with planned future improvements to allocate large folios
for anonymous memory this will become even more useful. Visibility will
help to tune performance.

New fields "AnonContXXX" and "FileContXXX" indicate physically
contiguous runs of memory, binned into power-of-2 sizes starting with
the page size and ending with the pmd size. Therefore the exact set of
keys will vary by platform. It only includes pte-mapped memory and
reports on anonymous and file-backed memory separately.

Rollup Example:

aaaac9960000-ffffddfdd000 ---p 00000000 00:00 0                 [rollup]
Rss:               10852 kB
...
AnonCont4K:         3480 kB
AnonCont8K:            0 kB
AnonCont16K:           0 kB
AnonCont32K:           0 kB
AnonCont64K:           0 kB
AnonCont128K:          0 kB
AnonCont256K:          0 kB
AnonCont512K:          0 kB
AnonCont1M:            0 kB
AnonCont2M:            0 kB
FileCont4K:         3060 kB
FileCont8K:           40 kB
FileCont16K:        3792 kB
FileCont32K:         160 kB
FileCont64K:         320 kB
FileCont128K:          0 kB
FileCont256K:          0 kB
FileCont512K:          0 kB
FileCont1M:            0 kB
FileCont2M:            0 kB
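
For clarity, here is a quick sketch of how a userspace tool might consume the
new fields (the parsing helper is illustrative only, not part of this series):

```python
# Hypothetical consumer sketch: collect the AnonCont*/FileCont* fields
# from smaps/smaps_rollup-style text into {size_label: kB} dicts.
import re

def parse_cont_fields(smaps_text):
    anon, filec = {}, {}
    for m in re.finditer(r'^(Anon|File)Cont(\w+):\s+(\d+) kB$',
                         smaps_text, re.MULTILINE):
        dest = anon if m.group(1) == 'Anon' else filec
        dest[m.group(2)] = int(m.group(3))
    return anon, filec

example = """AnonCont4K:         3480 kB
FileCont16K:        3792 kB
FileCont8K:           40 kB
"""
anon, filec = parse_cont_fields(example)
# anon == {'4K': 3480}; filec == {'16K': 3792, '8K': 40}
```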

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/filesystems/proc.rst |  26 +++++++
 fs/proc/task_mmu.c                 | 115 +++++++++++++++++++++++++++++
 2 files changed, 141 insertions(+)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 7897a7dafcbc..5fa3f638848d 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -471,6 +471,26 @@ Memory Area, or VMA) there is a series of lines such as the following::
     KernelPageSize:        4 kB
     MMUPageSize:           4 kB
     Locked:                0 kB
+    AnonCont4K:            0 kB
+    AnonCont8K:            0 kB
+    AnonCont16K:           0 kB
+    AnonCont32K:           0 kB
+    AnonCont64K:           0 kB
+    AnonCont128K:          0 kB
+    AnonCont256K:          0 kB
+    AnonCont512K:          0 kB
+    AnonCont1M:            0 kB
+    AnonCont2M:            0 kB
+    FileCont4K:          348 kB
+    FileCont8K:            0 kB
+    FileCont16K:          32 kB
+    FileCont32K:           0 kB
+    FileCont64K:         512 kB
+    FileCont128K:          0 kB
+    FileCont256K:          0 kB
+    FileCont512K:          0 kB
+    FileCont1M:            0 kB
+    FileCont2M:            0 kB
     THPeligible:           0
     VmFlags: rd ex mr mw me dw
 
@@ -524,6 +544,12 @@ replaced by copy-on-write) part of the underlying shmem object out on swap.
 does not take into account swapped out page of underlying shmem objects.
 "Locked" indicates whether the mapping is locked in memory or not.
 
+"AnonContXXX" and "FileContXXX" indicate physically contiguous runs of memory,
+binned into power-of-2 sizes starting with the page size and ending with the
+pmd size. Therefore the exact set of keys will vary by platform. It only
+includes pte-mapped memory and reports on anonymous and file-backed memory
+separately.
+
 "THPeligible" indicates whether the mapping is eligible for allocating THP
 pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
 It just shows the current status.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 507cd4e59d07..29fee5b7b00b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -397,6 +397,49 @@ const struct file_operations proc_pid_maps_operations = {
 #define PSS_SHIFT 12
 
 #ifdef CONFIG_PROC_PAGE_MONITOR
+
+#define CONT_ORDER_MAX		(PMD_SHIFT-PAGE_SHIFT)
+#define CONT_LABEL_FIELD_SIZE	8
+#define CONT_LABEL_BUF_SIZE	32
+
+static char *cont_label(int order, char buf[CONT_LABEL_BUF_SIZE])
+{
+	unsigned long size = ((1UL << order) * PAGE_SIZE) >> 10;
+	char suffix = 'K';
+	int count;
+
+	if (size >= SZ_1K) {
+		size >>= 10;
+		suffix = 'M';
+	}
+
+	if (size >= SZ_1K) {
+		size >>= 10;
+		suffix = 'G';
+	}
+
+	count = snprintf(buf, CONT_LABEL_BUF_SIZE, "%lu%c:", size, suffix);
+
+	/*
+	 * If the string is less than the field size, pad it with spaces so that
+	 * the values line up in smaps.
+	 */
+	if (count < CONT_LABEL_FIELD_SIZE) {
+		memset(&buf[count], ' ', CONT_LABEL_FIELD_SIZE - count);
+		buf[CONT_LABEL_FIELD_SIZE] = '\0';
+	}
+
+	return buf;
+}
+
+struct cont_accumulator {
+	bool anon;
+	unsigned long folio_start_pfn;
+	unsigned long folio_end_pfn;
+	unsigned long next_pfn;
+	unsigned long nrpages;
+};
+
 struct mem_size_stats {
 	unsigned long resident;
 	unsigned long shared_clean;
@@ -419,8 +462,60 @@ struct mem_size_stats {
 	u64 pss_dirty;
 	u64 pss_locked;
 	u64 swap_pss;
+	unsigned long anon_cont[CONT_ORDER_MAX + 1];
+	unsigned long file_cont[CONT_ORDER_MAX + 1];
+	struct cont_accumulator cacc;
 };
 
+static void cacc_init(struct mem_size_stats *mss)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+
+	cacc->next_pfn = -1;
+	cacc->nrpages = 0;
+}
+
+static void cacc_drain(struct mem_size_stats *mss)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+	unsigned long *cont = cacc->anon ? mss->anon_cont : mss->file_cont;
+	unsigned long order;
+	unsigned long nrpages;
+
+	while (cacc->nrpages > 0) {
+		order = ilog2(cacc->nrpages);
+		nrpages = 1UL << order;
+		cacc->nrpages -= nrpages;
+		cont[order] += nrpages * PAGE_SIZE;
+	}
+}
+
+static void cacc_accumulate(struct mem_size_stats *mss, struct page *page)
+{
+	struct cont_accumulator *cacc = &mss->cacc;
+	unsigned long pfn = page_to_pfn(page);
+	bool anon = PageAnon(page);
+	struct folio *folio;
+	unsigned long start_pfn;
+
+	if (cacc->next_pfn == pfn && cacc->anon == anon &&
+	    pfn >= cacc->folio_start_pfn && pfn < cacc->folio_end_pfn) {
+		cacc->next_pfn++;
+		cacc->nrpages++;
+	} else {
+		cacc_drain(mss);
+
+		folio = page_folio(page);
+		start_pfn = page_to_pfn(&folio->page);
+
+		cacc->anon = anon;
+		cacc->folio_start_pfn = start_pfn;
+		cacc->folio_end_pfn = start_pfn + folio_nr_pages(folio);
+		cacc->next_pfn = pfn + 1;
+		cacc->nrpages = 1;
+	}
+}
+
 static void smaps_page_accumulate(struct mem_size_stats *mss,
 		struct page *page, unsigned long size, unsigned long pss,
 		bool dirty, bool locked, bool private)
@@ -473,6 +568,10 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	if (young || page_is_young(page) || PageReferenced(page))
 		mss->referenced += size;
 
+	/* Accumulate physically contiguous map size information. */
+	if (!compound)
+		cacc_accumulate(mss, page);
+
 	/*
 	 * Then accumulate quantities that may depend on sharing, or that may
 	 * differ page-by-page.
@@ -622,6 +721,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 			   struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
+	struct mem_size_stats *mss = walk->private;
 	pte_t *pte;
 	spinlock_t *ptl;
 
@@ -632,6 +732,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 		goto out;
 	}
 
+	cacc_init(mss);
 	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	if (!pte) {
 		walk->action = ACTION_AGAIN;
@@ -640,6 +741,7 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		smaps_pte_entry(pte, addr, walk);
 	pte_unmap_unlock(pte - 1, ptl);
+	cacc_drain(mss);
 out:
 	cond_resched();
 	return 0;
@@ -816,6 +918,9 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	bool rollup_mode)
 {
+	int i;
+	char label[CONT_LABEL_BUF_SIZE];
+
 	SEQ_PUT_DEC("Rss:            ", mss->resident);
 	SEQ_PUT_DEC(" kB\nPss:            ", mss->pss >> PSS_SHIFT);
 	SEQ_PUT_DEC(" kB\nPss_Dirty:      ", mss->pss_dirty >> PSS_SHIFT);
@@ -849,6 +954,16 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 					mss->swap_pss >> PSS_SHIFT);
 	SEQ_PUT_DEC(" kB\nLocked:         ",
 					mss->pss_locked >> PSS_SHIFT);
+	for (i = 0; i <= CONT_ORDER_MAX; i++) {
+		seq_printf(m, " kB\nAnonCont%s%8lu",
+					cont_label(i, label),
+					mss->anon_cont[i] >> 10);
+	}
+	for (i = 0; i <= CONT_ORDER_MAX; i++) {
+		seq_printf(m, " kB\nFileCont%s%8lu",
+					cont_label(i, label),
+					mss->file_cont[i] >> 10);
+	}
 	seq_puts(m, " kB\n");
 }
 
-- 
2.25.1



* [PATCH v1 2/2] mm: /proc/pid/smaps: Report contpte mappings
  2023-06-13 16:09 [PATCH v1 0/2] Report on physically contiguous memory in smaps Ryan Roberts
  2023-06-13 16:09 ` [PATCH v1 1/2] mm: /proc/pid/smaps: Report large folio mappings Ryan Roberts
@ 2023-06-13 16:09 ` Ryan Roberts
  2023-06-13 18:44 ` [PATCH v1 0/2] Report on physically contiguous memory in smaps Yu Zhao
  2 siblings, 0 replies; 5+ messages in thread
From: Ryan Roberts @ 2023-06-13 16:09 UTC (permalink / raw)
  To: Jonathan Corbet, Andrew Morton, Matthew Wilcox (Oracle), Yu Zhao
  Cc: Ryan Roberts, linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	linux-arm-kernel

arm64 intends to start using its "contpte" bit in pgtables more
frequently, and therefore it would be useful to know how well utilised
it is in order to help diagnose and fix performance issues.

Add "ContPTEMapped" field, which shows how much of the rss is mapped
using contptes. For architectures that do not support contpte mappings
(as determined by pte_cont() not being defined) the field will be
suppressed.
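
For background, a contpte hint roughly encodes the following condition (sketch
only; CONT_PTES == 16 matches the common arm64 4K-page configuration, giving
64K blocks, and the helper below is illustrative rather than kernel code):

```python
# Illustrative model of when a block of ptes is eligible for a contpte
# mapping: CONT_PTES consecutive entries, naturally aligned in both the
# virtual and physical address spaces, all mapping physically contiguous
# pages. (CONT_PTES == 16 with 4K pages gives 16 * 4K = 64K blocks.)

CONT_PTES = 16

def cont_eligible(start_vpn, pfns):
    """start_vpn: virtual page number of the first pte.
    pfns: physical frame numbers mapped by CONT_PTES consecutive ptes."""
    if len(pfns) != CONT_PTES or start_vpn % CONT_PTES != 0:
        return False          # block not virtually aligned
    if pfns[0] % CONT_PTES != 0:
        return False          # block not physically aligned
    # every pfn must follow on from the previous one
    return all(pfns[i] == pfns[0] + i for i in range(CONT_PTES))

# An aligned run of 16 consecutive pfns qualifies; any hole or
# misalignment disqualifies the whole block.
```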

Rollup Example:

aaaac5150000-ffffccf07000 ---p 00000000 00:00 0                 [rollup]
Rss:               11504 kB
...
ContPTEMapped:      6848 kB

Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
---
 Documentation/filesystems/proc.rst |  5 +++++
 fs/proc/task_mmu.c                 | 19 +++++++++++++++----
 2 files changed, 20 insertions(+), 4 deletions(-)

diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
index 5fa3f638848d..726951374c57 100644
--- a/Documentation/filesystems/proc.rst
+++ b/Documentation/filesystems/proc.rst
@@ -491,6 +491,7 @@ Memory Area, or VMA) there is a series of lines such as the following::
     FileCont512K:          0 kB
     FileCont1M:            0 kB
     FileCont2M:            0 kB
+    ContPTEMapped:         0 kB
     THPeligible:           0
     VmFlags: rd ex mr mw me dw
 
@@ -550,6 +551,10 @@ pmd size. Therefore the exact set of keys will vary by platform. It only
 includes pte-mapped memory and reports on anonymous and file-backed memory
 separately.
 
+"ContPTEMapped" is only present for architectures that support indicating a set
+of contiguously mapped ptes in their page tables. In this case, it indicates
+how much of the memory is currently mapped using contpte mappings.
+
 "THPeligible" indicates whether the mapping is eligible for allocating THP
 pages as well as the THP is PMD mappable or not - 1 if true, 0 otherwise.
 It just shows the current status.
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 29fee5b7b00b..0ebd6eb7efd4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -465,6 +465,7 @@ struct mem_size_stats {
 	unsigned long anon_cont[CONT_ORDER_MAX + 1];
 	unsigned long file_cont[CONT_ORDER_MAX + 1];
 	struct cont_accumulator cacc;
+	unsigned long contpte_mapped;
 };
 
 static void cacc_init(struct mem_size_stats *mss)
@@ -548,7 +549,7 @@ static void smaps_page_accumulate(struct mem_size_stats *mss,
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
 		bool compound, bool young, bool dirty, bool locked,
-		bool migration)
+		bool migration, bool contpte)
 {
 	int i, nr = compound ? compound_nr(page) : 1;
 	unsigned long size = nr * PAGE_SIZE;
@@ -572,6 +573,10 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	if (!compound)
 		cacc_accumulate(mss, page);
 
+	/* Accumulate all the pages that are part of a contpte. */
+	if (contpte)
+		mss->contpte_mapped += size;
+
 	/*
 	 * Then accumulate quantities that may depend on sharing, or that may
 	 * differ page-by-page.
@@ -636,13 +641,16 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	struct vm_area_struct *vma = walk->vma;
 	bool locked = !!(vma->vm_flags & VM_LOCKED);
 	struct page *page = NULL;
-	bool migration = false, young = false, dirty = false;
+	bool migration = false, young = false, dirty = false, cont = false;
 	pte_t ptent = ptep_get(pte);
 
 	if (pte_present(ptent)) {
 		page = vm_normal_page(vma, addr, ptent);
 		young = pte_young(ptent);
 		dirty = pte_dirty(ptent);
+#ifdef pte_cont
+		cont = pte_cont(ptent);
+#endif
 	} else if (is_swap_pte(ptent)) {
 		swp_entry_t swpent = pte_to_swp_entry(ptent);
 
@@ -672,7 +680,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	if (!page)
 		return;
 
-	smaps_account(mss, page, false, young, dirty, locked, migration);
+	smaps_account(mss, page, false, young, dirty, locked, migration, cont);
 }
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
@@ -708,7 +716,7 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 		mss->file_thp += HPAGE_PMD_SIZE;
 
 	smaps_account(mss, page, true, pmd_young(*pmd), pmd_dirty(*pmd),
-		      locked, migration);
+		      locked, migration, false);
 }
 #else
 static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
@@ -964,6 +972,9 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 					cont_label(i, label),
 					mss->file_cont[i] >> 10);
 	}
+#ifdef pte_cont
+	SEQ_PUT_DEC(" kB\nContPTEMapped:  ", mss->contpte_mapped);
+#endif
 	seq_puts(m, " kB\n");
 }
 
-- 
2.25.1



* Re: [PATCH v1 0/2] Report on physically contiguous memory in smaps
  2023-06-13 16:09 [PATCH v1 0/2] Report on physically contiguous memory in smaps Ryan Roberts
  2023-06-13 16:09 ` [PATCH v1 1/2] mm: /proc/pid/smaps: Report large folio mappings Ryan Roberts
  2023-06-13 16:09 ` [PATCH v1 2/2] mm: /proc/pid/smaps: Report contpte mappings Ryan Roberts
@ 2023-06-13 18:44 ` Yu Zhao
  2023-06-14 10:41   ` Ryan Roberts
  2 siblings, 1 reply; 5+ messages in thread
From: Yu Zhao @ 2023-06-13 18:44 UTC (permalink / raw)
  To: Ryan Roberts
  Cc: Jonathan Corbet, Andrew Morton, Matthew Wilcox (Oracle),
	linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	linux-arm-kernel

On Tue, Jun 13, 2023 at 05:09:48PM +0100, Ryan Roberts wrote:
> Hi All,
> 
> I thought I would try my luck with this pair of patches...

Ack on the idea.

Actually I have a script to do just this, but it's based on pagemap (attaching the script at the end).

> This series adds new entries to /proc/pid/smaps[_rollup] to report on physically
> contiguous runs of memory. The first patch reports on the sizes of the runs by
> binning into power-of-2 blocks and reporting how much memory is in which bin.
> The second patch reports on how much of the memory is contpte-mapped in the page
> table (this is a hint that arm64 supports to tell the HW that a range of ptes
> map physically contiguous memory).
> 
> With filesystems now supporting large folios in the page cache, this provides a
> useful way to see what sizes are actually getting mapped. And with the prospect
> of large folios for anonymous memory and contpte mapping for conformant large
> folios on the horizon, this reporting will become useful to aid application
> performance optimization.
> 
> Perhaps I should really be submitting these patches as part of my large anon
> folios and contpte sets (which I plan to post soon), but given this touches
> the user ABI, I thought it was sensible to post it early and separately to get
> feedback.
> 
> It would specifically be good to get feedback on:
> 
>   - The exact set of new fields depends on the system it's being run on. Does
>     this cause problems for compat? (Specifically, the bins are determined
>     based on PAGE_SIZE and PMD_SIZE.)
>   - The ContPTEMapped field is effectively arm64-specific. What is the preferred
>     way to handle arch-specific values if not here?

No strong opinions here.

===

$ cat memory-histogram/mem_hist.py
"""Script that scans VMAs, outputting histograms regarding memory allocations.

Example usage:
  python3 mem_hist.py --omit-file-backed --omit-unfaulted-vmas

For every process on the system, this script scans each VMA, counting the number
of order n allocations for 0 <= n <= MAX_ORDER. An order n allocation is a
naturally aligned region of PAGE_SIZE * (2 ^ n) bytes, consisting of 2 ^ n
pages, in which every page is present (according to the data in
/proc/<pid>/pagemap).  VMA information as in /proc/<pid>/maps is output for all
scanned VMAs along with a histogram of allocation orders. For example, this
histogram states that there are 12 order 0 allocations, 4 order 1 allocations, 5
order 2 allocations, and so on:

  [12, 4, 5, 9, 5, 10, 6, 2, 2, 4, 3, 4]

In addition to per-VMA histograms, per-process histograms are printed.
Per-process histograms are the sum of the histograms of all VMAs contained
within it, allowing for an overview of the memory allocations patterns of the
process as a whole.

Processes, and VMAs under each process are printed sorted in reverse-lexicographic
order of histograms. That is, VMAs containing more high order allocations will
be printed after ones containing more low order allocations. The output can thus
be easily visually scanned to find VMAs in which hugepage use shows the most
potential benefit.

To reduce output clutter, the option --omit-file-backed exists to omit VMAs
that are file backed (which, outside of tmpfs, don't support transparent
hugepages on Linux). Additionally, the option --omit-unfaulted-vmas exists to
omit VMAs containing zero resident pages.
"""
import argparse
import functools
import re
import struct
import subprocess
import sys

ALL_PIDS_CMD = "ps --no-headers -e | awk '{ print $1 }'"

# Maximum order the script creates histograms up to. This is by default 9
# since the usual hugepage size on x86 is 2MB which is 2**9 4KB pages
MAX_ORDER = 9

PAGE_SIZE = 2**12
BLANK_HIST = [0] * (MAX_ORDER + 1)

class Vma:
  """Represents a virtual memory area.

  Attributes:
    proc: Process object in which this VMA is contained
    start_vaddr: Start virtual address of VMA
    end_vaddr: End virtual address of VMA
    perms: Permission string of VMA as in /proc/<pid>/maps (eg. rw-p)
    mapped_file: Path to file backing this VMA from /proc/<pid>/maps, empty
      string if not file backed. Note there are some cases in Linux where this
      may be nonempty and the VMA not file backed (eg. memfds)
    hist: This VMA's histogram as a list of integers
  """

  def __init__(self, proc, start_vaddr, end_vaddr, perms, mapped_file):
    self.proc = proc
    self.start_vaddr = start_vaddr
    self.end_vaddr = end_vaddr
    self.perms = perms
    self.mapped_file = mapped_file

  def is_file_backed(self):
    """Returns true if this VMA is file backed, false otherwise."""
    # The output printed for memfds (eg. /memfd:crosvm) also happens to be a
    # valid file path on *nix, so special case them
    return (bool(re.match("(?:/[^/]+)+", self.mapped_file)) and
            not bool(re.match("^/memfd:", self.mapped_file)))

  @staticmethod
  def bitmask(hi, lo):
    """Returns a bitmask with the bits from index hi to low+1 set."""
    return ((1 << (hi - lo)) - 1) << lo

  @property
  @functools.lru_cache(maxsize=50000)
  def hist(self):
    """Returns this VMA's histogram as a list."""
    hist = BLANK_HIST[:]

    pagemap_file = safe_open_procfile(self.proc.pid, "pagemap", "rb")
    if not pagemap_file:
      err_print(
          "Cannot open /proc/{0}/pagemap, not generating histogram".format(
              self.proc.pid))
      return hist

    # Page index of start/end VMA virtual addresses
    vma_start_page_i = self.start_vaddr // PAGE_SIZE
    vma_end_page_i = self.end_vaddr // PAGE_SIZE

    for order in range(0, MAX_ORDER + 1):
      # If there are fewer than two allocations of the previous order, there
      # can be no allocations of a higher order, so break out early to save time
      if order > 0 and hist[order - 1] < 2:
        break

      # First and last pages aligned to 2**order bytes in this VMA
      first_aligned_page = (vma_start_page_i
                            & self.bitmask(64, order)) + 2**order
      last_aligned_page = vma_end_page_i & self.bitmask(64, order)

      # Iterate over all order-sized and order-aligned chunks in this VMA
      for start_page_i in range(first_aligned_page, last_aligned_page,
                                2**order):
        if self._is_region_present(pagemap_file, start_page_i,
                                   start_page_i + 2**order):
          hist[order] += 1

          # Subtract two lower order allocations so that we don't double-count
          # an order n allocation as two order n-1 allocations as well
          if order > 0:
            hist[order - 1] -= 2

    pagemap_file.close()
    return hist

  def _is_region_present(self, pagemap_file, start_page_i, end_page_i):
    """Returns True if all pages in the given range are resident.

    Args:
      pagemap_file: Opened /proc/<pid>/pagemap file for this process
      start_page_i: Start page index for range
      end_page_i: End page index for range

    Returns:
      True if all pages from page index start_page_i to end_page_i are present
      according to the pagemap file, False otherwise.
    """
    pagemap_file.seek(start_page_i * 8)
    for _ in range(start_page_i, end_page_i):
      # /proc/<pid>/pagemaps contains an 8 byte value for every page
      page_info, = struct.unpack("Q", pagemap_file.read(8))
      # Bit 63 is set if the page is present
      if not page_info & (1 << 63):
        return False
    return True

  def __str__(self):
    return ("{start:016x}-{end:016x} {size:<8} {perms:<4} {hist:<50} "
            "{mapped_file:<40}").format(
                start=self.start_vaddr,
                end=self.end_vaddr,
                size="%dk" % ((self.end_vaddr - self.start_vaddr) // 1024),
                perms=self.perms,
                hist=str(self.hist),
                mapped_file=str(self.mapped_file))


class Process:
  """Represents a running process.

  Attributes:
    vmas: List of VMA objects representing this processes's VMAs
    pid: Process PID
    name: Name of process (read from /proc/<pid>/status)
  """
  _MAPS_LINE_REGEX = ("([0-9a-f]+)-([0-9a-f]+) ([r-][w-][x-][ps-]) "
                      "[0-9a-f]+ [0-9a-f]+:[0-9a-f]+ [0-9]+[ ]*(.*)")

  def __init__(self, pid):
    self.vmas = []
    self.pid = pid
    self.name = None
    self._read_name()
    self._read_vma_info()

  def _read_name(self):
    """Reads this Process's name from /proc/<pid>/status."""
    get_name_sp = subprocess.Popen(
        "grep Name: /proc/%d/status | awk '{ print $2 }'" % self.pid,
        shell=True,
        stdout=subprocess.PIPE)
    self.name = get_name_sp.communicate()[0].decode("ascii").strip()

  def _read_vma_info(self):
    """Populates this Process's VMA list."""
    f = safe_open_procfile(self.pid, "maps", "r")
    if not f:
      err_print("Could not read maps for process {0}".format(self.pid))
      return

    for line in f:
      match = re.match(Process._MAPS_LINE_REGEX, line)
      start_vaddr = int(match.group(1), 16)
      end_vaddr = int(match.group(2), 16)
      perms = match.group(3)
      mapped_file = match.group(4) if match.lastindex == 4 else None
      self.vmas.append(Vma(self, start_vaddr, end_vaddr, perms, mapped_file))
    f.close()

  @property
  @functools.lru_cache(maxsize=50000)
  def hist(self):
    """The process-level memory allocation histogram.

    This is the sum of all VMA histograms for every VMA in this process.
    For example, if a process had two VMAs with the following histograms:

      [1, 2, 3, 0, 0, 0, 0, 0, 0, 0, 0]
      [0, 1, 2, 3, 0, 0, 0, 0, 0, 0, 0]

    This would return:
      [1, 3, 5, 3, 0, 0, 0, 0, 0, 0, 0]
    """
    return [sum(x) for x in zip(*[vma.hist for vma in self.vmas])]

  def __str__(self):
    return "process {pid:<18} {name:<25} {hist:<50}".format(
        pid=self.pid, name=str(self.name), hist=str(self.hist))


def safe_open_procfile(pid, file_name, mode):
  """Safely open the given file under /proc/<pid>.

  This catches a variety of common errors bound to happen when using this
  script (eg. permission denied, process already exited).

  Args:
    pid: Pid of process (used to construct /proc/<pid>/)
    file_name: File directly under /proc/<pid>/ to open
    mode: Mode to pass to open (eg. "w", "r")

  Returns:
    File object corresponding to file requested or None if there was an error
  """
  full_path = "/proc/{0}/{1}".format(pid, file_name)
  try:
    return open(full_path, mode)
  except PermissionError:
    err_print("Not accessing {0} (permission denied)".format(full_path))
  except FileNotFoundError:
    err_print(
        "Not opening {0} (does not exist, process {1} likely exited)".format(
            full_path, pid))


def err_print(*args, **kwargs):
  print(*args, file=sys.stderr, **kwargs)


def print_hists(args):
  """Prints all process and VMA histograms as/per module documentation."""
  pid_list_sp = subprocess.Popen(
      ALL_PIDS_CMD, shell=True, stdout=subprocess.PIPE)
  pid_list = map(int, pid_list_sp.communicate()[0].splitlines())
  procs = []

  for pid in pid_list:
    procs.append(Process(pid))

  for proc in sorted(procs, key=lambda p: p.hist[::-1]):
    # Don't print info on kernel threads or processes we couldn't collect info
    # on due to insufficient permissions
    if not proc.vmas:
      continue
    print(proc)
    for vma in sorted(proc.vmas, key=lambda v: v.hist[::-1]):
      if args.no_unfaulted_vmas and vma.hist == BLANK_HIST:
        continue
      elif args.omit_file_backed and vma.is_file_backed():
        continue
      print("    ", vma)


if __name__ == "__main__":
  parser = argparse.ArgumentParser(
      description=("Create per-process and per-VMA "
                   "histograms of contiguous virtual "
                   "memory allocations"))
  parser.add_argument(
      "--omit-unfaulted-vmas",
      dest="no_unfaulted_vmas",
      action="store_true",
      help="Omit VMAs containing 0 present pages from output")
  parser.add_argument(
      "--omit-file-backed",
      dest="omit_file_backed",
      action="store_true",
      help="Omit VMAs corresponding to mmaped files")
  print_hists(parser.parse_args())


* Re: [PATCH v1 0/2] Report on physically contiguous memory in smaps
  2023-06-13 18:44 ` [PATCH v1 0/2] Report on physically contiguous memory in smaps Yu Zhao
@ 2023-06-14 10:41   ` Ryan Roberts
  0 siblings, 0 replies; 5+ messages in thread
From: Ryan Roberts @ 2023-06-14 10:41 UTC (permalink / raw)
  To: Yu Zhao
  Cc: Jonathan Corbet, Andrew Morton, Matthew Wilcox (Oracle),
	linux-kernel, linux-mm, linux-fsdevel, linux-doc,
	linux-arm-kernel

On 13/06/2023 19:44, Yu Zhao wrote:
> On Tue, Jun 13, 2023 at 05:09:48PM +0100, Ryan Roberts wrote:
>> Hi All,
>>
>> I thought I would try my luck with this pair of patches...
> 
> Ack on the idea.
> 
> Actually I have a script to do just this, but it's based on pagemap (attaching the script at the end).

I did consider that approach, but it was much more code to write the script than
to modify smaps ;-). Longer term, I think it would be good to have it in smaps
because it's more accessible. For the contpte case we would need to add a bit to
every pagemap entry to express that. I'm not sure how palatable that would be
for an arch-specific thing?
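
For reference, a rough sketch of the pagemap-based alternative mentioned above
(using the documented entry layout from Documentation/admin-guide/mm/pagemap.rst:
64-bit entries, bit 63 = present, bits 0-54 = pfn; note the pfn reads as zero
without CAP_SYS_ADMIN, and the helper itself is hypothetical):

```python
# Rough sketch of inferring physical contiguity from /proc/<pid>/pagemap
# entries. Given decoded 64-bit pagemap values for consecutive virtual
# pages, return the lengths of the physically contiguous present runs.

PM_PRESENT = 1 << 63
PM_PFN_MASK = (1 << 55) - 1   # bits 0-54 hold the pfn

def contiguous_runs(entries):
    """entries: raw 64-bit pagemap values for consecutive virtual pages."""
    runs, run = [], 0
    prev_pfn = None
    for e in entries:
        pfn = e & PM_PFN_MASK
        if e & PM_PRESENT and prev_pfn is not None and pfn == prev_pfn + 1:
            run += 1              # extends the current physical run
        else:
            if run:
                runs.append(run)  # run broken: record it
            run = 1 if e & PM_PRESENT else 0
        prev_pfn = pfn if e & PM_PRESENT else None
    if run:
        runs.append(run)
    return runs
```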

Thanks for the script anyway!



