* [PATCH V2 0/6] VA to numa node information
@ 2018-09-12 20:23 Prakash Sangappa
  2018-09-12 20:23 ` [PATCH V2 1/6] Add check to match numa node id when gathering pte stats Prakash Sangappa
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:23 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

For analysis purposes it is useful to have numa node information
corresponding to the mapped virtual address ranges of a process. Currently,
the file /proc/<pid>/numa_maps provides a list of numa nodes from where pages
are allocated, per VMA of a process. This is not useful if a user needs to
determine which numa node the mapped pages are allocated from for a
particular address range. It would help if the numa node information
presented in /proc/<pid>/numa_maps were broken down by VA ranges, showing the
exact numa node from where the pages have been allocated.

The format of the /proc/<pid>/numa_maps file content depends on the
/proc/<pid>/maps file content, as mentioned in the manpage: i.e. one line
entry for every VMA, corresponding to entries in the /proc/<pid>/maps file.
Therefore, changing the output of /proc/<pid>/numa_maps may not be possible.

This patch set introduces the file /proc/<pid>/numa_vamaps, which provides
a proper breakdown of VA ranges by the numa node id from which the mapped
pages are allocated. For address ranges not having any pages mapped,
a '-' is printed instead of a numa node id.

It includes lseek support, allowing a seek to a specific process virtual
address (VA); the address range to numa node information can then be read
from this file starting at that address.

The new file /proc/<pid>/numa_vamaps will be governed by ptrace access
mode PTRACE_MODE_READ_REALCREDS.

See the following for previous discussion of this proposal:

https://marc.info/?t=152524073400001&r=1&w=2


Prakash Sangappa (6):
  Add check to match numa node id when gathering pte stats
  Add /proc/<pid>/numa_vamaps file for numa node information
  Provide process address range to numa node id mapping
  Add support to lseek /proc/<pid>/numa_vamaps file
  File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS
    check
  /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt

 Documentation/filesystems/proc.txt |  21 +++
 fs/proc/base.c                     |   6 +-
 fs/proc/internal.h                 |   1 +
 fs/proc/task_mmu.c                 | 265 ++++++++++++++++++++++++++++++++++++-
 4 files changed, 285 insertions(+), 8 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH V2 1/6] Add check to match numa node id when gathering pte stats
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
@ 2018-09-12 20:23 ` Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 2/6] Add /proc/<pid>/numa_vamaps file for numa node information Prakash Sangappa
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:23 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

Add support to check whether the numa node id matches when gathering pte
stats. This will be used by later patches.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 fs/proc/task_mmu.c | 44 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 37 insertions(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 5ea1d64..0e2095c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1569,9 +1569,15 @@ struct numa_maps {
 	unsigned long mapcount_max;
 	unsigned long dirty;
 	unsigned long swapcache;
+	unsigned long nextaddr;
+	long nid;
+	long isvamaps;
 	unsigned long node[MAX_NUMNODES];
 };
 
+#define NUMA_VAMAPS_NID_NOPAGES	(-1)
+#define NUMA_VAMAPS_NID_NONE	(-2)
+
 struct numa_maps_private {
 	struct proc_maps_private proc_maps;
 	struct numa_maps md;
@@ -1653,6 +1659,20 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
 }
 #endif
 
+static bool
+vamap_match_nid(struct numa_maps *md, unsigned long addr, struct page *page)
+{
+	long target = (page ? page_to_nid(page) : NUMA_VAMAPS_NID_NOPAGES);
+
+	if (md->nid == NUMA_VAMAPS_NID_NONE)
+		md->nid = target;
+	if (md->nid == target)
+		return false;
+	/* did not match; record where the current range ends */
+	md->nextaddr = addr;
+	return true;
+}
+
 static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
@@ -1661,6 +1681,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	spinlock_t *ptl;
 	pte_t *orig_pte;
 	pte_t *pte;
+	int ret = 0;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	ptl = pmd_trans_huge_lock(pmd, vma);
@@ -1668,11 +1689,13 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 		struct page *page;
 
 		page = can_gather_numa_stats_pmd(*pmd, vma, addr);
-		if (page)
+		if (md->isvamaps)
+			ret = vamap_match_nid(md, addr, page);
+		if (page && !ret)
 			gather_stats(page, md, pmd_dirty(*pmd),
 				     HPAGE_PMD_SIZE/PAGE_SIZE);
 		spin_unlock(ptl);
-		return 0;
+		return ret;
 	}
 
 	if (pmd_trans_unstable(pmd))
@@ -1681,6 +1704,10 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
 	do {
 		struct page *page = can_gather_numa_stats(*pte, vma, addr);
+		if (md->isvamaps && vamap_match_nid(md, addr, page)) {
+			ret = 1;
+			break;
+		}
 		if (!page)
 			continue;
 		gather_stats(page, md, pte_dirty(*pte), 1);
@@ -1688,7 +1715,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	pte_unmap_unlock(orig_pte, ptl);
 	cond_resched();
-	return 0;
+	return ret;
 }
 #ifdef CONFIG_HUGETLB_PAGE
 static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
@@ -1697,15 +1724,18 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask,
 	pte_t huge_pte = huge_ptep_get(pte);
 	struct numa_maps *md;
 	struct page *page;
+	int ret = 0;
+
+	md = walk->private;
 
 	if (!pte_present(huge_pte))
-		return 0;
+		return (md->isvamaps ? vamap_match_nid(md, addr, NULL) : 0);
 
 	page = pte_page(huge_pte);
-	if (!page)
-		return 0;
+	if (md->isvamaps)
+		ret = vamap_match_nid(md, addr, page);
+	if (!page || ret)
+		return ret;
 
-	md = walk->private;
 	gather_stats(page, md, pte_dirty(huge_pte), 1);
 	return 0;
 }
-- 
2.7.4



* [PATCH V2 2/6] Add /proc/<pid>/numa_vamaps file for numa node information
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
  2018-09-12 20:23 ` [PATCH V2 1/6] Add check to match numa node id when gathering pte stats Prakash Sangappa
@ 2018-09-12 20:24 ` Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 3/6] Provide process address range to numa node id mapping Prakash Sangappa
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

Introduce supporting data structures and file operations. A later
patch will add the changes for generating the file content.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 fs/proc/base.c     |  2 ++
 fs/proc/internal.h |  1 +
 fs/proc/task_mmu.c | 42 ++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index ccf86f1..1af99ae 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -2927,6 +2927,7 @@ static const struct pid_entry tgid_base_stuff[] = {
 	REG("maps",       S_IRUGO, proc_pid_maps_operations),
 #ifdef CONFIG_NUMA
 	REG("numa_maps",  S_IRUGO, proc_pid_numa_maps_operations),
+	REG("numa_vamaps",  S_IRUGO, proc_numa_vamaps_operations),
 #endif
 	REG("mem",        S_IRUSR|S_IWUSR, proc_mem_operations),
 	LNK("cwd",        proc_cwd_link),
@@ -3313,6 +3314,7 @@ static const struct pid_entry tid_base_stuff[] = {
 #endif
 #ifdef CONFIG_NUMA
 	REG("numa_maps", S_IRUGO, proc_pid_numa_maps_operations),
+	REG("numa_vamaps",  S_IRUGO, proc_numa_vamaps_operations),
 #endif
 	REG("mem",       S_IRUSR|S_IWUSR, proc_mem_operations),
 	LNK("cwd",       proc_cwd_link),
diff --git a/fs/proc/internal.h b/fs/proc/internal.h
index 5185d7f..994c7fd 100644
--- a/fs/proc/internal.h
+++ b/fs/proc/internal.h
@@ -298,6 +298,7 @@ extern const struct file_operations proc_pid_smaps_operations;
 extern const struct file_operations proc_pid_smaps_rollup_operations;
 extern const struct file_operations proc_clear_refs_operations;
 extern const struct file_operations proc_pagemap_operations;
+extern const struct file_operations proc_numa_vamaps_operations;
 
 extern unsigned long task_vsize(struct mm_struct *);
 extern unsigned long task_statm(struct mm_struct *,
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 0e2095c..02b553c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1583,6 +1583,16 @@ struct numa_maps_private {
 	struct numa_maps md;
 };
 
+#define NUMA_VAMAPS_BUFSZ      1024
+struct numa_vamaps_private {
+	struct mm_struct *mm;
+	struct numa_maps md;
+	u64 vm_start;
+	size_t from;
+	size_t count; /* residual bytes in buf at offset 'from' */
+	char buf[NUMA_VAMAPS_BUFSZ]; /* buffer */
+};
+
 static void gather_stats(struct page *page, struct numa_maps *md, int pte_dirty,
 			unsigned long nr_pages)
 {
@@ -1848,6 +1858,34 @@ static int pid_numa_maps_open(struct inode *inode, struct file *file)
 				sizeof(struct numa_maps_private));
 }
 
+static int numa_vamaps_open(struct inode *inode, struct file *file)
+{
+	struct mm_struct *mm;
+	struct numa_vamaps_private *nvm;
+
+	nvm = kzalloc(sizeof(struct numa_vamaps_private), GFP_KERNEL);
+	if (!nvm)
+		return -ENOMEM;
+
+	mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	if (IS_ERR(mm)) {
+		kfree(nvm);
+		return PTR_ERR(mm);
+	}
+	nvm->mm = mm;
+	file->private_data = nvm;
+	return 0;
+}
+
+static int numa_vamaps_release(struct inode *inode, struct file *file)
+{
+	struct numa_vamaps_private *nvm = file->private_data;
+
+	if (nvm->mm)
+		mmdrop(nvm->mm);
+	kfree(nvm);
+	return 0;
+}
+
 const struct file_operations proc_pid_numa_maps_operations = {
 	.open		= pid_numa_maps_open,
 	.read		= seq_read,
@@ -1855,4 +1893,8 @@ const struct file_operations proc_pid_numa_maps_operations = {
 	.release	= proc_map_release,
 };
 
+const struct file_operations proc_numa_vamaps_operations = {
+	.open		= numa_vamaps_open,
+	.release	= numa_vamaps_release,
+};
 #endif /* CONFIG_NUMA */
-- 
2.7.4



* [PATCH V2 3/6] Provide process address range to numa node id mapping
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
  2018-09-12 20:23 ` [PATCH V2 1/6] Add check to match numa node id when gathering pte stats Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 2/6] Add /proc/<pid>/numa_vamaps file for numa node information Prakash Sangappa
@ 2018-09-12 20:24 ` Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 4/6] Add support to lseek /proc/<pid>/numa_vamaps file Prakash Sangappa
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

This patch provides process address range to numa node information
through the /proc/<pid>/numa_vamaps file. For address ranges not having
any pages mapped, a '-' is printed instead of the numa node id.

The following is a sample of the file format:

00400000-00410000 N1
00410000-0047f000 N0
0047f000-00480000 N2
00480000-00481000 -
00481000-004a0000 N0
004a0000-004a2000 -
004a2000-004aa000 N2
004aa000-004ad000 N0
004ad000-004ae000 -
..

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 fs/proc/task_mmu.c | 158 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 158 insertions(+)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 02b553c..1371e379 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1845,6 +1845,162 @@ static int show_numa_map(struct seq_file *m, void *v)
 	return 0;
 }
 
+static int gather_hole_info_vamap(unsigned long start, unsigned long end,
+			struct mm_walk *walk)
+{
+	struct numa_maps *md = walk->private;
+	struct vm_area_struct *vma = walk->vma;
+
+	/*
+	 * If in a nid, end walk at hole start.
+	 * If no nid and vma changes, end walk at next vma start.
+	 */
+	if (md->nid >= 0 || vma != find_vma(walk->mm, start)) {
+		md->nextaddr = start;
+		return 1;
+	}
+
+	if (md->nid == NUMA_VAMAPS_NID_NONE)
+		md->nid = NUMA_VAMAPS_NID_NOPAGES;
+
+	return 0;
+}
+
+static int vamap_vprintf(struct numa_vamaps_private *nvm, const char *f, ...)
+{
+	va_list args;
+	int len, space;
+
+	space = NUMA_VAMAPS_BUFSZ - nvm->count;
+	va_start(args, f);
+	len = vsnprintf(nvm->buf + nvm->count, space, f, args);
+	va_end(args);
+	if (len < space) {
+		nvm->count += len;
+		return 0;
+	}
+	return 1;
+}
+
+/*
+ * Display va-range to numa node info via /proc
+ */
+static ssize_t numa_vamaps_read(struct file *file, char __user *buf,
+	size_t count, loff_t *ppos)
+{
+	struct numa_vamaps_private *nvm = file->private_data;
+	struct vm_area_struct *vma, *tailvma;
+	struct numa_maps *md = &nvm->md;
+	struct mm_struct *mm = nvm->mm;
+	u64 vm_start = nvm->vm_start;
+	size_t ucount;
+	struct mm_walk walk = {
+		.hugetlb_entry = gather_hugetlb_stats,
+		.pmd_entry = gather_pte_stats,
+		.pte_hole = gather_hole_info_vamap,
+		.private = md,
+		.mm = mm,
+	};
+	int ret = 0, copied = 0, done = 0;
+
+	if (!mm || !mmget_not_zero(mm))
+		return 0;
+
+	if (!count)
+		goto out_mm;
+
+	/* First copy leftover contents in buffer */
+	if (nvm->from)
+		goto docopy;
+
+repeat:
+	down_read(&mm->mmap_sem);
+	vma = find_vma(mm, vm_start);
+	if (!vma) {
+		done = 1;
+		goto out;
+	}
+
+	if (vma->vm_start > vm_start)
+		vm_start = vma->vm_start;
+
+	while (nvm->count < count) {
+		u64 vm_end;
+
+		/* Ensure we start with an empty numa_maps statistics */
+		memset(md, 0, sizeof(*md));
+		md->nid = NUMA_VAMAPS_NID_NONE; /* invalid nodeid at start */
+		md->nextaddr = 0;
+		md->isvamaps = 1;
+
+		if (walk_page_range(vm_start, vma->vm_end, &walk) < 0)
+			break;
+
+		/* nextaddr ends the range. if 0, reached the vma end */
+		vm_end = (md->nextaddr ? md->nextaddr : vma->vm_end);
+
+		/* break if buffer full */
+		if (md->nid >= 0 && md->node[md->nid]) {
+			if (vamap_vprintf(nvm, "%08lx-%08lx N%ld\n",
+					  vm_start, vm_end, md->nid))
+				break;
+		} else if (vamap_vprintf(nvm, "%08lx-%08lx - \n",
+					 vm_start, vm_end)) {
+			break;
+		}
+
+		/* advance to next VA */
+		vm_start = vm_end;
+		if (vm_end == vma->vm_end) {
+			vma = vma->vm_next;
+			if (!vma) {
+				done = 1;
+				break;
+			}
+			vm_start = vma->vm_start;
+		}
+	}
+out:
+	/* last, add gate vma details */
+	if (!vma && (tailvma = get_gate_vma(mm)) != NULL &&
+		vm_start < tailvma->vm_end) {
+		done = 0;
+		if (!vamap_vprintf(nvm, "%08lx-%08lx - \n",
+		   tailvma->vm_start, tailvma->vm_end)) {
+			done = 1;
+			vm_start = tailvma->vm_end;
+		}
+	}
+
+	up_read(&mm->mmap_sem);
+docopy:
+	ucount = min(count, nvm->count);
+	if (ucount && copy_to_user(buf, nvm->buf + nvm->from, ucount)) {
+		ret = -EFAULT;
+		goto out_mm;
+	}
+	copied += ucount;
+	count -= ucount;
+	nvm->count -= ucount;
+	buf += ucount;
+	if (!done && count) {
+		nvm->from = 0;
+		goto repeat;
+	}
+	/* something left in the buffer */
+	if (nvm->count)
+		nvm->from += ucount;
+	else
+		nvm->from = 0;
+
+	nvm->vm_start = vm_start;
+	ret = copied;
+	*ppos += copied;
+out_mm:
+	mmput(mm);
+	return ret;
+}
+
 static const struct seq_operations proc_pid_numa_maps_op = {
 	.start  = m_start,
 	.next   = m_next,
@@ -1895,6 +2051,8 @@ const struct file_operations proc_pid_numa_maps_operations = {
 
 const struct file_operations proc_numa_vamaps_operations = {
 	.open		= numa_vamaps_open,
+	.read		= numa_vamaps_read,
+	.llseek		= noop_llseek,
 	.release	= numa_vamaps_release,
 };
 #endif /* CONFIG_NUMA */
-- 
2.7.4



* [PATCH V2 4/6] Add support to lseek /proc/<pid>/numa_vamaps file
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
                   ` (2 preceding siblings ...)
  2018-09-12 20:24 ` [PATCH V2 3/6] Provide process address range to numa node id mapping Prakash Sangappa
@ 2018-09-12 20:24 ` Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 5/6] File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS check Prakash Sangappa
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

Allow lseek to a process virtual address (VA); the address range to numa
node information can then be read starting from that address. The lseek
offset is interpreted as the process virtual address.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 fs/proc/task_mmu.c | 23 ++++++++++++++++++++++-
 1 file changed, 22 insertions(+), 1 deletion(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1371e379..93dce46 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -1866,6 +1866,27 @@ static int gather_hole_info_vamap(unsigned long start, unsigned long end,
 	return 0;
 }
 
+static loff_t numa_vamaps_llseek(struct file *file, loff_t offset, int orig)
+{
+	struct numa_vamaps_private *nvm = file->private_data;
+
+	if (orig == SEEK_CUR && offset < 0 && nvm->vm_start < -offset)
+		return -EINVAL;
+
+	switch (orig) {
+	case SEEK_SET:
+		nvm->vm_start = offset & PAGE_MASK;
+		break;
+	case SEEK_CUR:
+		nvm->vm_start += offset;
+		nvm->vm_start = nvm->vm_start & PAGE_MASK;
+		break;
+	default:
+		return -EINVAL;
+	}
+	return nvm->vm_start;
+}
+
 static int vamap_vprintf(struct numa_vamaps_private *nvm, const char *f, ...)
 {
 	va_list args;
@@ -2052,7 +2073,7 @@ const struct file_operations proc_pid_numa_maps_operations = {
 const struct file_operations proc_numa_vamaps_operations = {
 	.open		= numa_vamaps_open,
 	.read		= numa_vamaps_read,
-	.llseek		= noop_llseek,
+	.llseek		= numa_vamaps_llseek,
 	.release	= numa_vamaps_release,
 };
 #endif /* CONFIG_NUMA */
-- 
2.7.4



* [PATCH V2 5/6] File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS check
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
                   ` (3 preceding siblings ...)
  2018-09-12 20:24 ` [PATCH V2 4/6] Add support to lseek /proc/<pid>/numa_vamaps file Prakash Sangappa
@ 2018-09-12 20:24 ` Prakash Sangappa
  2018-09-12 20:24 ` [PATCH V2 6/6] /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt Prakash Sangappa
  2018-09-13  8:40 ` [PATCH V2 0/6] VA to numa node information Michal Hocko
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

Access to the /proc/<pid>/numa_vamaps file should be governed by a
PTRACE_MODE_READ_REALCREDS check, to restrict which processes can obtain
a specific VA range to numa node mapping information.

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 fs/proc/base.c     | 4 +++-
 fs/proc/task_mmu.c | 2 +-
 2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1af99ae..3c19a55 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -745,7 +745,9 @@ struct mm_struct *proc_mem_open(struct inode *inode, unsigned int mode)
 	struct mm_struct *mm = ERR_PTR(-ESRCH);
 
 	if (task) {
-		mm = mm_access(task, mode | PTRACE_MODE_FSCREDS);
+		if (!(mode & PTRACE_MODE_REALCREDS))
+			mode |= PTRACE_MODE_FSCREDS;
+		mm = mm_access(task, mode);
 		put_task_struct(task);
 
 		if (!IS_ERR_OR_NULL(mm)) {
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 93dce46..30b29d2 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -2043,7 +2043,7 @@ static int numa_vamaps_open(struct inode *inode, struct file *file)
 	if (!nvm)
 		return -ENOMEM;
 
-	mm = proc_mem_open(inode, PTRACE_MODE_READ);
+	mm = proc_mem_open(inode, PTRACE_MODE_READ | PTRACE_MODE_REALCREDS);
 	if (IS_ERR(mm)) {
 		kfree(nvm);
 		return PTR_ERR(mm);
-- 
2.7.4



* [PATCH V2 6/6] /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
                   ` (4 preceding siblings ...)
  2018-09-12 20:24 ` [PATCH V2 5/6] File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS check Prakash Sangappa
@ 2018-09-12 20:24 ` Prakash Sangappa
  2018-09-13  8:40 ` [PATCH V2 0/6] VA to numa node information Michal Hocko
  6 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-12 20:24 UTC (permalink / raw)
  To: linux-kernel, linux-mm
  Cc: dave.hansen, mhocko, nao.horiguchi, akpm, kirill.shutemov,
	khandual, steven.sistare, prakash.sangappa

Add documentation for /proc/<pid>/numa_vamaps in
Documentation/filesystems/proc.txt

Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com>
Reviewed-by: Steve Sistare <steven.sistare@oracle.com>
---
 Documentation/filesystems/proc.txt | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 22b4b00..7095216 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -150,6 +150,9 @@ Table 1-1: Process specific entries in /proc
 		each mapping and flags associated with it
  numa_maps	an extension based on maps, showing the memory locality and
 		binding policy as well as mem usage (in pages) of each mapping.
+ numa_vamaps	shows, for each mapped address range, the numa node from
+		which the physical memory is allocated
+
 ..............................................................................
 
 For example, to get the status information of a process, all you have to do is
@@ -571,6 +574,24 @@ Where:
 node locality page counters (N0 == node0, N1 == node1, ...) and the kernel page
 size, in KB, that is backing the mapping up.
 
+The /proc/pid/numa_vamaps file shows, for each mapped address range, the
+numa node id from which the physical pages are allocated. For address
+ranges not having any pages mapped, a '-' is shown instead of the node id.
+Each line in the file maps one address range to a single numa node.
+
+address-range	numa-node-id
+
+00400000-00410000 N1
+00410000-0047f000 N0
+0047f000-00480000 N2
+00480000-00481000 -
+00481000-004a0000 N0
+004a0000-004a2000 -
+004a2000-004aa000 N2
+004aa000-004ad000 N0
+004ad000-004ae000 -
+..
+
 1.2 Kernel data
 ---------------
 
-- 
2.7.4



* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
                   ` (5 preceding siblings ...)
  2018-09-12 20:24 ` [PATCH V2 6/6] /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt Prakash Sangappa
@ 2018-09-13  8:40 ` Michal Hocko
  2018-09-13 22:32   ` prakash.sangappa
  6 siblings, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-09-13  8:40 UTC (permalink / raw)
  To: Prakash Sangappa
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual, steven.sistare

On Wed 12-09-18 13:23:58, Prakash Sangappa wrote:
> For analysis purpose it is useful to have numa node information
> corresponding mapped virtual address ranges of a process. Currently,
> the file /proc/<pid>/numa_maps provides a list of numa nodes from where pages
> are allocated, per VMA of a process. This is not useful if a user needs to
> determine which numa node the mapped pages are allocated from for a
> particular address range. It would have helped if the numa node information
> presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
> exact numa node from where the pages have been allocated.
> 
> The format of /proc/<pid>/numa_maps file content is dependent on
> /proc/<pid>/maps file content as mentioned in the manpage. i.e one line
> entry for every VMA corresponding to entries in /proc/<pids>/maps file.
> Therefore changing the output of /proc/<pid>/numa_maps may not be possible.
> 
> This patch set introduces the file /proc/<pid>/numa_vamaps which
> will provide proper break down of VA ranges by numa node id from where the
> mapped pages are allocated. For address ranges not having any pages mapped,
> a '-' is printed instead of numa node id.
> 
> Includes support to lseek, allowing seeking to a specific process Virtual
> address(VA) starting from where the address range to numa node information
> can be read from this file.
> 
> The new file /proc/<pid>/numa_vamaps will be governed by ptrace access
> mode PTRACE_MODE_READ_REALCREDS.
> 
> See following for previous discussion about this proposal
> 
> https://marc.info/?t=152524073400001&r=1&w=2

It would be really great to give a short summary of the previous
discussion. E.g. why do we need a proc interface in the first place when
we already have an API to query for the information you are proposing to
export [1]

[1] http://lkml.kernel.org/r/20180503085741.GD4535@dhcp22.suse.cz
-- 
Michal Hocko
SUSE Labs


* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-13  8:40 ` [PATCH V2 0/6] VA to numa node information Michal Hocko
@ 2018-09-13 22:32   ` prakash.sangappa
  2018-09-14  0:10     ` Andrew Morton
  2018-09-14  5:56     ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: prakash.sangappa @ 2018-09-13 22:32 UTC (permalink / raw)
  To: Michal Hocko
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual, steven.sistare



On 09/13/2018 01:40 AM, Michal Hocko wrote:
> On Wed 12-09-18 13:23:58, Prakash Sangappa wrote:
>> For analysis purpose it is useful to have numa node information
>> corresponding mapped virtual address ranges of a process. Currently,
>> the file /proc/<pid>/numa_maps provides a list of numa nodes from where pages
>> are allocated, per VMA of a process. This is not useful if a user needs to
>> determine which numa node the mapped pages are allocated from for a
>> particular address range. It would have helped if the numa node information
>> presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
>> exact numa node from where the pages have been allocated.
>>
>> The format of /proc/<pid>/numa_maps file content is dependent on
>> /proc/<pid>/maps file content as mentioned in the manpage. i.e one line
>> entry for every VMA corresponding to entries in /proc/<pids>/maps file.
>> Therefore changing the output of /proc/<pid>/numa_maps may not be possible.
>>
>> This patch set introduces the file /proc/<pid>/numa_vamaps which
>> will provide proper break down of VA ranges by numa node id from where the
>> mapped pages are allocated. For address ranges not having any pages mapped,
>> a '-' is printed instead of numa node id.
>>
>> Includes support to lseek, allowing seeking to a specific process Virtual
>> address(VA) starting from where the address range to numa node information
>> can be read from this file.
>>
>> The new file /proc/<pid>/numa_vamaps will be governed by ptrace access
>> mode PTRACE_MODE_READ_REALCREDS.
>>
>> See following for previous discussion about this proposal
>>
>> https://marc.info/?t=152524073400001&r=1&w=2
> It would be really great to give a short summary of the previous
> discussion. E.g. why do we need a proc interface in the first place when
> we already have an API to query for the information you are proposing to
> export [1]
>
> [1] http://lkml.kernel.org/r/20180503085741.GD4535@dhcp22.suse.cz

The proc interface provides a more efficient way to export address range
to numa node id mapping information than using the API. For example, for
sparsely populated mappings, if a VMA has large portions not having any
physical pages mapped, the page walk done through the /proc file
interface can skip over nonexistent PMDs / ptes. Whereas using the API,
the application would have to scan the entire VMA in page size units.

Also, VMAs containing THP pages can have a mix of 4k pages and hugepages.
The page walk can efficiently determine that a mapping is backed by a THP
hugepage and step over it. Whereas using the API, the application would
not know what page size backs a given VA, and so would again have to scan
the VMA in units of the 4k page size.

If this sounds reasonable, I can add it to the commit / patch description.

-Prakash.







* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-13 22:32   ` prakash.sangappa
@ 2018-09-14  0:10     ` Andrew Morton
  2018-09-14  0:25       ` Dave Hansen
  2018-09-14  5:56     ` Michal Hocko
  1 sibling, 1 reply; 23+ messages in thread
From: Andrew Morton @ 2018-09-14  0:10 UTC (permalink / raw)
  To: prakash.sangappa
  Cc: Michal Hocko, linux-kernel, linux-mm, dave.hansen, nao.horiguchi,
	kirill.shutemov, khandual, steven.sistare

On Thu, 13 Sep 2018 15:32:25 -0700 "prakash.sangappa" <prakash.sangappa@oracle.com> wrote:

> >> https://marc.info/?t=152524073400001&r=1&w=2
> > It would be really great to give a short summary of the previous
> > discussion. E.g. why do we need a proc interface in the first place when
> > we already have an API to query for the information you are proposing to
> > export [1]
> >
> > [1] http://lkml.kernel.org/r/20180503085741.GD4535@dhcp22.suse.cz
> 
> The proc interface provides an efficient way to export address range
> to numa node id mapping information compared to using the API.
> For example, for sparsely populated mappings, if a VMA has large portions
> not having any physical pages mapped, the page walk done through the /proc file
> interface can skip over non existent PMDs / ptes. Whereas using the
> API the application would have to scan the entire VMA in page size units.
> 
> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
> The page walks would be efficient in scanning and determining if it is
> a THP huge page and step over it. Whereas using the API, the application
> would not know what page size mapping is used for a given VA and so would
> have to again scan the VMA in units of 4k page size.
> 
> If this sounds reasonable, I can add it to the commit / patch description.

Preferably with some runtime measurements, please.  How much faster is
this interface in real-world situations?  And why does that performance
matter?

It would also be useful to see more details on how this info helps
operators understand/tune/etc their applications and workloads.  In
other words, I'm trying to get an understanding of how useful this code
might be to our users in general.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14  0:10     ` Andrew Morton
@ 2018-09-14  0:25       ` Dave Hansen
  2018-09-15  1:31         ` Prakash Sangappa
  0 siblings, 1 reply; 23+ messages in thread
From: Dave Hansen @ 2018-09-14  0:25 UTC (permalink / raw)
  To: Andrew Morton, prakash.sangappa
  Cc: Michal Hocko, linux-kernel, linux-mm, nao.horiguchi,
	kirill.shutemov, khandual, steven.sistare

On 09/13/2018 05:10 PM, Andrew Morton wrote:
>> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
>> The page walks would be efficient in scanning and determining if it is
>> a THP huge page and step over it. Whereas using the API, the application
>> would not know what page size mapping is used for a given VA and so would
>> have to again scan the VMA in units of 4k page size.
>>
>> If this sounds reasonable, I can add it to the commit / patch description.

As we are judging whether this is a "good" interface, can you tell us a
bit about its scalability?  For instance, let's say someone has a 1TB
VMA that's populated with interleaved 4k pages.  How much data comes
out?  How long does it take to parse?  Will we effectively deadlock the
system if someone accidentally cat's the wrong /proc file?

/proc seems like a really simple way to implement this, but it seems a
*really* odd choice for something that needs to collect a large amount
of data.  The lseek() stuff is a nice addition, but I wonder if it's
unwieldy to use in practice.  For instance, if you want to read data for
the VMA at 0x1000000 you lseek(fd, 0x1000000, SEEK_SET), right?  You read
~20 bytes of data and then the fd is at 0x1000020.  But, you're getting
data out at the next read() for (at least) the next page, which is also
available at 0x1001000.  Seems funky.  Do other /proc files behave this way?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-13 22:32   ` prakash.sangappa
  2018-09-14  0:10     ` Andrew Morton
@ 2018-09-14  5:56     ` Michal Hocko
  2018-09-14 16:01       ` Steven Sistare
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-09-14  5:56 UTC (permalink / raw)
  To: prakash.sangappa
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual, steven.sistare

On Thu 13-09-18 15:32:25, prakash.sangappa wrote:
> 
> 
> On 09/13/2018 01:40 AM, Michal Hocko wrote:
> > On Wed 12-09-18 13:23:58, Prakash Sangappa wrote:
> > > For analysis purpose it is useful to have numa node information
> > > corresponding mapped virtual address ranges of a process. Currently,
> > > the file /proc/<pid>/numa_maps provides list of numa nodes from where pages
> > > are allocated per VMA of a process. This is not useful if an user needs to
> > > determine which numa node the mapped pages are allocated from for a
> > > particular address range. It would have helped if the numa node information
> > > presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
> > > exact numa node from where the pages have been allocated.
> > > 
> > > The format of /proc/<pid>/numa_maps file content is dependent on
> > > /proc/<pid>/maps file content as mentioned in the manpage. i.e one line
> > > entry for every VMA corresponding to entries in /proc/<pids>/maps file.
> > > Therefore changing the output of /proc/<pid>/numa_maps may not be possible.
> > > 
> > > This patch set introduces the file /proc/<pid>/numa_vamaps which
> > > will provide proper break down of VA ranges by numa node id from where the
> > > mapped pages are allocated. For Address ranges not having any pages mapped,
> > > a '-' is printed instead of numa node id.
> > > 
> > > Includes support to lseek, allowing seeking to a specific process Virtual
> > > address(VA) starting from where the address range to numa node information
> > > can to be read from this file.
> > > 
> > > The new file /proc/<pid>/numa_vamaps will be governed by ptrace access
> > > mode PTRACE_MODE_READ_REALCREDS.
> > > 
> > > See following for previous discussion about this proposal
> > > 
> > > https://marc.info/?t=152524073400001&r=1&w=2
> > It would be really great to give a short summary of the previous
> > discussion. E.g. why do we need a proc interface in the first place when
> > we already have an API to query for the information you are proposing to
> > export [1]
> > 
> > [1] http://lkml.kernel.org/r/20180503085741.GD4535@dhcp22.suse.cz
> 
> The proc interface provides an efficient way to export address range
> to numa node id mapping information compared to using the API.

Do you have any numbers?

> For example, for sparsely populated mappings, if a VMA has large portions
> not having any physical pages mapped, the page walk done through the /proc
> file interface can skip over non-existent PMDs/PTEs. Whereas using the
> API the application would have to scan the entire VMA in page size units.

What prevents you from pre-filtering by reading /proc/$pid/maps to get
ranges of interest?

> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
> The page walks would be efficient in scanning and determining if it is
> a THP huge page and step over it. Whereas using the API, the application
> would not know what page size mapping is used for a given VA and so would
> have to again scan the VMA in units of 4k page size.

Why does this matter for something that is for analysis purposes.
Reading the file for the whole address space is far from a free
operation. Is the page walk optimization really essential for usability?
Moreover what prevents move_pages implementation to be clever for the
page walk itself? In other words why would we want to add a new API
rather than make the existing one faster for everybody.
 
> If this sounds reasonable, I can add it to the commit / patch description.

This all is absolutely _essential_ for any new API proposed. Remember that
once we add a new user interface, we have to maintain it for ever. We
used to be too relaxed when adding new proc files in the past and it
backfired many times already.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14  5:56     ` Michal Hocko
@ 2018-09-14 16:01       ` Steven Sistare
  2018-09-14 18:04           ` Prakash Sangappa
  2018-09-24 17:14         ` Michal Hocko
  0 siblings, 2 replies; 23+ messages in thread
From: Steven Sistare @ 2018-09-14 16:01 UTC (permalink / raw)
  To: Michal Hocko, prakash.sangappa
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual

On 9/14/2018 1:56 AM, Michal Hocko wrote:
> On Thu 13-09-18 15:32:25, prakash.sangappa wrote:
>> On 09/13/2018 01:40 AM, Michal Hocko wrote:
>>> On Wed 12-09-18 13:23:58, Prakash Sangappa wrote:
>>>> For analysis purpose it is useful to have numa node information
>>>> corresponding mapped virtual address ranges of a process. Currently,
>>>> the file /proc/<pid>/numa_maps provides list of numa nodes from where pages
>>>> are allocated per VMA of a process. This is not useful if an user needs to
>>>> determine which numa node the mapped pages are allocated from for a
>>>> particular address range. It would have helped if the numa node information
>>>> presented in /proc/<pid>/numa_maps was broken down by VA ranges showing the
>>>> exact numa node from where the pages have been allocated.
>>>>
>>>> The format of /proc/<pid>/numa_maps file content is dependent on
>>>> /proc/<pid>/maps file content as mentioned in the manpage. i.e one line
>>>> entry for every VMA corresponding to entries in /proc/<pids>/maps file.
>>>> Therefore changing the output of /proc/<pid>/numa_maps may not be possible.
>>>>
>>>> This patch set introduces the file /proc/<pid>/numa_vamaps which
>>>> will provide proper break down of VA ranges by numa node id from where the
>>>> mapped pages are allocated. For Address ranges not having any pages mapped,
>>>> a '-' is printed instead of numa node id.
>>>>
>>>> Includes support to lseek, allowing seeking to a specific process Virtual
>>>> address(VA) starting from where the address range to numa node information
>>>> can to be read from this file.
>>>>
>>>> The new file /proc/<pid>/numa_vamaps will be governed by ptrace access
>>>> mode PTRACE_MODE_READ_REALCREDS.
>>>>
>>>> See following for previous discussion about this proposal
>>>>
>>>> https://marc.info/?t=152524073400001&r=1&w=2
>>> It would be really great to give a short summary of the previous
>>> discussion. E.g. why do we need a proc interface in the first place when
>>> we already have an API to query for the information you are proposing to
>>> export [1]
>>>
>>> [1] http://lkml.kernel.org/r/20180503085741.GD4535@dhcp22.suse.cz
>>
>> The proc interface provides an efficient way to export address range
>> to numa node id mapping information compared to using the API.
> 
> Do you have any numbers?
> 
>> For example, for sparsely populated mappings, if a VMA has large portions
>> not having any physical pages mapped, the page walk done through the /proc
>> file interface can skip over non-existent PMDs/PTEs. Whereas using the
>> API the application would have to scan the entire VMA in page size units.
> 
> What prevents you from pre-filtering by reading /proc/$pid/maps to get
> ranges of interest?

That works for skipping holes, but not for skipping huge pages.  I did a 
quick experiment to time move_pages on a 3 GHz Xeon and a 4.18 kernel.  
Allocate 128 GB and touch every small page.  Call move_pages with nodes=NULL 
to get the node id for all pages, passing 512 consecutive small pages per 
call to move_pages. The total move_pages time is 1.85 secs, and 55 nsec 
per page.  Extrapolating to a 1 TB range, it would take 15 sec to retrieve 
the numa node for every small page in the range.  That is not terrible, but 
it is not interactive, and it becomes terrible for multiple TB.

>> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
>> The page walks would be efficient in scanning and determining if it is
>> a THP huge page and step over it. Whereas using the API, the application
>> would not know what page size mapping is used for a given VA and so would
>> have to again scan the VMA in units of 4k page size.
> 
> Why does this matter for something that is for analysis purposes.
> Reading the file for the whole address space is far from a free
> operation. Is the page walk optimization really essential for usability?
> Moreover what prevents move_pages implementation to be clever for the
> page walk itself? In other words why would we want to add a new API
> rather than make the existing one faster for everybody.

One could optimize move_pages.  If the caller passes a consecutive range
of small pages, and the page walk sees that a VA is mapped by a huge page,
then it can return the same numa node for each of the following VAs that
fall into the huge page range.  It would be faster than 55 nsec per small
page, but it is hard to say how much faster, and the cost is still driven
by the number of small pages.
 
>> If this sounds reasonable, I can add it to the commit / patch description.
> 
> This all is absolutely _essential_ for any new API proposed. Remember that
> once we add a new user interface, we have to maintain it for ever. We
> used to be too relaxed when adding new proc files in the past and it
> backfired many times already.

An offhand idea -- we could extend /proc/pid/numa_maps in a backward compatible
way by providing a control interface that is poked via write() or ioctl().
Provide one control "do-not-combine".  If do-not-combine has been set, then
the read() function returns a separate line for each range of memory mapped
on the same numa node, in the existing format.

- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14 16:01       ` Steven Sistare
@ 2018-09-14 18:04           ` Prakash Sangappa
  2018-09-24 17:14         ` Michal Hocko
  1 sibling, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-14 18:04 UTC (permalink / raw)
  To: Steven Sistare, Michal Hocko
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual



On 9/14/18 9:01 AM, Steven Sistare wrote:
> On 9/14/2018 1:56 AM, Michal Hocko wrote:
>> On Thu 13-09-18 15:32:25, prakash.sangappa wrote:
>>>
>>> The proc interface provides an efficient way to export address range
>>> to numa node id mapping information compared to using the API.
>> Do you have any numbers?
>>
>>> For example, for sparsely populated mappings, if a VMA has large portions
>>> not having any physical pages mapped, the page walk done through the /proc
>>> file interface can skip over non-existent PMDs/PTEs. Whereas using the
>>> API the application would have to scan the entire VMA in page size units.
>> What prevents you from pre-filtering by reading /proc/$pid/maps to get
>> ranges of interest?
> That works for skipping holes, but not for skipping huge pages.  I did a
> quick experiment to time move_pages on a 3 GHz Xeon and a 4.18 kernel.
> Allocate 128 GB and touch every small page.  Call move_pages with nodes=NULL
> to get the node id for all pages, passing 512 consecutive small pages per
> call to move_nodes. The total move_nodes time is 1.85 secs, and 55 nsec
> per page.  Extrapolating to a 1 TB range, it would take 15 sec to retrieve
> the numa node for every small page in the range.  That is not terrible, but
> it is not interactive, and it becomes terrible for multiple TB.
>

Also, for valid VMAs in the 'maps' file, if the VMA is sparsely populated
with physical pages, the page walk can skip over non-existent page table
entries (PMDs) and so can be faster.

For example, reading the VA range of a 400GB VMA which has a few pages
mapped at the beginning and a few pages at the end, with the rest of the
VMA not having any pages, takes 0.001s using the /proc interface. Whereas
with the move_pages() API, passing 1024 consecutive small page addresses,
it takes about 2.4 secs. This is on a similar system running a 4.19 kernel.




^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14 18:04           ` Prakash Sangappa
  (?)
@ 2018-09-14 19:01           ` Dave Hansen
  -1 siblings, 0 replies; 23+ messages in thread
From: Dave Hansen @ 2018-09-14 19:01 UTC (permalink / raw)
  To: Prakash Sangappa, Steven Sistare, Michal Hocko
  Cc: linux-kernel, linux-mm, nao.horiguchi, akpm, kirill.shutemov, khandual

On 09/14/2018 11:04 AM, Prakash Sangappa wrote:
> Also, for valid VMAs in the 'maps' file, if the VMA is sparsely
> populated with physical pages, the page walk can skip over
> non-existent page table entries (PMDs) and so can be faster.
Note that this only works for things that were _never_ populated.  They
might be sparse after once being populated and then being reclaimed or
discarded.  Those will still have all the page tables allocated.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14  0:25       ` Dave Hansen
@ 2018-09-15  1:31         ` Prakash Sangappa
  0 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-09-15  1:31 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Michal Hocko, linux-kernel, linux-mm,
	nao.horiguchi, kirill.shutemov, khandual, steven.sistare

On 9/13/2018 5:25 PM, Dave Hansen wrote:
> On 09/13/2018 05:10 PM, Andrew Morton wrote:
>>> Also, VMAs having THP pages can have a mix of 4k pages and hugepages.
>>> The page walks would be efficient in scanning and determining if it is
>>> a THP huge page and step over it. Whereas using the API, the application
>>> would not know what page size mapping is used for a given VA and so would
>>> have to again scan the VMA in units of 4k page size.
>>>
>>> If this sounds reasonable, I can add it to the commit / patch description.
> As we are judging whether this is a "good" interface, can you tell us a
> bit about its scalability?  For instance, let's say someone has a 1TB
> VMA that's populated with interleaved 4k pages.  How much data comes
> out?  How long does it take to parse?  Will we effectively deadlock the
> system if someone accidentally cat's the wrong /proc file?

For the worst case scenario you describe, it would be one line (range)
for each 4k page, which would be similar to what you get with
'/proc/*/pagemap'. The amount of data copied out at a time is based on
the buffer size used in the kernel, which is 1024 bytes. That is, if one
line (one range) printed is about 40 bytes (chars), that means about 25
lines per copy-out. The main concern would be holding the 'mmap_sem'
lock, which can cause hangs. When the 1024-byte buffer gets filled, the
mmap_sem is dropped and the buffer content is copied out to the user
buffer. Then the mmap_sem lock is reacquired and the page walk continues
as needed until the specified user buffer size is filled or the end of
the process address space is reached.

One potential issue could be that there is a large VA range with all
pages populated from one numa node; then the page walk could take longer
while holding the mmap_sem lock. This can be addressed by dropping and
re-acquiring the mmap_sem lock after a certain number of pages have been
walked (say 512, which is what happens in the '/proc/*/pagemap' case).

>
> /proc seems like a really simple way to implement this, but it seems a
> *really* odd choice for something that needs to collect a large amount
> of data.  The lseek() stuff is a nice addition, but I wonder if it's
> unwieldy to use in practice.  For instance, if you want to read data for
> the VMA at 0x1000000 you lseek(fd, 0x1000000, SEEK_SET, right?  You read
> ~20 bytes of data and then the fd is at 0x1000020.  But, you're getting
> data out at the next read() for (at least) the next page, which is also
> available at 0x1001000.  Seems funky.  Do other /proc files behave this way?
>
Yes, SEEK_SET to the VA. The lseek offset is the process VA, so it is
not going to be different from reading a normal text file, except that
/proc files are special. E.g. in the '/proc/*/pagemap' file case, read
enforces that the seek/file offset and the user buffer size passed in
be a multiple of the pagemap_entry_t size, or else the read fails.

The usage for the numa_vamaps file will be to SEEK_SET to the VA from
where the VA range to numa node information needs to be read.

The 'fd' offset is not taken into consideration here, just the VA. Say
each VA range to numa node id line printed is about 40 bytes (chars).
Now if the read only read 20 bytes, it would have read part of the
line. A subsequent read would read the remaining bytes of the line,
which will be stored in the kernel buffer.



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-14 16:01       ` Steven Sistare
  2018-09-14 18:04           ` Prakash Sangappa
@ 2018-09-24 17:14         ` Michal Hocko
  2018-11-10  4:48             ` Prakash Sangappa
  1 sibling, 1 reply; 23+ messages in thread
From: Michal Hocko @ 2018-09-24 17:14 UTC (permalink / raw)
  To: Steven Sistare
  Cc: prakash.sangappa, linux-kernel, linux-mm, dave.hansen,
	nao.horiguchi, akpm, kirill.shutemov, khandual

On Fri 14-09-18 12:01:18, Steven Sistare wrote:
> On 9/14/2018 1:56 AM, Michal Hocko wrote:
[...]
> > Why does this matter for something that is for analysis purposes.
> > Reading the file for the whole address space is far from a free
> > operation. Is the page walk optimization really essential for usability?
> > Moreover what prevents move_pages implementation to be clever for the
> > page walk itself? In other words why would we want to add a new API
> > rather than make the existing one faster for everybody.
> 
> One could optimize move pages.  If the caller passes a consecutive range
> of small pages, and the page walk sees that a VA is mapped by a huge page, 
> then it can return the same numa node for each of the following VA's that fall 
> into the huge page range. It would be faster than 55 nsec per small page, but 
> hard to say how much faster, and the cost is still driven by the number of 
> small pages. 

This is exactly what I was arguing for. There is some room for
improvements for the existing interface. I yet have to hear the explicit
usecase which would required even better performance that cannot be
achieved by the existing API.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-09-24 17:14         ` Michal Hocko
@ 2018-11-10  4:48             ` Prakash Sangappa
  0 siblings, 0 replies; 23+ messages in thread
From: Prakash Sangappa @ 2018-11-10  4:48 UTC (permalink / raw)
  To: Michal Hocko, Steven Sistare
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual



On 9/24/18 10:14 AM, Michal Hocko wrote:
> On Fri 14-09-18 12:01:18, Steven Sistare wrote:
>> On 9/14/2018 1:56 AM, Michal Hocko wrote:
> [...]
>>> Why does this matter for something that is for analysis purposes.
>>> Reading the file for the whole address space is far from a free
>>> operation. Is the page walk optimization really essential for usability?
>>> Moreover what prevents move_pages implementation to be clever for the
>>> page walk itself? In other words why would we want to add a new API
>>> rather than make the existing one faster for everybody.
>> One could optimize move pages.  If the caller passes a consecutive range
>> of small pages, and the page walk sees that a VA is mapped by a huge page,
>> then it can return the same numa node for each of the following VA's that fall
>> into the huge page range. It would be faster than 55 nsec per small page, but
>> hard to say how much faster, and the cost is still driven by the number of
>> small pages.
> This is exactly what I was arguing for. There is some room for
> improvements for the existing interface. I yet have to hear the explicit
> usecase which would required even better performance that cannot be
> achieved by the existing API.
>

The above mentioned optimization to the move_pages() API helps when
scanning mapped huge pages, but does not help if there are large sparse
mappings with few pages mapped. Otherwise, consider adding page walk
support in the move_pages() implementation and enhancing the API (a new
flag?) to return address range to numa node information. The page walk
optimization would certainly make a difference for usability.

We can have applications (like Oracle DB) having processes with large
sparse mappings (in TBs) with only some areas of these mapped address
ranges being accessed, basically large portions not having page tables
backing them. This can become more prevalent on newer systems with
multiple TBs of memory.

Here is some data from pmap using the move_pages() API with the
optimization. The following table compares the time pmap takes to print
the address mapping of a large process, with numa node information,
using the move_pages() API vs pmap using the /proc numa_vamaps file.

Running the pmap command on a process with 1.3 TB of address space, with
sparse mappings:

                        ~1.3 TB sparse      250G dense segment with hugepages
move_pages              8.33s               3.14s
optimized move_pages    6.29s               0.92s
/proc numa_vamaps       0.08s               0.04s

The second column is the pmap time on a 250G address range of this
process, which maps hugepages (THP & hugetlb).


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-11-10  4:48             ` Prakash Sangappa
  (?)
@ 2018-11-26 19:20             ` Steven Sistare
  2018-12-18 23:46               ` prakash.sangappa
  -1 siblings, 1 reply; 23+ messages in thread
From: Steven Sistare @ 2018-11-26 19:20 UTC (permalink / raw)
  To: Prakash Sangappa, Michal Hocko
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual

On 11/9/2018 11:48 PM, Prakash Sangappa wrote:
> On 9/24/18 10:14 AM, Michal Hocko wrote:
>> On Fri 14-09-18 12:01:18, Steven Sistare wrote:
>>> On 9/14/2018 1:56 AM, Michal Hocko wrote:
>> [...]
>>>> Why does this matter for something that is for analysis purposes.
>>>> Reading the file for the whole address space is far from a free
>>>> operation. Is the page walk optimization really essential for usability?
>>>> Moreover what prevents move_pages implementation to be clever for the
>>>> page walk itself? In other words why would we want to add a new API
>>>> rather than make the existing one faster for everybody.
>>> One could optimize move pages.  If the caller passes a consecutive range
>>> of small pages, and the page walk sees that a VA is mapped by a huge page,
>>> then it can return the same numa node for each of the following VA's that fall
>>> into the huge page range. It would be faster than 55 nsec per small page, but
>>> hard to say how much faster, and the cost is still driven by the number of
>>> small pages.
>> This is exactly what I was arguing for. There is some room for
>> improvements for the existing interface. I yet have to hear the explicit
>> usecase which would required even better performance that cannot be
>> achieved by the existing API.
>>
> 
> Above mentioned optimization to move_pages() API helps when scanning
> mapped huge pages, but does not help if there are large sparse mappings
> with few pages mapped. Otherwise, consider adding page walk support in
> the move_pages() implementation, enhance the API(new flag?) to return
> address range to numa node information. The page walk optimization
> would certainly make a difference for usability.
> 
> We can have applications(Like Oracle DB) having processes with large sparse
> mappings(in TBs)  with only some areas of these mapped address range
> being accessed, basically  large portions not having page tables backing it.
> This can become more prevalent on newer systems with multiple TBs of
> memory.
> 
> Here is some data from pmap using move_pages() API  with optimization.
> Following table compares time pmap takes to print address mapping of a
> large process, with numa node information using move_pages() api vs pmap
> using /proc numa_vamaps file.
> 
> Running pmap command on a process with 1.3 TB of address space, with
> sparse mappings.
> 
>                        ~1.3 TB sparse      250G dense segment with hugepages
> move_pages              8.33s              3.14s
> optimized move_pages    6.29s              0.92s
> /proc numa_vamaps       0.08s              0.04s
> 
>  
> The second column is pmap time on a 250G address range of this process, which
> maps hugepages (THP & hugetlb).

The data look compelling to me.  numa_vamaps provides a much smoother user
experience for the analyst who is casting a wide net looking for the root of a
performance issue.  Almost no waiting to see the data.

- Steve

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-11-26 19:20             ` Steven Sistare
@ 2018-12-18 23:46               ` prakash.sangappa
  2018-12-19 20:52                 ` Michal Hocko
  0 siblings, 1 reply; 23+ messages in thread
From: prakash.sangappa @ 2018-12-18 23:46 UTC (permalink / raw)
  To: Steven Sistare, Michal Hocko
  Cc: linux-kernel, linux-mm, dave.hansen, nao.horiguchi, akpm,
	kirill.shutemov, khandual

[-- Attachment #1: Type: text/plain, Size: 3572 bytes --]



On 11/26/2018 11:20 AM, Steven Sistare wrote:
> On 11/9/2018 11:48 PM, Prakash Sangappa wrote:
>>
>> Here is some data from pmap using the move_pages() API with the optimization.
>> The following table compares the time pmap takes to print the address mapping
>> of a large process with numa node information, using the move_pages() API vs
>> pmap using the /proc numa_vamaps file.
>>
>> Running the pmap command on a process with 1.3 TB of address space, with
>> sparse mappings.
>>
>>                         ~1.3 TB sparse      250G dense segment with hugepages
>> move_pages              8.33s              3.14s
>> optimized move_pages    6.29s              0.92s
>> /proc numa_vamaps       0.08s              0.04s
>>
>>   
>> Second column is pmap time on a 250G address range of this process, which maps
>> hugepages(THP & hugetlb).
> The data look compelling to me.  numa_vamaps provides a much smoother user
> experience for the analyst who is casting a wide net looking for the root of a
> performance issue.  Almost no waiting to see the data.
>
> - Steve

What do others think? How should we proceed on this?

Summarizing the discussion so far:

The use case for getting VA (virtual address) to numa node information is
performance analysis. Investigating performance issues involves looking at
which numa node a process's memory is allocated from. For the user analyzing
the issue, an efficient way to get this information is useful when looking
at application processes with large address spaces.

The patch proposed adding a /proc/<pid>/numa_vamaps file for providing
VA to numa node mapping information for a process. This file provides
address range to numa node id info. An address range not having any pages
mapped is indicated with '-' in place of a numa node id. Sample file content:

00400000-00410000 N1
00410000-0047f000 N0
00480000-00481000 -
00481000-004a0000 N0
..

Dave Hansen asked how this would scale with respect to reading the file
from a large process. The answer: the file contents are generated using a
page table walk and copied to a user buffer. The mmap_sem lock is dropped
and re-acquired in the process of walking the page table and copying file
content. The kernel buffer size used determines how long the lock is held.
This can be further improved by dropping the lock and re-acquiring it after
a fixed number (512) of pages are walked.

Also, with support for seeking to a specific VA of the process, from which
point the VA to numa node information is provided, the file offset is not
interpreted as a byte position in the file. This behavior differs from
reading a normal file; other /proc files (e.g. /proc/<pid>/pagemap) also
have certain differences compared to reading a normal file.

Michal Hocko suggested that the currently available move_pages() API
could be used to collect the VA to numa node id information. However,
use of the numa_vamaps /proc file is more efficient than move_pages().
Steven Sistare suggested optimizing move_pages() for the case when
consecutive 4k page addresses are passed in. I tried out this optimization,
and the table above shows a performance comparison of the
move_pages() API vs the numa_vamaps /proc file. Specifically, in the case of
sparse mappings the optimization to move_pages() does not help. The
performance benefits seen with the /proc file make a difference from
a usability point of view.

Andrew Morton had asked about the performance difference between the
move_pages() API and use of the numa_vamaps /proc file, and also about the
use case for getting VA to numa node id information. I hope the above
description answers those questions.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH V2 0/6] VA to numa node information
  2018-12-18 23:46               ` prakash.sangappa
@ 2018-12-19 20:52                 ` Michal Hocko
  0 siblings, 0 replies; 23+ messages in thread
From: Michal Hocko @ 2018-12-19 20:52 UTC (permalink / raw)
  To: prakash.sangappa
  Cc: Steven Sistare, linux-kernel, linux-mm, dave.hansen,
	nao.horiguchi, akpm, kirill.shutemov, khandual

On Tue 18-12-18 15:46:45, prakash.sangappa wrote:
[...]
> Dave Hansen asked how this would scale with respect to reading the file
> from a large process. The answer: the file contents are generated using a
> page table walk and copied to a user buffer. The mmap_sem lock is dropped
> and re-acquired in the process of walking the page table and copying file
> content. The kernel buffer size used determines how long the lock is held.
> This can be further improved by dropping the lock and re-acquiring it after
> a fixed number (512) of pages are walked.

I guess you are still missing the point here. Have you tried a larger
mapping with an interleaved memory policy? I would bet my hat that you are
going to spend a large part of the time just pushing the output to
userspace... Not to mention the parsing on the consumer side.

Also, you keep failing (IMO) to explain _who_ is going to be the consumer
of the file. What kind of analysis will need such an optimized data
collection, and what can you do about that?

This is really _essential_ when adding a new interface to provide a data
that is already available by other means. In other words tell us your
specific usecase that is hitting a bottleneck that cannot be handled by
the existing API and we can start considering a new one.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2018-12-19 20:52 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-12 20:23 [PATCH V2 0/6] VA to numa node information Prakash Sangappa
2018-09-12 20:23 ` [PATCH V2 1/6] Add check to match numa node id when gathering pte stats Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 2/6] Add /proc/<pid>/numa_vamaps file for numa node information Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 3/6] Provide process address range to numa node id mapping Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 4/6] Add support to lseek /proc/<pid>/numa_vamaps file Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 5/6] File /proc/<pid>/numa_vamaps access needs PTRACE_MODE_READ_REALCREDS check Prakash Sangappa
2018-09-12 20:24 ` [PATCH V2 6/6] /proc/pid/numa_vamaps: document in Documentation/filesystems/proc.txt Prakash Sangappa
2018-09-13  8:40 ` [PATCH V2 0/6] VA to numa node information Michal Hocko
2018-09-13 22:32   ` prakash.sangappa
2018-09-14  0:10     ` Andrew Morton
2018-09-14  0:25       ` Dave Hansen
2018-09-15  1:31         ` Prakash Sangappa
2018-09-14  5:56     ` Michal Hocko
2018-09-14 16:01       ` Steven Sistare
2018-09-14 18:04         ` Prakash Sangappa
2018-09-14 18:04           ` Prakash Sangappa
2018-09-14 19:01           ` Dave Hansen
2018-09-24 17:14         ` Michal Hocko
2018-11-10  4:48           ` Prakash Sangappa
2018-11-10  4:48             ` Prakash Sangappa
2018-11-26 19:20             ` Steven Sistare
2018-12-18 23:46               ` prakash.sangappa
2018-12-19 20:52                 ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.