linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
@ 2008-10-03 16:46 Mel Gorman
  2008-10-03 16:46 ` [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps Mel Gorman
  2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
  0 siblings, 2 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-03 16:46 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, kosaki.motohiro, dave, linux-mm, linux-kernel

The following two patches add support for printing the size of pages used
by the kernel to back VMAs in maps and smaps. This can be used by a user
to verify that a hugepage-aware application is using the expected page sizes.
In one case the pagesize used by the MMU differs from the size used by the
kernel. This is on PPC64 using 64K as a base page size running on a processor
that does not support 64K in the MMU. In this case, the kernel uses 64K pages
but the MMU is still using 4K.

The first patch prints the size of page used by the kernel when allocating
pages for a VMA in /proc/pid/smaps and should not be considered too
contentious as it is highly unlikely to break any parsers.  The second patch
reports the size of page used by hugetlbfs regions in /proc/pid/maps. There is
a possibility that the final patch will break parsers but they are arguably
already broken. More details are in the patches themselves.

Thanks to KOSAKI Motohiro for rebasing the patches onto mmotm, reviewing
and testing.

Changelog since V2
  o Drop printing of MMUPageSize (mel)
  o Rebase onto mmotm (KOSAKI Motohiro)

Changelog since V1
  o Fix build failure on !CONFIG_HUGETLB_PAGE
  o Uninline helper functions
  o Distinguish between base pagesize and MMU pagesize

 fs/proc/task_mmu.c      |   27 ++++++++++++++++++---------
 include/linux/hugetlb.h |    3 +++
 mm/hugetlb.c            |   17 +++++++++++++++++
 3 files changed, 38 insertions(+), 9 deletions(-)



* [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps
  2008-10-03 16:46 [PATCH 0/2] Report the size of pages backing VMAs in /proc V3 Mel Gorman
@ 2008-10-03 16:46 ` Mel Gorman
  2008-10-08 21:38   ` Alexey Dobriyan
  2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
  1 sibling, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2008-10-03 16:46 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, kosaki.motohiro, dave, linux-mm, linux-kernel

It is useful to verify a hugepage-aware application is using the expected
pagesizes for its memory regions. This patch creates an entry called
KernelPageSize in /proc/pid/smaps that is the size of page used by the
kernel to back a VMA. The entry is not called PageSize as it is possible
the MMU uses a different size. This extension should not break any sensible
parser that skips lines containing unrecognised information.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/task_mmu.c      |    6 ++++--
 include/linux/hugetlb.h |    3 +++
 mm/hugetlb.c            |   17 +++++++++++++++++
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index f6add87..beb884d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -402,7 +402,8 @@ static int show_smap(struct seq_file *m, void *v)
 		   "Private_Clean:  %8lu kB\n"
 		   "Private_Dirty:  %8lu kB\n"
 		   "Referenced:     %8lu kB\n"
-		   "Swap:           %8lu kB\n",
+		   "Swap:           %8lu kB\n"
+		   "KernelPageSize: %8lu kB\n",
 		   (vma->vm_end - vma->vm_start) >> 10,
 		   mss.resident >> 10,
 		   (unsigned long)(mss.pss >> (10 + PSS_SHIFT)),
@@ -411,7 +412,8 @@ static int show_smap(struct seq_file *m, void *v)
 		   mss.private_clean >> 10,
 		   mss.private_dirty >> 10,
 		   mss.referenced >> 10,
-		   mss.swap >> 10);
+		   mss.swap >> 10,
+		   vma_kernel_pagesize(vma) >> 10);
 
 	if (m->count < m->size)  /* vma is copied successfully */
 		m->version = (vma != get_gate_vma(task)) ? vma->vm_start : 0;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 32e0ef0..ace04a7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -231,6 +231,8 @@ static inline unsigned long huge_page_size(struct hstate *h)
 	return (unsigned long)PAGE_SIZE << h->order;
 }
 
+extern unsigned long vma_kernel_pagesize(struct vm_area_struct *vma);
+
 static inline unsigned long huge_page_mask(struct hstate *h)
 {
 	return h->mask;
@@ -271,6 +273,7 @@ struct hstate {};
 #define hstate_inode(i) NULL
 #define huge_page_size(h) PAGE_SIZE
 #define huge_page_mask(h) PAGE_MASK
+#define vma_kernel_pagesize(v) PAGE_SIZE
 #define huge_page_order(h) 0
 #define huge_page_shift(h) PAGE_SHIFT
 static inline unsigned int pages_per_huge_page(struct hstate *h)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index adf3568..856949c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -219,6 +219,23 @@ static pgoff_t vma_hugecache_offset(struct hstate *h,
 }
 
 /*
+ * Return the size of the pages allocated when backing a VMA. In the majority
+ * of cases this will be the same size as used by the page table entries.
+ */
+unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
+{
+	struct hstate *hstate;
+
+	if (!is_vm_hugetlb_page(vma))
+		return PAGE_SIZE;
+
+	hstate = hstate_vma(vma);
+	VM_BUG_ON(!hstate);
+
+	return 1UL << (hstate->order + PAGE_SHIFT);
+}
+
+/*
  * Flags for MAP_PRIVATE reservations.  These are stored in the bottom
  * bits of the reservation map pointer, which are always clear due to
  * alignment.
-- 
1.5.6.5
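
A minimal userspace sketch of how a tool might consume the new smaps field, assuming the "KernelPageSize: %8lu kB" line format added by the patch above. The helper names parse_kernel_pagesize() and pagesize_kb() are hypothetical and not part of the patch; the order and page-shift values used in the usage note are assumptions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse one line of /proc/<pid>/smaps. Returns 1 and stores the value in
 * kB if the line is a "KernelPageSize:" entry, 0 for any other line.
 * Unrecognised lines are skipped rather than treated as errors, which is
 * what allows new fields to be added without breaking the parser.
 */
static int parse_kernel_pagesize(const char *line, unsigned long *kb)
{
	assert(line != NULL && kb != NULL);
	return sscanf(line, "KernelPageSize: %lu kB", kb) == 1;
}

/*
 * Same arithmetic as vma_kernel_pagesize() followed by ">> 10": a page of
 * the given order covers (1 << order) base pages, and the shift by 10
 * converts bytes to kB. page_shift is a parameter here because PAGE_SHIFT
 * is a kernel-side constant (hypothetical userspace stand-in).
 */
static unsigned long pagesize_kb(unsigned int order, unsigned int page_shift)
{
	return (1UL << (order + page_shift)) >> 10;
}
```

With a base page shift of 12 (4K pages), pagesize_kb(0, 12) gives 4 and pagesize_kb(10, 12) gives 4096, so a VMA backed by 4MB hugepages would be reported as "KernelPageSize:     4096 kB".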



* [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps
  2008-10-03 16:46 [PATCH 0/2] Report the size of pages backing VMAs in /proc V3 Mel Gorman
  2008-10-03 16:46 ` [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps Mel Gorman
@ 2008-10-03 16:46 ` Mel Gorman
  2008-10-04  8:14   ` KOSAKI Motohiro
                     ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-03 16:46 UTC (permalink / raw)
  To: akpm; +Cc: Mel Gorman, kosaki.motohiro, dave, linux-mm, linux-kernel

This patch adds a new field for hugepage-backed memory regions to show the
pagesize in /proc/pid/maps.  While the information is available in smaps,
maps is more human-readable and does not incur the cost of calculating Pss. An
example of a /proc/self/maps output for an application using hugepages with
this patch applied is;

08048000-0804c000 r-xp 00000000 03:01 49135      /bin/cat
0804c000-0804d000 rw-p 00003000 03:01 49135      /bin/cat
08400000-08800000 rw-p 00000000 00:10 4055       /mnt/libhugetlbfs.tmp.QzPPTJ (deleted) (hpagesize=4096kB)
b7daa000-b7dab000 rw-p b7daa000 00:00 0
b7dab000-b7ed2000 r-xp 00000000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
b7ed2000-b7ed7000 r--p 00127000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
b7ed7000-b7ed9000 rw-p 0012c000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
b7ed9000-b7edd000 rw-p b7ed9000 00:00 0
b7ee1000-b7ee8000 r-xp 00000000 03:01 49262      /root/libhugetlbfs-git/obj32/libhugetlbfs.so
b7ee8000-b7ee9000 rw-p 00006000 03:01 49262      /root/libhugetlbfs-git/obj32/libhugetlbfs.so
b7ee9000-b7eed000 rw-p b7ee9000 00:00 0
b7eed000-b7f02000 r-xp 00000000 03:01 119345     /lib/ld-2.3.6.so
b7f02000-b7f04000 rw-p 00014000 03:01 119345     /lib/ld-2.3.6.so
bf8ef000-bf903000 rwxp bffeb000 00:00 0          [stack]
bf903000-bf904000 rw-p bffff000 00:00 0
ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]

To be predictable for parsers, the patch adds the notion of reporting on VMA
attributes by appending one or more fields that look like "(attribute)". This
already happens when a file is deleted and the user sees (deleted) after the
filename. The expectation is that existing parsers will not break as those
that read the filename should be reading forward after the inode number
and stopping when it sees something that is not part of the filename.
Parsers that assume everything after / is a filename will get confused by
(hpagesize=XkB) but are already broken due to (deleted).

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
---
 fs/proc/task_mmu.c |   21 ++++++++++++++-------
 1 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index beb884d..793633b 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -198,7 +198,8 @@ static int do_maps_open(struct inode *inode, struct file *file,
 	return ret;
 }
 
-static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
+static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma,
+				int showattributes)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	struct file *file = vma->vm_file;
@@ -227,8 +228,8 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 	 * Print the dentry name for named mappings, and a
 	 * special [heap] marker for the heap:
 	 */
+	pad_len_spaces(m, len);
 	if (file) {
-		pad_len_spaces(m, len);
 		seq_path(m, &file->f_path, "\n");
 	} else {
 		const char *name = arch_vma_name(vma);
@@ -245,11 +246,17 @@ static void show_map_vma(struct seq_file *m, struct vm_area_struct *vma)
 				name = "[vdso]";
 			}
 		}
-		if (name) {
-			pad_len_spaces(m, len);
+		if (name)
 			seq_puts(m, name);
-		}
 	}
+
+	/*
+	 * Print additional attributes of the VMA of interest
+	 * - hugepage size if hugepage-backed
+	 */
+	if (showattributes && vma->vm_flags & VM_HUGETLB)
+		seq_printf(m, " (hpagesize=%lukB)",
+			vma_kernel_pagesize(vma) >> 10);
 	seq_putc(m, '\n');
 }
 
@@ -262,7 +269,7 @@ static int show_map(struct seq_file *m, void *v)
 	if (maps_protect && !ptrace_may_access(task, PTRACE_MODE_READ))
 		return -EACCES;
 
-	show_map_vma(m, vma);
+	show_map_vma(m, vma, 1);
 
 	if (m->count < m->size)  /* vma is copied successfully */
 		m->version = (vma != get_gate_vma(task)) ? vma->vm_start : 0;
@@ -391,7 +398,7 @@ static int show_smap(struct seq_file *m, void *v)
 	if (maps_protect && !ptrace_may_access(task, PTRACE_MODE_READ))
 		return -EACCES;
 
-	show_map_vma(m, vma);
+	show_map_vma(m, vma, 0);
 
 	seq_printf(m,
 		   "Size:           %8lu kB\n"
-- 
1.5.6.5
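
A rough sketch of how a parser could pick up the attribute described in the changelog above, assuming the " (hpagesize=<N>kB)" format proposed by this patch. The function name maps_hpagesize_kb() is hypothetical. The match is purely textual, so a filename that itself ends in the same text would be misread; that ambiguity is inherent in the format.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look for a " (hpagesize=<N>kB)" attribute on one line of
 * /proc/<pid>/maps. Returns 1 and stores N if found, 0 otherwise.
 * The last occurrence on the line is used, since attributes are
 * appended after the filename.
 */
static int maps_hpagesize_kb(const char *line, unsigned long *kb)
{
	static const char tag[] = " (hpagesize=";
	const char *p, *last = NULL;
	char closing;

	assert(line != NULL && kb != NULL);

	for (p = strstr(line, tag); p != NULL; p = strstr(p + 1, tag))
		last = p;
	if (last == NULL)
		return 0;

	/* Require digits, the "kB" suffix and the closing parenthesis. */
	return sscanf(last, " (hpagesize=%lukB%c", kb, &closing) == 2 &&
	       closing == ')';
}
```

Lines without the attribute, such as the /bin/cat mappings in the example output, simply return 0, so the helper can be called on every line.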



* Re: [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps
  2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
@ 2008-10-04  8:14   ` KOSAKI Motohiro
  2008-10-04 12:04   ` [RFC PATCH] Report the shmid backing a VMA in maps KOSAKI Motohiro
  2008-10-04 22:13   ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Alexey Dobriyan
  2 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-04  8:14 UTC (permalink / raw)
  To: Mel Gorman; +Cc: kosaki.motohiro, akpm, dave, linux-mm, linux-kernel

> This patch adds a new field for hugepage-backed memory regions to show the
> pagesize in /proc/pid/maps.  While the information is available in smaps,
> maps is more human-readable and does not incur the cost of calculating Pss. An
> example of a /proc/self/maps output for an application using hugepages with
> this patch applied is;
> 
> 08048000-0804c000 r-xp 00000000 03:01 49135      /bin/cat
> 0804c000-0804d000 rw-p 00003000 03:01 49135      /bin/cat
> 08400000-08800000 rw-p 00000000 00:10 4055       /mnt/libhugetlbfs.tmp.QzPPTJ (deleted) (hpagesize=4096kB)
> b7daa000-b7dab000 rw-p b7daa000 00:00 0
> b7dab000-b7ed2000 r-xp 00000000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
> b7ed2000-b7ed7000 r--p 00127000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
> b7ed7000-b7ed9000 rw-p 0012c000 03:01 116846     /lib/tls/i686/cmov/libc-2.3.6.so
> b7ed9000-b7edd000 rw-p b7ed9000 00:00 0
> b7ee1000-b7ee8000 r-xp 00000000 03:01 49262      /root/libhugetlbfs-git/obj32/libhugetlbfs.so
> b7ee8000-b7ee9000 rw-p 00006000 03:01 49262      /root/libhugetlbfs-git/obj32/libhugetlbfs.so
> b7ee9000-b7eed000 rw-p b7ee9000 00:00 0
> b7eed000-b7f02000 r-xp 00000000 03:01 119345     /lib/ld-2.3.6.so
> b7f02000-b7f04000 rw-p 00014000 03:01 119345     /lib/ld-2.3.6.so
> bf8ef000-bf903000 rwxp bffeb000 00:00 0          [stack]
> bf903000-bf904000 rw-p bffff000 00:00 0
> ffffe000-fffff000 r-xp 00000000 00:00 0          [vdso]
> 
> To be predictable for parsers, the patch adds the notion of reporting on VMA
> attributes by appending one or more fields that look like "(attribute)". This
> already happens when a file is deleted and the user sees (deleted) after the
> filename. The expectation is that existing parsers will not break as those
> that read the filename should be reading forward after the inode number
> and stopping when it sees something that is not part of the filename.
> Parsers that assume everything after / is a filename will get confused by
> (hpagesize=XkB) but are already broken due to (deleted).
> 
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>

This patch is nicer and cleaner than my version.
Thanks, Mel!






* [RFC PATCH] Report the shmid backing a VMA in maps
  2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
  2008-10-04  8:14   ` KOSAKI Motohiro
@ 2008-10-04 12:04   ` KOSAKI Motohiro
  2008-10-04 12:07     ` KOSAKI Motohiro
  2008-10-04 21:52     ` Alexey Dobriyan
  2008-10-04 22:13   ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Alexey Dobriyan
  2 siblings, 2 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-04 12:04 UTC (permalink / raw)
  To: Mel Gorman, akpm, linux-mm, linux-kernel, Adam Litke; +Cc: kosaki.motohiro

Hi

I made another hugepage administrating helping patch.
So, I'd like to hear hugepage folks.

I tested this patch on mmotm 02/Oct + Mel's "Report the size of pages backing VMAs in /proc V3" series.


Thanks!


======================================================
Recently, Mel Gorman introduced a mechanism for showing VMA attributes in
/proc/{pid}/maps. It is a very powerful and useful feature.

On the other hand, huge pages are often used via IPC shm rather than mmap,
so administrators often want to know the relationship between a memory
region and its shmid.

Adding a shmid attribute to /proc/{pid}/maps is therefore useful.


In addition, shmid information is useful not only for huge pages but also
for normal shm, so this patch works on normal shm as well.

This patch depends on Mel's "Report the pagesize backing a VMA in /proc/pid/maps" patch.


example output of /proc/{pid}/maps
---------------------------------------------------------
00000000-00010000 r--p 00000000 00:00 0                                  
2000000000000000-2000000000040000 r-xp 00000000 fd:00 7372806            /lib/ld-2.5.so
2000000000040000-2000000000050000 rw-p 00030000 fd:00 7372806            /lib/ld-2.5.so
2000000000050000-2000000000060000 rw-p 2000000000050000 00:00 0          
2000000000060000-20000000000d0000 r-xp 00000000 fd:00 2334823            /usr/lib/libreadline.so.5.1
20000000000d0000-20000000000e0000 rw-p 00060000 fd:00 2334823            /usr/lib/libreadline.so.5.1
20000000000e0000-2000000000170000 r-xp 00000000 fd:00 2334751            /usr/lib/libncurses.so.5.5
2000000000170000-2000000000190000 rw-p 00080000 fd:00 2334751            /usr/lib/libncurses.so.5.5
2000000000190000-20000000001a0000 r-xp 00000000 fd:00 2337176            /usr/lib/libnuma.so.1
20000000001a0000-20000000001b0000 rw-p 00000000 fd:00 2337176            /usr/lib/libnuma.so.1
20000000001b0000-2000000000420000 r-xp 00000000 fd:00 7372813            /lib/libc-2.5.so
2000000000420000-2000000000430000 rw-p 00260000 fd:00 7372813            /lib/libc-2.5.so
2000000000430000-2000000000440000 rw-p 2000000000430000 00:00 0          
2000000000440000-2000000000450000 r-xp 00000000 fd:00 7372819            /lib/libdl-2.5.so
2000000000450000-2000000000460000 rw-p 00000000 fd:00 7372819            /lib/libdl-2.5.so
2000000000460000-20000000004d0000 rw-p 2000000000460000 00:00 0          
2000000000500000-2000000000900000 rw-s 00000000 00:09 0                  /SYSV00000000 (deleted) (shmid=0)
2000000000900000-2000000000d00000 rw-s 00000000 00:09 32769              /SYSV00000000 (deleted) (shmid=32769)
4000000000000000-4000000000030000 r-xp 00000000 fd:00 7536864            /home/kosaki/download/Memtoy-0.16/memtoy
6000000000000000-6000000000010000 rw-p 00020000 fd:00 7536864            /home/kosaki/download/Memtoy-0.16/memtoy
6000000000010000-6000000000040000 rw-p 6000000000010000 00:00 0          [heap]
6007ffffffc70000-6007ffffffc80000 rw-p 6007ffffffc70000 00:00 0          
600fffffffb10000-600fffffffc60000 rw-p 600fffffffea0000 00:00 0          [stack]
8000000000000000-8000000010000000 rw-s 00000000 00:0c 65538              /SYSV00000000 (deleted) (hpagesize=262144kB) (shmid=65538)
a000000000000000-a000000000020000 r-xp 00000000 00:00 0                  [vdso]
------------------------------------------------------------

Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Mel Gorman <mel@csn.ul.ie>
---
 fs/proc/task_mmu.c  |   12 +++++++++---
 include/linux/shm.h |   10 ++++++++++
 ipc/shm.c           |   17 +++++++++++++++++
 3 files changed, 36 insertions(+), 3 deletions(-)

Index: b/fs/proc/task_mmu.c
===================================================================
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -254,9 +254,15 @@ static void show_map_vma(struct seq_file
 	 * Print additional attributes of the VMA of interest
 	 * - hugepage size if hugepage-backed
 	 */
-	if (showattributes && vma->vm_flags & VM_HUGETLB)
-		seq_printf(m, " (hpagesize=%lukB)",
-			vma_kernel_pagesize(vma) >> 10);
+	if (showattributes) {
+		if (vma->vm_flags & VM_HUGETLB)
+			seq_printf(m, " (hpagesize=%lukB)",
+				   vma_kernel_pagesize(vma) >> 10);
+		if (is_shm_vma(vma))
+			seq_printf(m, " (shmid=%d)",
+				   vma_shmid(vma));
+	}
+
 	seq_putc(m, '\n');
 }
 
Index: b/include/linux/shm.h
===================================================================
--- a/include/linux/shm.h
+++ b/include/linux/shm.h
@@ -106,6 +106,8 @@ struct shmid_kernel /* private to the ke
 #ifdef CONFIG_SYSVIPC
 long do_shmat(int shmid, char __user *shmaddr, int shmflg, unsigned long *addr);
 extern int is_file_shm_hugepages(struct file *file);
+int is_shm_vma(struct vm_area_struct *vma);
+int vma_shmid(struct vm_area_struct *vma);
 #else
 static inline long do_shmat(int shmid, char __user *shmaddr,
 				int shmflg, unsigned long *addr)
@@ -116,6 +118,14 @@ static inline int is_file_shm_hugepages(
 {
 	return 0;
 }
+static inline int is_shm_vma(struct vm_area_struct *vma)
+{
+	return 0;
+}
+static inline int vma_shmid(struct vm_area_struct *vma)
+{
+	return -ENOENT;
+}
 #endif
 
 #endif /* __KERNEL__ */
Index: b/ipc/shm.c
===================================================================
--- a/ipc/shm.c
+++ b/ipc/shm.c
@@ -1074,3 +1074,20 @@ static int sysvipc_shm_proc_show(struct 
 			  shp->shm_ctim);
 }
 #endif
+
+int is_shm_vma(struct vm_area_struct *vma)
+{
+	return !!(vma->vm_ops == &shm_vm_ops);
+}
+
+int vma_shmid(struct vm_area_struct *vma)
+{
+	struct shm_file_data *sfd;
+
+	if (!is_shm_vma(vma))
+		return -ENOENT;
+
+	sfd = (struct shm_file_data *)vma->vm_file->private_data;
+	return sfd->id;
+}
+




* Re: [RFC PATCH] Report the shmid backing a VMA in maps
  2008-10-04 12:04   ` [RFC PATCH] Report the shmid backing a VMA in maps KOSAKI Motohiro
@ 2008-10-04 12:07     ` KOSAKI Motohiro
  2008-10-04 21:52     ` Alexey Dobriyan
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-04 12:07 UTC (permalink / raw)
  To: Mel Gorman, akpm, linux-mm, linux-kernel, Adam Litke; +Cc: kosaki.motohiro

> Hi
> 
> I made another hugepage administrating helping patch.
> So, I'd like to hear hugepage folks.

s/folks/folks' opinions/


yup, I'm really stupid ;-|






* Re: [RFC PATCH] Report the shmid backing a VMA in maps
  2008-10-04 12:04   ` [RFC PATCH] Report the shmid backing a VMA in maps KOSAKI Motohiro
  2008-10-04 12:07     ` KOSAKI Motohiro
@ 2008-10-04 21:52     ` Alexey Dobriyan
  2008-10-05  5:48       ` KOSAKI Motohiro
  1 sibling, 1 reply; 19+ messages in thread
From: Alexey Dobriyan @ 2008-10-04 21:52 UTC (permalink / raw)
  To: KOSAKI Motohiro; +Cc: Mel Gorman, akpm, linux-mm, linux-kernel, Adam Litke

On Sat, Oct 04, 2008 at 09:04:03PM +0900, KOSAKI Motohiro wrote:
> On the other hand, huge pages are often used via IPC shm rather than mmap,
> so administrators often want to know the relationship between a memory
> region and its shmid.
> 
> Adding a shmid attribute to /proc/{pid}/maps is therefore useful.
> 
> 
> In addition, shmid information is useful not only for huge pages but also
> for normal shm, so this patch works on normal shm as well.

> 2000000000500000-2000000000900000 rw-s 00000000 00:09 0                  /SYSV00000000 (deleted) (shmid=0)
> 2000000000900000-2000000000d00000 rw-s 00000000 00:09 32769              /SYSV00000000 (deleted) (shmid=32769)
							^^^^^						  ^^^^^

shmid is already in place, and no, it's not a coincidence ;-)
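
To illustrate the point, here is a small sketch that recovers the shmid from an unpatched maps line by reading the inode column of a "/SYSV" mapping. The function name maps_line_shmid() is hypothetical, and the check is deliberately simpler than what procps does (it looks only at the "/SYSV" path prefix, not at the device numbers).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * For a SysV shared memory mapping, /proc/<pid>/maps shows a path that
 * starts with "/SYSV" and the inode column carries the shmid. Returns 1
 * and stores the id if the line describes such a mapping, 0 otherwise.
 */
static int maps_line_shmid(const char *line, unsigned long *shmid)
{
	unsigned long inode;
	int path_off = 0;

	assert(line != NULL && shmid != NULL);

	/*
	 * Fields: start-end perms offset major:minor inode [path].
	 * Hex fields are skipped with scansets; %n records where the
	 * path (if any) begins.
	 */
	if (sscanf(line, "%*[0-9a-f]-%*[0-9a-f] %*4s %*[0-9a-f] "
			 "%*[0-9a-f]:%*[0-9a-f] %lu %n",
		   &inode, &path_off) != 1)
		return 0;

	if (strncmp(line + path_off, "/SYSV", 5) != 0)
		return 0;

	*shmid = inode;
	return 1;
}
```

For the second quoted line above this yields 32769, the same value the RFC patch would have printed as (shmid=32769).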


* Re: [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps
  2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
  2008-10-04  8:14   ` KOSAKI Motohiro
  2008-10-04 12:04   ` [RFC PATCH] Report the shmid backing a VMA in maps KOSAKI Motohiro
@ 2008-10-04 22:13   ` Alexey Dobriyan
  2008-10-05  6:00     ` KOSAKI Motohiro
  2008-10-06 10:09     ` Mel Gorman
  2 siblings, 2 replies; 19+ messages in thread
From: Alexey Dobriyan @ 2008-10-04 22:13 UTC (permalink / raw)
  To: Mel Gorman; +Cc: akpm, kosaki.motohiro, dave, linux-mm, linux-kernel

On Fri, Oct 03, 2008 at 05:46:55PM +0100, Mel Gorman wrote:
> This patch adds a new field for hugepage-backed memory regions to show the
> pagesize in /proc/pid/maps.  While the information is available in smaps,
> maps is more human-readable and does not incur the cost of calculating Pss. An
> example of a /proc/self/maps output for an application using hugepages with
> this patch applied is;
> 
> 08048000-0804c000 r-xp 00000000 03:01 49135      /bin/cat
> 0804c000-0804d000 rw-p 00003000 03:01 49135      /bin/cat
> 08400000-08800000 rw-p 00000000 00:10 4055       /mnt/libhugetlbfs.tmp.QzPPTJ (deleted) (hpagesize=4096kB)

> To be predictable for parsers, the patch adds the notion of reporting on VMA
> attributes by appending one or more fields that look like "(attribute)". This
> already happens when a file is deleted and the user sees (deleted) after the
> filename. The expectation is that existing parsers will not break as those
> that read the filename should be reading forward after the inode number
> and stopping when it sees something that is not part of the filename.
> Parsers that assume everything after / is a filename will get confused by
> (hpagesize=XkB) but are already broken due to (deleted).

Looks like procps will start showing hpagesize tag as a mapping name
(apologies for pasting crappy code):



static const char *mapping_name(proc_t *p, unsigned KLONG addr, unsigned KLONG len, const char *mapbuf, unsigned showpath, unsigned dev_major, unsigned dev_minor, unsigned long long inode){
  const char *cp;

  if(!dev_major && dev_minor==shm_minor && strstr(mapbuf,"/SYSV")){
    static char shmbuf[64];
    snprintf(shmbuf, sizeof shmbuf, "  [ shmid=0x%Lx ]", inode);
    return shmbuf;
  }

  cp = strrchr(mapbuf,'/');
  if(cp){
    if(showpath) return strchr(mapbuf,'/');
    return cp[1] ? cp+1 : cp;
  }

  cp = strchr(mapbuf,'/');
  if(cp){
    if(showpath) return cp;
    return strrchr(cp,'/') + 1;  // it WILL succeed
  }

  cp = "  [ anon ]";
  if( (p->start_stack >= addr) && (p->start_stack <= addr+len) )  cp = "  [ stack ]";
  return cp;
}

static int one_proc(proc_t *p){

	...

  while(fgets(mapbuf,sizeof mapbuf,stdin)){

	...

    if(x_option){
      const char *cp = mapping_name(p, start, diff, mapbuf, 0, dev_major, dev_minor, inode);
      printf(
        (sizeof(KLONG)==8)
          ? "%016"KLF"x %7lu       -       -       - %s  %s\n"
          :      "%08lx %7lu       -       -       - %s  %s\n",
        start,
        (unsigned long)(diff>>10),
        flags,
        cp
      );
    }


* Re: [RFC PATCH] Report the shmid backing a VMA in maps
  2008-10-04 21:52     ` Alexey Dobriyan
@ 2008-10-05  5:48       ` KOSAKI Motohiro
  0 siblings, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-05  5:48 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Mel Gorman, akpm, linux-mm, linux-kernel, Adam Litke

2008/10/5 Alexey Dobriyan <adobriyan@gmail.com>:
> On Sat, Oct 04, 2008 at 09:04:03PM +0900, KOSAKI Motohiro wrote:
>> On the other hand, huge pages are often used via IPC shm rather than mmap,
>> so administrators often want to know the relationship between a memory
>> region and its shmid.
>>
>> Adding a shmid attribute to /proc/{pid}/maps is therefore useful.
>>
>>
>> In addition, shmid information is useful not only for huge pages but also
>> for normal shm, so this patch works on normal shm as well.
>
>> 2000000000500000-2000000000900000 rw-s 00000000 00:09 0                  /SYSV00000000 (deleted) (shmid=0)
>> 2000000000900000-2000000000d00000 rw-s 00000000 00:09 32769              /SYSV00000000 (deleted) (shmid=32769)
>                                                        ^^^^^                                             ^^^^^
>
> shmid is already in place, and no, it's not a coincidence ;-)

Oops, thanks for the very useful information.
I'll drop this patch :)


* Re: [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps
  2008-10-04 22:13   ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Alexey Dobriyan
@ 2008-10-05  6:00     ` KOSAKI Motohiro
  2008-10-06 10:09     ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-05  6:00 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: Mel Gorman, akpm, dave, linux-mm, linux-kernel

Hi

>> This patch adds a new field for hugepage-backed memory regions to show the
>> pagesize in /proc/pid/maps.  While the information is available in smaps,
>> maps is more human-readable and does not incur the cost of calculating Pss. An
>> example of a /proc/self/maps output for an application using hugepages with
>> this patch applied is;
>>
>> 08048000-0804c000 r-xp 00000000 03:01 49135      /bin/cat
>> 0804c000-0804d000 rw-p 00003000 03:01 49135      /bin/cat
>> 08400000-08800000 rw-p 00000000 00:10 4055       /mnt/libhugetlbfs.tmp.QzPPTJ (deleted) (hpagesize=4096kB)
>
>> To be predictable for parsers, the patch adds the notion of reporting on VMA
>> attributes by appending one or more fields that look like "(attribute)". This
>> already happens when a file is deleted and the user sees (deleted) after the
>> filename. The expectation is that existing parsers will not break as those
>> that read the filename should be reading forward after the inode number
>> and stopping when it sees something that is not part of the filename.
>> Parsers that assume everything after / is a filename will get confused by
>> (hpagesize=XkB) but are already broken due to (deleted).
>
> Looks like procps will start showing hpagesize tag as a mapping name
> (apologies for pasting crappy code):

Administrators expect the mapping name to be just the file name when the
VMA is a hugepage mapping created via mmap.
So I feel Mel's code is nicer.

Thanks.


* Re: [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps
  2008-10-04 22:13   ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Alexey Dobriyan
  2008-10-05  6:00     ` KOSAKI Motohiro
@ 2008-10-06 10:09     ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-06 10:09 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, kosaki.motohiro, dave, linux-mm, linux-kernel

On (05/10/08 02:13), Alexey Dobriyan didst pronounce:
> On Fri, Oct 03, 2008 at 05:46:55PM +0100, Mel Gorman wrote:
> > This patch adds a new field for hugepage-backed memory regions to show the
> > pagesize in /proc/pid/maps.  While the information is available in smaps,
> > maps is more human-readable and does not incur the cost of calculating Pss. An
> > example of a /proc/self/maps output for an application using hugepages with
> > this patch applied is;
> > 
> > 08048000-0804c000 r-xp 00000000 03:01 49135      /bin/cat
> > 0804c000-0804d000 rw-p 00003000 03:01 49135      /bin/cat
> > 08400000-08800000 rw-p 00000000 00:10 4055       /mnt/libhugetlbfs.tmp.QzPPTJ (deleted) (hpagesize=4096kB)
> 
> > To be predictable for parsers, the patch adds the notion of reporting on VMA
> > attributes by appending one or more fields that look like "(attribute)". This
> > already happens when a file is deleted and the user sees (deleted) after the
> > filename. The expectation is that existing parsers will not break as those
> > that read the filename should be reading forward after the inode number
> > and stopping when it sees something that is not part of the filename.
> > Parsers that assume everything after / is a filename will get confused by
> > (hpagesize=XkB) but are already broken due to (deleted).
> 
> Looks like procps will start showing hpagesize tag as a mapping name
> (apologies for pasting crappy code):
> 

Looks that way. How about....

From 0bb7a585e9c62efc675110fe50583113ded83ff5 Mon Sep 17 00:00:00 2001
From: Mel Gorman <mel@csn.ul.ie>
Date: Mon, 6 Oct 2008 10:40:33 +0100
Subject: [PATCH 1/1] procps: Strip attributes from filenames in the output of pmap

It is possible that additional attributes about a file are printed in
/proc/PID/maps such as the pagesize used to back a hugetlbfs mapping. It
is not expected that this be printed in the output of pmap. This patch
strips all attributes but (deleted) from the output of pmap. (deleted)
is left as it was historically displayed.

Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 pmap.c |   57 +++++++++++++++++++++++++++++++++++++++++++--------------
 1 files changed, 43 insertions(+), 14 deletions(-)

diff --git a/pmap.c b/pmap.c
index a46c696..a56ee94 100644
--- a/pmap.c
+++ b/pmap.c
@@ -17,6 +17,7 @@
 #include <fcntl.h>
 #include <string.h>
 #include <unistd.h>
+#include <errno.h>
 
 #include <sys/ipc.h>
 #include <sys/shm.h>
@@ -93,8 +94,11 @@ out_destroy:
 }
 
 
-static const char *mapping_name(proc_t *p, unsigned KLONG addr, unsigned KLONG len, const char *mapbuf, unsigned showpath, unsigned dev_major, unsigned dev_minor, unsigned long long inode){
-  const char *cp;
+static const char *mapping_name(proc_t *p, unsigned KLONG addr, unsigned KLONG len, char *mapbuf, unsigned showpath, unsigned dev_major, unsigned dev_minor, unsigned long long inode){
+
+  char *cp;
+  char *cpfull;
+  const char *anon_cp;
 
   if(!dev_major && dev_minor==shm_minor && strstr(mapbuf,"/SYSV")){
     static char shmbuf[64];
@@ -102,21 +106,46 @@ static const char *mapping_name(proc_t *p, unsigned KLONG addr, unsigned KLONG l
     return shmbuf;
   }
 
-  cp = strrchr(mapbuf,'/');
-  if(cp){
-    if(showpath) return strchr(mapbuf,'/');
-    return cp[1] ? cp+1 : cp;
-  }
+  cpfull = strchr(mapbuf,'/');
+  if(cpfull){
+    struct stat statbuf;
+
+    /*
+     * Strip out attributes from the filename. Attributes can be printed
+     * after a filename like (attribute[=value]) and we don't print them
+     * out here with the exception of (deleted). We use stat() to determine
+     * if something is part of the filename or an attribute
+     */
+    while (stat(cpfull, &statbuf) == -1 && errno == ENOENT){
+      cp = strrchr(cpfull,'(');
+
+      /* Stop if there are no other attributes */
+      if (!cp || strchr(cp,')') == NULL)
+        break;
+
+      /* If the attribute looks like deleted, just stop and leave (deleted) */
+      if (!strncmp(cp+1, "deleted", 7))
+        break;
+
+      /* Move back to see if this looks like an attribute */
+      if (--cp <= cpfull)
+        break;
+
+      /* Stop unless a space precedes the '(', else strip the attribute */
+      if (cp[0] != ' ') break;
+      *cp = '\0';
+    }
+
+    if(showpath)
+      return cpfull;
 
-  cp = strchr(mapbuf,'/');
-  if(cp){
-    if(showpath) return cp;
-    return strrchr(cp,'/') + 1;  // it WILL succeed
+    cp = strrchr(cpfull,'/');
+    return cp[1] ? cp+1 : cp;
   }
 
-  cp = "  [ anon ]";
-  if( (p->start_stack >= addr) && (p->start_stack <= addr+len) )  cp = "  [ stack ]";
-  return cp;
+  anon_cp = "  [ anon ]";
+  if( (p->start_stack >= addr) && (p->start_stack <= addr+len) ) anon_cp = "  [ stack ]";
+  return anon_cp;
 }
 
 static int one_proc(proc_t *p){
-- 
1.5.6.5


* Re: [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps
  2008-10-03 16:46 ` [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps Mel Gorman
@ 2008-10-08 21:38   ` Alexey Dobriyan
  2008-10-09  2:16     ` KOSAKI Motohiro
  2008-10-09 10:24     ` Mel Gorman
  0 siblings, 2 replies; 19+ messages in thread
From: Alexey Dobriyan @ 2008-10-08 21:38 UTC (permalink / raw)
  To: Mel Gorman; +Cc: akpm, kosaki.motohiro, dave, linux-mm, linux-kernel

On Fri, Oct 03, 2008 at 05:46:54PM +0100, Mel Gorman wrote:
> It is useful to verify a hugepage-aware application is using the expected
> pagesizes for its memory regions. This patch creates an entry called
> KernelPageSize in /proc/pid/smaps that is the size of page used by the
> kernel to back a VMA. The entry is not called PageSize as it is possible
> the MMU uses a different size. This extension should not break any sensible
> parser that skips lines containing unrecognised information.

> +		   "KernelPageSize: %8lu kB\n",

> +unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
> +{
> +	struct hstate *hstate;
> +
> +	if (!is_vm_hugetlb_page(vma))
> +		return PAGE_SIZE;
> +
> +	hstate = hstate_vma(vma);
> +	VM_BUG_ON(!hstate);
> +
> +	return 1UL << (hstate->order + PAGE_SHIFT);
			    ^^^^
VM_BUG_ON is unneeded because kernel will oops here if hstate is NULL.

Also, in /proc/*/maps it's printed only for hugetlb vmas and called
hpagesize, in smaps it's printed for every vma and called
KernelPageSize. All of this is inconsistent.

And app will verify once that hugepages are of right size, so Pss cost
argument for changing /proc/*/maps seems weak to me.

* Re: [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps
  2008-10-08 21:38   ` Alexey Dobriyan
@ 2008-10-09  2:16     ` KOSAKI Motohiro
  2008-10-09 10:24     ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: KOSAKI Motohiro @ 2008-10-09  2:16 UTC (permalink / raw)
  To: Alexey Dobriyan
  Cc: kosaki.motohiro, Mel Gorman, akpm, dave, linux-mm, linux-kernel

Hi

> > It is useful to verify a hugepage-aware application is using the expected
> > pagesizes for its memory regions. This patch creates an entry called
> > KernelPageSize in /proc/pid/smaps that is the size of page used by the
> > kernel to back a VMA. The entry is not called PageSize as it is possible
> > the MMU uses a different size. This extension should not break any sensible
> > parser that skips lines containing unrecognised information.
> 
> > +		   "KernelPageSize: %8lu kB\n",
> 
> > +unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
> > +{
> > +	struct hstate *hstate;
> > +
> > +	if (!is_vm_hugetlb_page(vma))
> > +		return PAGE_SIZE;
> > +
> > +	hstate = hstate_vma(vma);
> > +	VM_BUG_ON(!hstate);
> > +
> > +	return 1UL << (hstate->order + PAGE_SHIFT);
> 			    ^^^^
> VM_BUG_ON is unneeded because kernel will oops here if hstate is NULL.

yup.


> Also, in /proc/*/maps it's printed only for hugetlb vmas and called
> hpagesize, in smaps it's printed for every vma and called
> KernelPageSize. All of this is inconsistent.

Is this a problem?
/proc/*/maps and /proc/*/smaps are files with different purposes.

/proc/*/maps:  summary with condensed information, easy to read
/proc/*/smaps: verbose output

Some information is already output only in smaps.


> And app will verify once that hugepages are of right size, so Pss cost
> argument for changing /proc/*/maps seems weak to me.

Sorry, I don't understand yet.
Why would the PSS cost change?




* Re: [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps
  2008-10-08 21:38   ` Alexey Dobriyan
  2008-10-09  2:16     ` KOSAKI Motohiro
@ 2008-10-09 10:24     ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-09 10:24 UTC (permalink / raw)
  To: Alexey Dobriyan; +Cc: akpm, kosaki.motohiro, dave, linux-mm, linux-kernel

On (09/10/08 01:38), Alexey Dobriyan didst pronounce:
> On Fri, Oct 03, 2008 at 05:46:54PM +0100, Mel Gorman wrote:
> > It is useful to verify a hugepage-aware application is using the expected
> > pagesizes for its memory regions. This patch creates an entry called
> > KernelPageSize in /proc/pid/smaps that is the size of page used by the
> > kernel to back a VMA. The entry is not called PageSize as it is possible
> > the MMU uses a different size. This extension should not break any sensible
> > parser that skips lines containing unrecognised information.
> 
> > +		   "KernelPageSize: %8lu kB\n",
> 
> > +unsigned long vma_kernel_pagesize(struct vm_area_struct *vma)
> > +{
> > +	struct hstate *hstate;
> > +
> > +	if (!is_vm_hugetlb_page(vma))
> > +		return PAGE_SIZE;
> > +
> > +	hstate = hstate_vma(vma);
> > +	VM_BUG_ON(!hstate);
> > +
> > +	return 1UL << (hstate->order + PAGE_SHIFT);
> 			    ^^^^
> VM_BUG_ON is unneeded because kernel will oops here if hstate is NULL.
> 

Ok, will drop it. I used the VM_BUG_ON so if the situation was triggered,
it would come with line numbers but it'll be an obvious oops so I guess it
is redundant.
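
To make the quoted expression concrete, here is a minimal user-space sketch of the same arithmetic. The function names are hypothetical and the order and page shift are plain parameters rather than kernel state; it only mirrors `1UL << (hstate->order + PAGE_SHIFT)` and the kB unit printed in smaps.

```c
/*
 * User-space illustration only (hypothetical helpers, not kernel code):
 * mirrors 1UL << (hstate->order + PAGE_SHIFT) from the quoted patch.
 */
unsigned long pagesize_from_order(unsigned int order, unsigned int page_shift)
{
	return 1UL << (order + page_shift);
}

/* Convert a size in bytes to the kB unit used in the smaps output. */
unsigned long bytes_to_kb(unsigned long bytes)
{
	return bytes >> 10;
}
```

With a 4K base page (shift 12), order 0 gives 4096 bytes and order 9 gives 2097152 bytes, which smaps would report as 2048 kB.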

> Also, in /proc/*/maps it's printed only for hugetlb vmas and called
> hpagesize,

Well yes... because it's a huge pagesize for that VMA. The name reflects
what is being described there.

> in smaps it's printed for every vma and called
> KernelPageSize. All of this is inconsistent.
> 

In smaps, we are printing for every VMA because it's easier for parsers to
deal with the presence of information than its absence. The name KernelPageSize
there is an accurate description.

I don't feel it is inconsistent.
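
As a rough illustration of why an always-present field is easy on parsers, the sketch below matches one smaps line against the `"KernelPageSize: %8lu kB\n"` format quoted above. The function name is hypothetical, and it assumes the caller feeds it one line at a time (for example from fgets() on /proc/self/smaps).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: parse one smaps line of the form
 * "KernelPageSize: %8lu kB". Returns 1 and stores the value in *kb
 * when the line matches, 0 for any other line.
 */
int parse_kernel_pagesize(const char *line, unsigned long *kb)
{
	static const char key[] = "KernelPageSize:";

	if (strncmp(line, key, sizeof(key) - 1) != 0)
		return 0;
	/* %lu skips the padding spaces that follow the field name */
	return sscanf(line + sizeof(key) - 1, "%lu kB", kb) == 1;
}
```

Because the field is printed for every VMA, a caller can treat a VMA without a match as an old kernel rather than having to infer a default page size.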

> And app will verify once that hugepages are of right size, so Pss cost
> argument for changing /proc/*/maps seems weak to me.
> 

Let's say someone wanted to monitor an application to see what its use of
hugepages was over time; they would have to constantly incur the PSS
cost to do that, which seems a bit unfair.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
  2008-10-20 18:07   ` Albert Cahalan
@ 2008-10-22  9:41     ` Mel Gorman
  0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-22  9:41 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: adobriyan, kosaki.motohiro, linux-kernel

On Mon, Oct 20, 2008 at 02:07:28PM -0400, Albert Cahalan wrote:
> On Mon, Oct 20, 2008 at 6:06 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> > On (20/10/08 05:18), Albert Cahalan didst pronounce:
> 
> >> Looping on stat() while chopping off suspected tags is dreadful.
> >> Besides just being gross, it's slow.
> >
> > You're probably right. It's a bit weird that it's what you have to do to
> > figure out if the file in /proc/PID/maps is really there or not.
> 
> Actually you can't do this, because of directory permissions.
> 

Good point.

> >> Obviously, every author of a /proc-based tool has been forced to
> >> take a random guess at the ABI. The /proc/*/smaps is so gross
> >> that I put off writing a parser for years.
> >
> > I intend to take a stab at it for the purposes of teaching pmap to print
> > the pagesizes if the smaps change gets picked up.
> 
> FYI, "KernelPageSize" is at least unique under the perfect
> hash function I'm using to parse the damn smaps file.
> 
> hash = ( ( (s[8]&15) + (s[1]&15) ) ^ (s[0]&3) ) & 31;
> 

Good to know.

> I have to wonder if we'll be getting mixed page sizes
> within a single mapping, making such info unusable.
> 

It's not planned right now, but even if it is, KernelPageSize would
remain as the intended page size. VMAs would either split around each
mixed page size in which case there will be separate VMAs or an
additional field will be added that indicates what number of each
pagesize makes up the mapping.

> >> Right before the filename, you can add anything except a '/'.
> >> You could add a few columns of numbers or a second flags field.
> >
> > My fear was about parsers that hard-coded what number field stored the
> > filename. If a column was added for pagesize for example, then parsers
> > would think the pagesize was the filename.
> 
> It's possible. Every parser I've examined does strchr()
> or similar to find that '/' character.
> 

I might be the only criminal. A mucky shell script used awk to display
field X and everything past it to find the filename. A more rational
person would have used strchr or found the first / with cut or similar.

> Maybe try some dummy patches in a linux-next kernel?
> Give each one a month. You could do "xyz" concatenated
> to the flags, a second "rwx" concatenated to the flags,
> a single column of "0" before the filename, and several
> columns of "parsertest" before the filename.
> 

That sounds reasonable.

> > Now, that is an interesting idea, albeit it's not one that is easily
> > human-readable and would need a second parser like pmap but that's ok. If
> > parsing smaps turns into a total pain in the ass
> 
> I assure you that parsing smaps is a total pain in the ass,
> especially if you want tolerable performance. Something
> like "top" is not viable if it performs like a Python script.
> 

I had assumed that smaps + performance were mutually exclusive because
of the PSS calculation and any active monitoring from something like top
would blow bigtime. That's why I tried modifying maps as well.

> >> BTW, I'm thinking that the /proc/*/*maps files fail when the
> >> lines exceed 4096 bytes. The pathname may legitimately be that
> >> long, plus it can be backslash escaped, plus there is all the
> >> junk on the beginning.
> >
> > Yes. While it's unlikely to be exceeded, a pathname could be 4096 bytes long
> > and the other fields will then cause a problem. It was because of things
> > like this, I was ok with dropping the idea of adding (attribute[=value])
> > to the end of the filename.
> 
> "unlikely" is not something one should trust. I think you
> can even get a name longer than 4096 bytes if you make
> directories relative to the current directory and keep
> changing directories as you make the directories.
> Then double that with backslashes becoming \\ or
> newlines becoming \n (must be escaped) in the output.
> 
> I think /proc/*/maps has been broken ever since it was
> converted to seq_file, and maybe ever since it got filenames.
> Prior to the filenames, lines were fixed-width records.
> 

You could be right. Only one way to find out for sure really.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
  2008-10-20 10:06 ` Mel Gorman
@ 2008-10-20 18:07   ` Albert Cahalan
  2008-10-22  9:41     ` Mel Gorman
  0 siblings, 1 reply; 19+ messages in thread
From: Albert Cahalan @ 2008-10-20 18:07 UTC (permalink / raw)
  To: Mel Gorman; +Cc: adobriyan, kosaki.motohiro, linux-kernel

On Mon, Oct 20, 2008 at 6:06 AM, Mel Gorman <mel@csn.ul.ie> wrote:
> On (20/10/08 05:18), Albert Cahalan didst pronounce:

>> Looping on stat() while chopping off suspected tags is dreadful.
>> Besides just being gross, it's slow.
>
> You're probably right. It's a bit weird that it's what you have to do to
> figure out if the file in /proc/PID/maps is really there or not.

Actually you can't do this, because of directory permissions.

>> Obviously, every author of a /proc-based tool has been forced to
>> take a random guess at the ABI. The /proc/*/smaps is so gross
>> that I put off writing a parser for years.
>
> I intend to take a stab at it for the purposes of teaching pmap to print
> the pagesizes if the smaps change gets picked up.

FYI, "KernelPageSize" is at least unique under the perfect
hash function I'm using to parse the damn smaps file.

hash = ( ( (s[8]&15) + (s[1]&15) ) ^ (s[0]&3) ) & 31;
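
For readers who want to check the uniqueness claim, here is a small sketch. The hash expression is copied from the line above; the field list and its padding to 16 columns are assumptions based on the smaps layout of that kernel era, not something stated in this thread, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Hash copied from the message; s must hold at least 9 characters. */
unsigned int smaps_hash(const char *s)
{
	return (((s[8] & 15) + (s[1] & 15)) ^ (s[0] & 3)) & 31;
}

/* Assumed smaps field lines ("Name:" padded to 16 columns). */
const char *const smaps_fields[] = {
	"Size:           ",
	"Rss:            ",
	"Pss:            ",
	"Shared_Clean:   ",
	"Shared_Dirty:   ",
	"Private_Clean:  ",
	"Private_Dirty:  ",
	"Referenced:     ",
	"Swap:           ",
	"KernelPageSize: ",
};

#define SMAPS_NFIELDS (sizeof(smaps_fields) / sizeof(smaps_fields[0]))

/* Return 1 if no two assumed fields share a hash value, 0 otherwise. */
int smaps_hash_is_perfect(void)
{
	unsigned int seen = 0;	/* one bit per bucket, 32 buckets */
	size_t i;

	for (i = 0; i < SMAPS_NFIELDS; i++) {
		unsigned int bit = 1u << smaps_hash(smaps_fields[i]);

		if (seen & bit)
			return 0;
		seen |= bit;
	}
	return 1;
}
```

Under these assumptions "KernelPageSize" lands in bucket 15 and collides with none of the other nine fields. Any field added later would need to be rechecked against the same table.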

I have to wonder if we'll be getting mixed page sizes
within a single mapping, making such info unusable.

>> Right before the filename, you can add anything except a '/'.
>> You could add a few columns of numbers or a second flags field.
>
> My fear was about parsers that hard-coded what number field stored the
> filename. If a column was added for pagesize for example, then parsers
> would think the pagesize was the filename.

It's possible. Every parser I've examined does strchr()
or similar to find that '/' character.
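
A minimal sketch of that strchr() approach follows. The function name and buffer handling are hypothetical; the point is only that locating the first '/' is indifferent to how many columns precede the pathname.

```c
#include <string.h>

/*
 * Hypothetical helper: copy the pathname of one /proc/PID/maps line
 * into out, without the trailing newline. Returns 1 if a pathname was
 * found, 0 if the line has no '/' (anonymous or special mappings).
 * A pathname longer than outlen - 1 bytes is truncated.
 */
int maps_pathname(const char *line, char *out, size_t outlen)
{
	const char *p = strchr(line, '/');
	size_t n;

	if (!p || outlen == 0)
		return 0;
	n = strcspn(p, "\n");
	if (n >= outlen)
		n = outlen - 1;
	memcpy(out, p, n);
	out[n] = '\0';
	return 1;
}
```

A parser written this way keeps working if extra columns are inserted before the filename, as long as none of them contains a '/'. A parser that counts whitespace-separated fields would not.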

Maybe try some dummy patches in a linux-next kernel?
Give each one a month. You could do "xyz" concatenated
to the flags, a second "rwx" concatenated to the flags,
a single column of "0" before the filename, and several
columns of "parsertest" before the filename.

> Now, that is an interesting idea, albeit it's not one that is easily
> human-readable and would need a second parser like pmap but that's ok. If
> parsing smaps turns into a total pain in the ass

I assure you that parsing smaps is a total pain in the ass,
especially if you want tolerable performance. Something
like "top" is not viable if it performs like a Python script.

>> BTW, I'm thinking that the /proc/*/*maps files fail when the
>> lines exceed 4096 bytes. The pathname may legitimately be that
>> long, plus it can be backslash escaped, plus there is all the
>> junk on the beginning.
>
> Yes. While it's unlikely to be exceeded, a pathname could be 4096 bytes long
> and the other fields will then cause a problem. It was because of things
> like this, I was ok with dropping the idea of adding (attribute[=value])
> to the end of the filename.

"unlikely" is not something one should trust. I think you
can even get a name longer than 4096 bytes if you make
directories relative to the current directory and keep
changing directories as you make the directories.
Then double that with backslashes becoming \\ or
newlines becoming \n (must be escaped) in the output.

I think /proc/*/maps has been broken ever since it was
converted to seq_file, and maybe ever since it got filenames.
Prior to the filenames, lines were fixed-width records.

* Re: [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
  2008-10-20  9:18 Albert Cahalan
@ 2008-10-20 10:06 ` Mel Gorman
  2008-10-20 18:07   ` Albert Cahalan
  0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2008-10-20 10:06 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: adobriyan, kosaki.motohiro, linux-kernel

On (20/10/08 05:18), Albert Cahalan didst pronounce:
> Adding " (hpagesize=4096kB)" onto the end of a filename is as vile
> as adding " (deleted)" onto the end. If anything is going to change
> in this area, it should be the elimination of " (deleted)". These
> tags are perfectly legitimate in filenames.
> 

I dropped that change altogether in the last series because of concerns like
this. It did have the potential to grow to something weird looking.

> Looping on stat() while chopping off suspected tags is dreadful.
> Besides just being gross, it's slow.
> 

You're probably right. It's a bit weird that it's what you have to do to
figure out if the file in /proc/PID/maps is really there or not.

> gdb will tolerate up to 7 flags, procps will tolerate up to 31 flags,
> and both will tolerate anything without a '/' before the filename.
> 

Understood.

> Obviously, every author of a /proc-based tool has been forced to
> take a random guess at the ABI. The /proc/*/smaps is so gross
> that I put off writing a parser for years.
> 

I intend to take a stab at it for the purposes of teaching pmap to print
the pagesizes if the smaps change gets picked up.

> What you can probably get away with:
> 
> After the "rwxp" stuff you can add 3 more flags. (gdb limit)
> You could use 'L' for locked pages, 'R' for swap reservation,
> and 'D' for deleted files. It's probably much better to save
> space though, since gdb will crash if you add too many flags.
> Three characters can be 18 bits if you base-64 encode them,
> being careful to avoid the '/' character. (adding 0x30 works)
> 

Ok, noted in case I ever decide to tackle the (deleted) removal. It's
not something I feel strongly about though.

> Right before the filename, you can add anything except a '/'.
> You could add a few columns of numbers or a second flags field.
> 

My fear was about parsers that hard-coded what number field stored the
filename. If a column was added for pagesize for example, then parsers
would think the pagesize was the filename.

> Not that it matters on such a slow-ass file format, but you
> can make parsing faster if you encode the page size in one byte.
> Simply add 0x30 to the page shift, then print that byte. Note that
> this would let you cram the page size into the flags field.
> 

Now, that is an interesting idea, albeit it's not one that is easily
human-readable and would need a second parser like pmap but that's ok. If
parsing smaps turns into a total pain in the ass or the performance overhead
of calculating PSS when reading the pagesize becomes a problem, then I'll
try this option. Thanks a lot for that idea.

> BTW, I'm thinking that the /proc/*/*maps files fail when the
> lines exceed 4096 bytes. The pathname may legitimately be that
> long, plus it can be backslash escaped, plus there is all the
> junk on the beginning.
> 

Yes. While it's unlikely to be exceeded, a pathname could be 4096 bytes long
and the other fields will then cause a problem. It was because of things
like this, I was ok with dropping the idea of adding (attribute[=value])
to the end of the filename.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
@ 2008-10-20  9:18 Albert Cahalan
  2008-10-20 10:06 ` Mel Gorman
  0 siblings, 1 reply; 19+ messages in thread
From: Albert Cahalan @ 2008-10-20  9:18 UTC (permalink / raw)
  To: Mel Gorman, adobriyan, kosaki.motohiro, linux-kernel

Adding " (hpagesize=4096kB)" onto the end of a filename is as vile
as adding " (deleted)" onto the end. If anything is going to change
in this area, it should be the elimination of " (deleted)". These
tags are perfectly legitimate in filenames.

Looping on stat() while chopping off suspected tags is dreadful.
Besides just being gross, it's slow.

gdb will tolerate up to 7 flags, procps will tolerate up to 31 flags,
and both will tolerate anything without a '/' before the filename.

Obviously, every author of a /proc-based tool has been forced to
take a random guess at the ABI. The /proc/*/smaps is so gross
that I put off writing a parser for years.

What you can probably get away with:

After the "rwxp" stuff you can add 3 more flags. (gdb limit)
You could use 'L' for locked pages, 'R' for swap reservation,
and 'D' for deleted files. It's probably much better to save
space though, since gdb will crash if you add too many flags.
Three characters can be 18 bits if you base-64 encode them,
being careful to avoid the '/' character. (adding 0x30 works)
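
One possible reading of the "adding 0x30" remark, as a sketch: store 6 bits per character and offset each by 0x30, so every byte falls in the range '0' (0x30) to 'o' (0x6f) and can never be '/' (0x2f) or a space. This is not the standard base-64 alphabet, and the function names are hypothetical.

```c
/* Pack the low 18 bits of value into three characters plus a NUL. */
void encode_flags18(unsigned int value, char out[4])
{
	int i;

	for (i = 0; i < 3; i++)
		out[i] = (char)(0x30 + ((value >> (6 * (2 - i))) & 0x3f));
	out[3] = '\0';
}

/* Recover the 18-bit value from three characters made by encode_flags18(). */
unsigned int decode_flags18(const char in[3])
{
	unsigned int value = 0;
	int i;

	for (i = 0; i < 3; i++)
		value = (value << 6) | ((unsigned int)(in[i] - 0x30) & 0x3f);
	return value;
}
```

The trade-off is the one raised above: three such characters fit within the gdb limit, but the field is no longer readable by eye.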

Right before the filename, you can add anything except a '/'.
You could add a few columns of numbers or a second flags field.

Not that it matters on such a slow-ass file format, but you
can make parsing faster if you encode the page size in one byte.
Simply add 0x30 to the page shift, then print that byte. Note that
this would let you cram the page size into the flags field.
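
A sketch of that one-byte encoding, with hypothetical function names. It assumes the shift is small enough that 0x30 + shift stays a printable character and that 1UL << shift fits in an unsigned long.

```c
/* Encode a page shift as a single byte by adding 0x30. */
char encode_page_shift(unsigned int shift)
{
	return (char)(0x30 + shift);
}

/* Decode the byte back into a page size in bytes. */
unsigned long decode_page_size(char c)
{
	return 1UL << (unsigned int)(c - 0x30);
}
```

For example, a 4K page (shift 12) becomes '<' and a 64K page (shift 16) becomes '@'. Since the shift is never negative, the byte is always at least '0' and so cannot be '/'.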

BTW, I'm thinking that the /proc/*/*maps files fail when the
lines exceed 4096 bytes. The pathname may legitimately be that
long, plus it can be backslash escaped, plus there is all the
junk on the beginning.

* [PATCH 0/2] Report the size of pages backing VMAs in /proc V3
@ 2008-10-16 15:58 Mel Gorman
  0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2008-10-16 15:58 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Alexey Dobriyan, Dave Hansen, KOSAKI Motohiro, Linux-MM, LKML,
	Mel Gorman

The following two patches add support for printing the size of pages used by
the kernel and the MMU to back VMAs. This can be used by a user to verify
that a hugepage-aware application is using the expected page sizes.

The first patch prints the size of page used by the kernel when allocating
pages for a VMA in /proc/pid/smaps. The second patch reports on
the size of page used by the MMU as it can differ - for example on POWER
using 64K as a base pagesize on older processors.

Changelog since V2
  o Drop changes to /proc/pid/maps - could not get agreement and it affects
    procps. Patch to procps was posted but fell into silence. Dropping
    patch as smaps gives the necessary information, just with a bit more
    legwork by the user
  o Drop redundant VM_BUG_ON (Alexey)

Changelog since V1
  o Fix build failure on !CONFIG_HUGETLB_PAGE
  o Uninline helper functions
  o Distinguish between base pagesize and MMU pagesize

 arch/powerpc/include/asm/hugetlb.h |    6 ++++++
 arch/powerpc/mm/hugetlbpage.c      |    7 +++++++
 fs/proc/task_mmu.c                 |    8 ++++++--
 include/linux/hugetlb.h            |    6 ++++++
 mm/hugetlb.c                       |   29 +++++++++++++++++++++++++++++
 5 files changed, 54 insertions(+), 2 deletions(-)


Thread overview: 19+ messages
2008-10-03 16:46 [PATCH 0/2] Report the size of pages backing VMAs in /proc V3 Mel Gorman
2008-10-03 16:46 ` [PATCH 1/2] Report the pagesize backing a VMA in /proc/pid/smaps Mel Gorman
2008-10-08 21:38   ` Alexey Dobriyan
2008-10-09  2:16     ` KOSAKI Motohiro
2008-10-09 10:24     ` Mel Gorman
2008-10-03 16:46 ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Mel Gorman
2008-10-04  8:14   ` KOSAKI Motohiro
2008-10-04 12:04   ` [RFC PATCH] Report the shmid backing a VMA in maps KOSAKI Motohiro
2008-10-04 12:07     ` KOSAKI Motohiro
2008-10-04 21:52     ` Alexey Dobriyan
2008-10-05  5:48       ` KOSAKI Motohiro
2008-10-04 22:13   ` [PATCH 2/2] Report the pagesize backing a VMA in /proc/pid/maps Alexey Dobriyan
2008-10-05  6:00     ` KOSAKI Motohiro
2008-10-06 10:09     ` Mel Gorman
2008-10-16 15:58 [PATCH 0/2] Report the size of pages backing VMAs in /proc V3 Mel Gorman
2008-10-20  9:18 Albert Cahalan
2008-10-20 10:06 ` Mel Gorman
2008-10-20 18:07   ` Albert Cahalan
2008-10-22  9:41     ` Mel Gorman