* [PATCH RESEND 0/5] mm, shmem: Enhance per-process accounting of shared memory
@ 2014-07-22 13:43 ` Jerome Marchand
  0 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

There are several shortcomings in the accounting of shared memory
(sysV shm, shared anonymous mappings, mappings to a tmpfs file). The
values in /proc/<pid>/status and statm do not make it possible to
distinguish between shmem memory and a shared mapping to a regular
file, even though their implications for memory usage are quite
different: at reclaim, a file mapping can be dropped or written back
to disk, while shmem needs a place in swap. As for shmem pages that
are swapped out or in the swap cache, they are not accounted at all.

This series addresses these issues by adding new fields to the status
and smaps files in /proc/<pid>/. The accounting of resident shared
memory is done in the same way as is currently done for resident
memory and general swap (a counter in mm_rss_stat), but this approach
proved impractical for paged-out shared memory (it would require an
rmap walk each time a page is paged in).

/proc/<pid>/smaps also lacks proper accounting of shared memory, since
the shmem subsystem hides all implementation details from generic mm
code. This series adds the shmem_locate() function, which returns the
location of a particular page (resident, in swap or in the swap
cache). Called from the smaps code, it makes it possible to show a
more detailed accounting of shmem mappings in smaps.

Patch 1 adds a counter to keep track of resident shmem memory.
Patch 2 adds a function to allow generic code to know the physical
location of a shmem page.
Patch 3 adds a simple helper function.
Patch 4 accounts swapped-out shmem in /proc/<pid>/status.
Patch 5 adds shmem specific fields to /proc/<pid>/smaps.

Thanks,
Jerome

Jerome Marchand (5):
  mm, shmem: Add shmem resident memory accounting
  mm, shmem: Add shmem_locate function
  mm, shmem: Add shmem_vma() helper
  mm, shmem: Add shmem swap memory accounting
  mm, shmem: Show location of non-resident shmem pages in smaps

 Documentation/filesystems/proc.txt |  15 ++++
 arch/s390/mm/pgtable.c             |   2 +-
 fs/proc/task_mmu.c                 | 139 +++++++++++++++++++++++++++++++++++--
 include/linux/mm.h                 |  20 ++++++
 include/linux/mm_types.h           |   7 +-
 kernel/events/uprobes.c            |   2 +-
 mm/filemap_xip.c                   |   2 +-
 mm/memory.c                        |  37 ++++++++--
 mm/rmap.c                          |   8 +--
 mm/shmem.c                         |  37 ++++++++++
 10 files changed, 249 insertions(+), 20 deletions(-)

-- 
1.9.3


* [PATCH 1/5] mm, shmem: Add shmem resident memory accounting
  2014-07-22 13:43 ` Jerome Marchand
@ 2014-07-22 13:43   ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

Currently, looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem
pages are mapped to /dev/zero), even though their implications for
actual memory use are quite different.
This patch adds an MM_SHMEMPAGES counter to mm_rss_stat. It keeps track
of the resident shmem memory size. Its value is exposed in the new
VmShm line of /proc/<pid>/status.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 Documentation/filesystems/proc.txt |  2 ++
 arch/s390/mm/pgtable.c             |  2 +-
 fs/proc/task_mmu.c                 |  9 ++++++---
 include/linux/mm.h                 |  7 +++++++
 include/linux/mm_types.h           |  7 ++++---
 kernel/events/uprobes.c            |  2 +-
 mm/filemap_xip.c                   |  2 +-
 mm/memory.c                        | 37 +++++++++++++++++++++++++++++++------
 mm/rmap.c                          |  8 ++++----
 9 files changed, 57 insertions(+), 19 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index ddc531a..1c49957 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -171,6 +171,7 @@ read the file /proc/PID/status:
   VmLib:      1412 kB
   VmPTE:        20 kb
   VmSwap:        0 kB
+  VmShm:         0 kB
   Threads:        1
   SigQ:   0/28578
   SigPnd: 0000000000000000
@@ -228,6 +229,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  VmLib                       size of shared library code
  VmPTE                       size of page table entries
  VmSwap                      size of swap usage (the number of referred swapents)
+ VmShm	                      size of resident shmem memory
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 37b8241..9fe31b0 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -612,7 +612,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
 		if (PageAnon(page))
 			dec_mm_counter(mm, MM_ANONPAGES);
 		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+			dec_mm_file_counters(mm, page);
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index cfa63ee..4e60751 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -21,7 +21,7 @@
 
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap;
+	unsigned long data, text, lib, swap, shmem;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
 
 	/*
@@ -42,6 +42,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
 	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
 	swap = get_mm_counter(mm, MM_SWAPENTS);
+	shmem = get_mm_counter(mm, MM_SHMEMPAGES);
 	seq_printf(m,
 		"VmPeak:\t%8lu kB\n"
 		"VmSize:\t%8lu kB\n"
@@ -54,7 +55,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmExe:\t%8lu kB\n"
 		"VmLib:\t%8lu kB\n"
 		"VmPTE:\t%8lu kB\n"
-		"VmSwap:\t%8lu kB\n",
+		"VmSwap:\t%8lu kB\n"
+		"VmShm:\t%8lu kB\n",
 		hiwater_vm << (PAGE_SHIFT-10),
 		total_vm << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
@@ -65,7 +67,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
 		(PTRS_PER_PTE * sizeof(pte_t) *
 		 atomic_long_read(&mm->nr_ptes)) >> 10,
-		swap << (PAGE_SHIFT-10));
+		swap << (PAGE_SHIFT-10),
+		shmem << (PAGE_SHIFT-10));
 }
 
 unsigned long task_vsize(struct mm_struct *mm)
diff --git a/include/linux/mm.h b/include/linux/mm.h
index e03dd29..e69ee9d 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1299,6 +1299,13 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
 	atomic_long_dec(&mm->rss_stat.count[member]);
 }
 
+static inline void dec_mm_file_counters(struct mm_struct *mm, struct page *page)
+{
+	dec_mm_counter(mm, MM_FILEPAGES);
+	if (PageSwapBacked(page))
+		dec_mm_counter(mm, MM_SHMEMPAGES);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 21bff4b..e0307c8 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -322,9 +322,10 @@ struct core_state {
 };
 
 enum {
-	MM_FILEPAGES,
-	MM_ANONPAGES,
-	MM_SWAPENTS,
+	MM_FILEPAGES,	/* Resident file mapping pages (includes /dev/zero) */
+	MM_ANONPAGES,	/* Resident anonymous pages */
+	MM_SWAPENTS,	/* Anonymous swap entries */
+	MM_SHMEMPAGES,	/* Resident shared memory pages */
 	NR_MM_COUNTERS
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 1d0af8a..6c28c72 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	lru_cache_add_active_or_unevictable(kpage, vma);
 
 	if (!PageAnon(page)) {
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_file_counters(mm, page);
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c
index d8d9fe3..4bd4836 100644
--- a/mm/filemap_xip.c
+++ b/mm/filemap_xip.c
@@ -194,7 +194,7 @@ retry:
 			flush_cache_page(vma, address, pte_pfn(*pte));
 			pteval = ptep_clear_flush(vma, address, pte);
 			page_remove_rmap(page);
-			dec_mm_counter(mm, MM_FILEPAGES);
+			dec_mm_file_counters(mm, page);
 			BUG_ON(pte_dirty(pteval));
 			pte_unmap_unlock(pte, ptl);
 			/* must invalidate_page _before_ freeing the page */
diff --git a/mm/memory.c b/mm/memory.c
index eb37dfb..39820ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -176,6 +176,20 @@ static void check_sync_rss_stat(struct task_struct *task)
 
 #endif /* SPLIT_RSS_COUNTING */
 
+static void inc_mm_file_counters_fast(struct mm_struct *mm, struct page *page)
+{
+	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	if (PageSwapBacked(page))
+		inc_mm_counter_fast(mm, MM_SHMEMPAGES);
+}
+
+static void dec_mm_file_counters_fast(struct mm_struct *mm, struct page *page)
+{
+	dec_mm_counter_fast(mm, MM_FILEPAGES);
+	if (PageSwapBacked(page))
+		dec_mm_counter_fast(mm, MM_SHMEMPAGES);
+}
+
 #ifdef HAVE_GENERIC_MMU_GATHER
 
 static int tlb_next_batch(struct mmu_gather *tlb)
@@ -832,8 +846,11 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 
 				if (PageAnon(page))
 					rss[MM_ANONPAGES]++;
-				else
+				else {
 					rss[MM_FILEPAGES]++;
+					if (PageSwapBacked(page))
+						rss[MM_SHMEMPAGES]++;
+				}
 
 				if (is_write_migration_entry(entry) &&
 				    is_cow_mapping(vm_flags)) {
@@ -875,8 +892,11 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		page_dup_rmap(page);
 		if (PageAnon(page))
 			rss[MM_ANONPAGES]++;
-		else
+		else {
 			rss[MM_FILEPAGES]++;
+			if (PageSwapBacked(page))
+				rss[MM_SHMEMPAGES]++;
+		}
 	}
 
 out_set_pte:
@@ -1140,6 +1160,8 @@ again:
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
 				rss[MM_FILEPAGES]--;
+				if (PageSwapBacked(page))
+					rss[MM_SHMEMPAGES]--;
 			}
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
@@ -1171,8 +1193,11 @@ again:
 
 				if (PageAnon(page))
 					rss[MM_ANONPAGES]--;
-				else
+				else {
 					rss[MM_FILEPAGES]--;
+					if (PageSwapBacked(page))
+						rss[MM_SHMEMPAGES]--;
+				}
 			}
 			if (unlikely(!free_swap_and_cache(entry)))
 				print_bad_pte(vma, addr, ptent, NULL);
@@ -1495,7 +1520,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 
 	/* Ok, finally just insert the thing.. */
 	get_page(page);
-	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	inc_mm_file_counters_fast(mm, page);
 	page_add_file_rmap(page);
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));
 
@@ -2217,7 +2242,7 @@ gotten:
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				dec_mm_file_counters_fast(mm, old_page);
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
 		} else
@@ -2759,7 +2784,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
+		inc_mm_file_counters_fast(vma->vm_mm, page);
 		page_add_file_rmap(page);
 	}
 	set_pte_at(vma->vm_mm, address, pte, entry);
diff --git a/mm/rmap.c b/mm/rmap.c
index 3e8491c..618240d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1168,7 +1168,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			if (PageAnon(page))
 				dec_mm_counter(mm, MM_ANONPAGES);
 			else
-				dec_mm_counter(mm, MM_FILEPAGES);
+				dec_mm_file_counters(mm, page);
 		}
 		set_pte_at(mm, address, pte,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
@@ -1181,7 +1181,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		if (PageAnon(page))
 			dec_mm_counter(mm, MM_ANONPAGES);
 		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+			dec_mm_file_counters(mm, page);
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
@@ -1225,7 +1225,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		entry = make_migration_entry(page, pte_write(pteval));
 		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_file_counters(mm, page);
 
 	page_remove_rmap(page);
 	page_cache_release(page);
@@ -1376,7 +1376,7 @@ static int try_to_unmap_cluster(unsigned long cursor, unsigned int *mapcount,
 
 		page_remove_rmap(page);
 		page_cache_release(page);
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_file_counters(mm, page);
 		(*mapcount)--;
 	}
 	pte_unmap_unlock(pte - 1, ptl);
-- 
1.9.3


* [PATCH 2/5] mm, shmem: Add shmem_locate function
  2014-07-22 13:43 ` Jerome Marchand
@ 2014-07-22 13:43   ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

The shmem subsystem is kind of a black box: the generic mm code can't
always know where a specific page physically is. This patch adds the
shmem_locate() function to find out the physical location of shmem
pages (resident, in swap or in the swap cache). If the optional
argument count isn't NULL and the page is resident, it also returns the
mapcount value of this page.
This is intended to allow finer accounting of shmem/tmpfs pages.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 include/linux/mm.h |  7 +++++++
 mm/shmem.c         | 29 +++++++++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index e69ee9d..34099fa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1066,6 +1066,13 @@ extern bool skip_free_areas_node(unsigned int flags, int nid);
 
 int shmem_zero_setup(struct vm_area_struct *);
 #ifdef CONFIG_SHMEM
+
+#define SHMEM_NOTPRESENT	1 /* page is not present in memory */
+#define SHMEM_RESIDENT		2 /* page is resident in RAM */
+#define SHMEM_SWAPCACHE		3 /* page is in swap cache */
+#define SHMEM_SWAP		4 /* page is paged out */
+
+extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
 bool shmem_mapping(struct address_space *mapping);
 #else
 static inline bool shmem_mapping(struct address_space *mapping)
diff --git a/mm/shmem.c b/mm/shmem.c
index b16d3e7..8aa4892 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1341,6 +1341,35 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 	return ret;
 }
 
+int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count)
+{
+	struct address_space *mapping = file_inode(vma->vm_file)->i_mapping;
+	struct page *page;
+	swp_entry_t swap;
+	int ret;
+
+	page = find_get_entry(mapping, pgoff);
+	if (!page) /* Not yet initialised? */
+		return SHMEM_NOTPRESENT;
+
+	if (!radix_tree_exceptional_entry(page)) {
+		ret = SHMEM_RESIDENT;
+		if (count)
+			*count = page_mapcount(page);
+		goto out;
+	}
+
+	swap = radix_to_swp_entry(page);
+	page = find_get_page(swap_address_space(swap), swap.val);
+	if (!page)
+		return SHMEM_SWAP;
+	ret = SHMEM_SWAPCACHE;
+
+out:
+	page_cache_release(page);
+	return ret;
+}
+
 #ifdef CONFIG_NUMA
 static int shmem_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
 {
-- 
1.9.3


* [PATCH 3/5] mm, shmem: Add shmem_vma() helper
  2014-07-22 13:43 ` Jerome Marchand
@ 2014-07-22 13:43   ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

Add a simple helper to check if a vm area belongs to shmem.
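The test itself can be sketched in userspace as follows (a hypothetical
model: all the structs are stand-ins for vm_area_struct, struct file and
the shmem backing_dev_info):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the shmem_vma() check: a vma belongs
 * to shmem when it maps a file whose address_space is backed by the
 * shmem backing_dev_info. */
struct bdi { int dummy; };
static struct bdi shmem_bdi;	/* stands in for shmem_backing_dev_info */

struct mapping { struct bdi *backing_dev_info; };
struct file_m  { struct mapping *mapping; };
struct vma_m   { struct file_m *vm_file; };

static int is_shmem_vma(const struct vma_m *vma)
{
	/* Anonymous vmas have no file; otherwise compare the BDI pointer. */
	return vma->vm_file &&
	       vma->vm_file->mapping->backing_dev_info == &shmem_bdi;
}
```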

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 include/linux/mm.h | 6 ++++++
 mm/shmem.c         | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 34099fa..04a58d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
 
 extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
 bool shmem_mapping(struct address_space *mapping);
+bool shmem_vma(struct vm_area_struct *vma);
+
 #else
 static inline bool shmem_mapping(struct address_space *mapping)
 {
 	return false;
 }
+static inline bool shmem_vma(struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif
 
 extern int can_do_mlock(void);
diff --git a/mm/shmem.c b/mm/shmem.c
index 8aa4892..7d16227 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1483,6 +1483,14 @@ bool shmem_mapping(struct address_space *mapping)
 	return mapping->backing_dev_info == &shmem_backing_dev_info;
 }
 
+bool shmem_vma(struct vm_area_struct *vma)
+{
+	return (vma->vm_file &&
+		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
+		== &shmem_backing_dev_info);
+
+}
+
 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
 static const struct inode_operations shmem_short_symlink_operations;
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 4/5] mm, shmem: Add shmem swap memory accounting
  2014-07-22 13:43 ` Jerome Marchand
@ 2014-07-22 13:43   ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

Add get_mm_shswap(), which computes the size of swapped-out shmem. It
does so by walking the mm's page tables (mm_walk) and using the new
shmem_locate() function to get the physical location of shmem pages.
The result is displayed in the new VmShSw line of /proc/<pid>/status.

This significantly slows down /proc/<pid>/status access when there is
a large shmem mapping. If that is an issue, we can drop this patch and
only display this counter in the inherently slower /proc/<pid>/smaps
file (cf. next patch).
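The accounting loop in shswap_pte_hole() can be modelled in userspace
like this (a hypothetical sketch: locs[i] stands in for the result of
shmem_locate(vma, pgoff, NULL) at successive page offsets, and the
page size is a stand-in constant):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of the accounting in shswap_pte_hole():
 * walk a hole one page at a time, ask where each shmem page lives, and
 * add one page's worth of bytes for every paged-out page.  The location
 * codes mirror patch 2. */
#define MODEL_PAGE_SIZE 4096UL

enum shmem_loc { NOTPRESENT = 1, RESIDENT, SWAPCACHE, SWAP };

static unsigned long shswap_bytes(const enum shmem_loc *locs,
				  unsigned long npages)
{
	unsigned long total = 0;
	unsigned long i;

	for (i = 0; i < npages; i++)
		if (locs[i] == SWAP)	/* only fully paged-out pages count */
			total += MODEL_PAGE_SIZE;
	return total;
}
```

Pages in the swap cache are deliberately excluded here, matching the
patch: they are still resident in RAM.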

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 Documentation/filesystems/proc.txt |  2 +
 fs/proc/task_mmu.c                 | 80 ++++++++++++++++++++++++++++++++++++--
 2 files changed, 79 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 1c49957..1a15c56 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -172,6 +172,7 @@ read the file /proc/PID/status:
   VmPTE:        20 kb
   VmSwap:        0 kB
   VmShm:         0 kB
+  VmShSw:        0 kB
   Threads:        1
   SigQ:   0/28578
   SigPnd: 0000000000000000
@@ -230,6 +231,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  VmPTE                       size of page table entries
  VmSwap                      size of swap usage (the number of referred swapents)
  VmShm	                      size of resident shmem memory
+ VmShSw                      size of paged out shmem memory
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 4e60751..73f0ce4 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -19,9 +19,80 @@
 #include <asm/tlbflush.h>
 #include "internal.h"
 
+struct shswap_stats {
+	struct vm_area_struct *vma;
+	unsigned long shswap;
+};
+
+#ifdef CONFIG_SHMEM
+static int shswap_pte(pte_t *pte, unsigned long addr, unsigned long end,
+		       struct mm_walk *walk)
+{
+	struct shswap_stats *shss = walk->private;
+	struct vm_area_struct *vma = shss->vma;
+	pgoff_t pgoff = linear_page_index(vma, addr);
+	pte_t ptent = *pte;
+
+	if (pte_none(ptent) &&
+	    shmem_locate(vma, pgoff, NULL) == SHMEM_SWAP)
+		shss->shswap += end - addr;
+
+	return 0;
+}
+
+static int shswap_pte_hole(unsigned long addr, unsigned long end,
+			   struct mm_walk *walk)
+{
+	struct shswap_stats *shss = walk->private;
+	struct vm_area_struct *vma = shss->vma;
+	pgoff_t pgoff;
+
+	for (; addr != end; addr += PAGE_SIZE) {
+		pgoff = linear_page_index(vma, addr);
+
+		if (shmem_locate(vma, pgoff, NULL) == SHMEM_SWAP)
+			shss->shswap += PAGE_SIZE;
+	}
+
+	return 0;
+}
+
+static unsigned long get_mm_shswap(struct mm_struct *mm)
+{
+	struct vm_area_struct *vma;
+	struct shswap_stats shss;
+	struct mm_walk shswap_walk = {
+		.pte_entry = shswap_pte,
+		.pte_hole = shswap_pte_hole,
+		.mm = mm,
+		.private = &shss,
+	};
+
+	memset(&shss, 0, sizeof(shss));
+
+	down_read(&mm->mmap_sem);
+	for (vma = mm->mmap; vma; vma = vma->vm_next)
+		if (shmem_vma(vma)) {
+			shss.vma = vma;
+			walk_page_range(vma->vm_start, vma->vm_end,
+					&shswap_walk);
+		}
+	up_read(&mm->mmap_sem);
+
+	return shss.shswap;
+}
+
+#else
+
+static unsigned long get_mm_shswap(struct mm_struct *mm)
+{
+	return 0;
+}
+#endif
+
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap, shmem;
+	unsigned long data, text, lib, swap, shmem, shswap;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
 
 	/*
@@ -43,6 +114,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
 	swap = get_mm_counter(mm, MM_SWAPENTS);
 	shmem = get_mm_counter(mm, MM_SHMEMPAGES);
+	shswap = get_mm_shswap(mm);
 	seq_printf(m,
 		"VmPeak:\t%8lu kB\n"
 		"VmSize:\t%8lu kB\n"
@@ -56,7 +128,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmLib:\t%8lu kB\n"
 		"VmPTE:\t%8lu kB\n"
 		"VmSwap:\t%8lu kB\n"
-		"VmShm:\t%8lu kB\n",
+		"VmShm:\t%8lu kB\n"
+		"VmShSw:\t%8lu kB\n",
 		hiwater_vm << (PAGE_SHIFT-10),
 		total_vm << (PAGE_SHIFT-10),
 		mm->locked_vm << (PAGE_SHIFT-10),
@@ -68,7 +141,8 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		(PTRS_PER_PTE * sizeof(pte_t) *
 		 atomic_long_read(&mm->nr_ptes)) >> 10,
 		swap << (PAGE_SHIFT-10),
-		shmem << (PAGE_SHIFT-10));
+		shmem << (PAGE_SHIFT-10),
+		shswap >> 10);
 }
 
 unsigned long task_vsize(struct mm_struct *mm)
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps
  2014-07-22 13:43 ` Jerome Marchand
@ 2014-07-22 13:43   ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-22 13:43 UTC (permalink / raw)
  To: linux-mm
  Cc: linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

Add ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
/proc/<pid>/smaps for shmem mappings.

ShmOther: amount of memory that is currently resident in memory, not
present in the page table of this process but present in the page
table of another process.
ShmOrphan: amount of memory that is currently resident in memory but
not present in any process page table. This can happen when a process
unmaps a shared mapping it has accessed before, or exits. Despite
being resident, this memory is not currently accounted to any process.
ShmSwapCache: amount of memory currently in the swap cache.
ShmSwap: amount of memory that is paged out to disk.
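The bucketing done by update_shmem_stats() can be sketched in
userspace as follows (a hypothetical model: the enum and struct are
stand-ins for the kernel's shmem_locate() result and mem_size_stats
fields):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of update_shmem_stats(): a shmem page
 * that is not mapped by this process lands in exactly one of four
 * buckets, depending on its location and on whether any other process
 * maps it. */
enum loc { LOC_RESIDENT, LOC_SWAPCACHE, LOC_SWAP };

struct shm_stats {
	unsigned long other;     /* ShmOther     */
	unsigned long orphan;    /* ShmOrphan    */
	unsigned long swapcache; /* ShmSwapCache */
	unsigned long swap;      /* ShmSwap      */
};

static void account(struct shm_stats *st, enum loc where, int mapcount,
		    unsigned long size)
{
	switch (where) {
	case LOC_RESIDENT:
		if (mapcount)	/* mapped by some other process: ShmOther */
			st->other += size;
		else		/* resident but mapped by nobody: ShmOrphan */
			st->orphan += size;
		break;
	case LOC_SWAPCACHE:
		st->swapcache += size;
		break;
	case LOC_SWAP:
		st->swap += size;
		break;
	}
}
```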

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 Documentation/filesystems/proc.txt | 11 ++++++++
 fs/proc/task_mmu.c                 | 56 +++++++++++++++++++++++++++++++++++++-
 2 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 1a15c56..a65ab59 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -422,6 +422,10 @@ Swap:                  0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:              374 kB
+ShmOther:            124 kB
+ShmOrphan:             0 kB
+ShmSwapCache:         12 kB
+ShmSwap:              36 kB
 VmFlags: rd ex mr mw me de
 
 the first of these lines shows the same information as is displayed for the
@@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
 swap.
+The ShmXXX lines only appear for shmem mappings. They show the amount of memory
+from the mapping that is currently:
+ - resident in RAM, not present in the page table of this process but present
+ in the page table of another process (ShmOther)
+ - resident in RAM but not present in the page table of any process (ShmOrphan)
+ - in swap cache (ShmSwapCache)
+ - paged out on swap (ShmSwap).
 
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 73f0ce4..9b1de55 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -518,9 +518,33 @@ struct mem_size_stats {
 	unsigned long anonymous_thp;
 	unsigned long swap;
 	unsigned long nonlinear;
+	unsigned long shmem_resident_other;
+	unsigned long shmem_swapcache;
+	unsigned long shmem_swap;
+	unsigned long shmem_orphan;
 	u64 pss;
 };
 
+void update_shmem_stats(struct mem_size_stats *mss, struct vm_area_struct *vma,
+			pgoff_t pgoff, unsigned long size)
+{
+	int count = 0;
+
+	switch (shmem_locate(vma, pgoff, &count)) {
+	case SHMEM_RESIDENT:
+		if (count)
+			mss->shmem_resident_other += size;
+		else
+			mss->shmem_orphan += size;
+		break;
+	case SHMEM_SWAPCACHE:
+		mss->shmem_swapcache += size;
+		break;
+	case SHMEM_SWAP:
+		mss->shmem_swap += size;
+		break;
+	}
+}
 
 static void smaps_pte_entry(pte_t ptent, unsigned long addr,
 		unsigned long ptent_size, struct mm_walk *walk)
@@ -543,7 +567,8 @@ static void smaps_pte_entry(pte_t ptent, unsigned long addr,
 	} else if (pte_file(ptent)) {
 		if (pte_to_pgoff(ptent) != pgoff)
 			mss->nonlinear += ptent_size;
-	}
+	} else if (pte_none(ptent) && shmem_vma(vma))
+		update_shmem_stats(mss, vma, pgoff, ptent_size);
 
 	if (!page)
 		return;
@@ -604,6 +629,21 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	return 0;
 }
 
+static int smaps_pte_hole(unsigned long addr, unsigned long end,
+			  struct mm_walk *walk)
+{
+	struct mem_size_stats *mss = walk->private;
+	struct vm_area_struct *vma = mss->vma;
+	pgoff_t pgoff;
+
+	for (; addr != end; addr += PAGE_SIZE) {
+		pgoff = linear_page_index(vma, addr);
+		update_shmem_stats(mss, vma, pgoff, PAGE_SIZE);
+	}
+
+	return 0;
+}
+
 static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 {
 	/*
@@ -670,6 +710,10 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 		.private = &mss,
 	};
 
+	/* Only walk the holes when it's a shmem mapping */
+	if (shmem_vma(vma))
+		smaps_walk.pte_hole = smaps_pte_hole;
+
 	memset(&mss, 0, sizeof mss);
 	mss.vma = vma;
 	/* mmap_sem is held in m_start */
@@ -712,6 +756,16 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	if (vma->vm_flags & VM_NONLINEAR)
 		seq_printf(m, "Nonlinear:      %8lu kB\n",
 				mss.nonlinear >> 10);
+	if (shmem_vma(vma))
+		seq_printf(m,
+			   "ShmOther:       %8lu kB\n"
+			   "ShmOrphan:      %8lu kB\n"
+			   "ShmSwapCache:   %8lu kB\n"
+			   "ShmSwap:        %8lu kB\n",
+			   mss.shmem_resident_other >> 10,
+			   mss.shmem_orphan >> 10,
+			   mss.shmem_swapcache >> 10,
+			   mss.shmem_swap >> 10);
 
 	show_smap_vma_flags(m, vma);
 
-- 
1.9.3


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] mm, shmem: Add shmem_vma() helper
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-07-24 19:53     ` Oleg Nesterov
  -1 siblings, 0 replies; 30+ messages in thread
From: Oleg Nesterov @ 2014-07-24 19:53 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra

On 07/22, Jerome Marchand wrote:
>
> +bool shmem_vma(struct vm_area_struct *vma)
> +{
> +	return (vma->vm_file &&
> +		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
> +		== &shmem_backing_dev_info);
> +
> +}

Cosmetic nit, it seems that this helper could simply do

	return vma->vm_file && shmem_mapping(file_inode(vma->vm_file));

Oleg.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] mm, shmem: Add shmem resident memory accounting
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-08-01  5:01     ` Hugh Dickins
  -1 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:01 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implication in
> actual memory use is quite different.
> This patch adds MM_SHMEMPAGES counter to mm_rss_stat. It keeps track of
> resident shmem memory size. Its value is exposed in the new VmShm line
> of /proc/<pid>/status.

I like adding this info to /proc/<pid>/status - thank you -
but I think you can make the patch much better in a couple of ways.

> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> ---
>  Documentation/filesystems/proc.txt |  2 ++
>  arch/s390/mm/pgtable.c             |  2 +-
>  fs/proc/task_mmu.c                 |  9 ++++++---
>  include/linux/mm.h                 |  7 +++++++
>  include/linux/mm_types.h           |  7 ++++---
>  kernel/events/uprobes.c            |  2 +-
>  mm/filemap_xip.c                   |  2 +-
>  mm/memory.c                        | 37 +++++++++++++++++++++++++++++++------
>  mm/rmap.c                          |  8 ++++----
>  9 files changed, 57 insertions(+), 19 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index ddc531a..1c49957 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -171,6 +171,7 @@ read the file /proc/PID/status:
>    VmLib:      1412 kB
>    VmPTE:        20 kb
>    VmSwap:        0 kB
> +  VmShm:         0 kB
>    Threads:        1
>    SigQ:   0/28578
>    SigPnd: 0000000000000000
> @@ -228,6 +229,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
>   VmLib                       size of shared library code
>   VmPTE                       size of page table entries
>   VmSwap                      size of swap usage (the number of referred swapents)
> + VmShm	                      size of resident shmem memory

Needs to say that includes mappings of tmpfs, and needs to say that
it's a subset of VmRSS.  Better placed immediately after VmRSS...

...but now that I look through what's in /proc/<pid>/status, it appears
that we have to defer to /proc/<pid>/statm to see MM_FILEPAGES (third
field) and MM_ANONPAGES (subtract third field from second field).

That's not a very friendly interface.  If you're going to help by
exposing MM_SHMPAGES separately, please help even more by exposing
VmFile and VmAnon here in /proc/<pid>/status too.

VmRSS, VmAnon, VmShm, VmFile?  I'm not sure what's the best order:
here I'm thinking that anon comes before file in /proc/meminfo, and
shm should be halfway between anon and file.  You may have another idea.

And of course the VmFile count here should exclude VmShm: I think it
will work out least confusingly if you account MM_FILEPAGES separately
from MM_SHMPAGES, but add them together where needed e.g. for statm.

>   Threads                     number of threads
>   SigQ                        number of signals queued/max. number for queue
>   SigPnd                      bitmap of pending signals for the thread
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 37b8241..9fe31b0 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -612,7 +612,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>  		if (PageAnon(page))
>  			dec_mm_counter(mm, MM_ANONPAGES);
>  		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +			dec_mm_file_counters(mm, page);
>  	}

That is a recurring pattern: please try putting

static inline int mm_counter(struct page *page)
{
	if (PageAnon(page))
		return MM_ANONPAGES;
	if (PageSwapBacked(page))
		return MM_SHMPAGES;
	return MM_FILEPAGES;
}

in include/linux/mm.h.

Then dec_mm_counter(mm, mm_counter(page)) here, and wherever you can,
use mm_counter(page) to simplify the code throughout.

I say "try" because I think factoring out mm_counter() will simplify
the most code, given the profusion of different accessors, particularly
in mm/memory.c.  But I'm not sure how much bloat having it as an inline
function will add, versus how much overhead it would add if not inline.

Hugh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/5] mm, shmem: Add shmem_locate function
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-08-01  5:01     ` Hugh Dickins
  -1 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:01 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> The shmem subsystem is kind of a black box: the generic mm code can't

I'm happier with that black box than you are :)

> always know where a specific page physically is. This patch adds the
> shmem_locate() function to find out the physical location of shmem
> pages (resident, in swap or swapcache). If the optional argument count
> isn't NULL and the page is resident, it also returns the mapcount value
> of this page.
> This is intended to allow finer accounting of shmem/tmpfs pages.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> ---
>  include/linux/mm.h |  7 +++++++
>  mm/shmem.c         | 29 +++++++++++++++++++++++++++++
>  2 files changed, 36 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index e69ee9d..34099fa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1066,6 +1066,13 @@ extern bool skip_free_areas_node(unsigned int flags, int nid);
>  
>  int shmem_zero_setup(struct vm_area_struct *);
>  #ifdef CONFIG_SHMEM
> +
> +#define SHMEM_NOTPRESENT	1 /* page is not present in memory */
> +#define SHMEM_RESIDENT		2 /* page is resident in RAM */
> +#define SHMEM_SWAPCACHE		3 /* page is in swap cache */
> +#define SHMEM_SWAP		4 /* page is paged out */
> +
> +extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);

Please place these, or what's needed of them, in include/linux/shmem_fs.h,
rather than in the very overloaded include/linux/mm.h.
You will need a !CONFIG_SHMEM stub for shmem_locate(),
or whatever it ends up being called.

>  bool shmem_mapping(struct address_space *mapping);

Oh, you're following a precedent, that's already bad placement.
And it (but not its !CONFIG_SHMEM stub) is duplicated in shmem_fs.h.
Perhaps because we were moving shmem_zero_setup() from mm.h to shmem_fs.h
some time ago, but never got around to cleaning up the old location.

Well, please place the new ones in shmem_fs.h, and I ought to clean
up the rest at a time which does not interfere with you.

>  #else
>  static inline bool shmem_mapping(struct address_space *mapping)
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b16d3e7..8aa4892 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1341,6 +1341,35 @@ static int shmem_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
>  	return ret;
>  }
>  
> +int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count)

I don't find that a helpful name; but in 5/5 I question the info you're
gathering here - maybe a good name will be more obvious once we've cut
down what it's gathering.

I just noticed that in 5/5 you're using a walk->pte_hole across
empty extents: perhaps I'm prematurely optimizing, but that feels very
inefficient, maybe here you should use a radix_tree lookup of the extent.

If all we had to look up were the number of swap entries, in the vast
majority of cases shmem.c could just see info->swapped is 0 and spend
no time on radix_tree lookups at all.

But what happens here depends on what really needs to be shown in 5/5.

Hugh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 3/5] mm, shmem: Add shmem_vma() helper
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-08-01  5:03     ` Hugh Dickins
  -1 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:03 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: Oleg Nesterov, linux-mm, linux-kernel, linux-s390, linux-doc,
	Hugh Dickins, Arnaldo Carvalho de Melo, Ingo Molnar,
	Paul Mackerras, Peter Zijlstra, linux390, Heiko Carstens,
	Martin Schwidefsky, Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Add a simple helper to check if a vm area belongs to shmem.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> ---
>  include/linux/mm.h | 6 ++++++
>  mm/shmem.c         | 8 ++++++++
>  2 files changed, 14 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 34099fa..04a58d1 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
>  
>  extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
>  bool shmem_mapping(struct address_space *mapping);
> +bool shmem_vma(struct vm_area_struct *vma);
> +
>  #else
>  static inline bool shmem_mapping(struct address_space *mapping)
>  {
>  	return false;
>  }
> +static inline bool shmem_vma(struct vm_area_struct *vma)
> +{
> +	return false;
> +}
>  #endif

I would prefer include/linux/shmem_fs.h for this (and one of us clean
up where the declarations of shmem_zero_setup and shmem_mapping live).

But if 4/5 goes away, then there will only be one user of shmem_vma(),
so in that case better just declare it (using shmem_mapping()) there
in task_mmu.c in the smaps patch.

>  
>  extern int can_do_mlock(void);
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 8aa4892..7d16227 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -1483,6 +1483,14 @@ bool shmem_mapping(struct address_space *mapping)
>  	return mapping->backing_dev_info == &shmem_backing_dev_info;
>  }
>  
> +bool shmem_vma(struct vm_area_struct *vma)
> +{
> +	return (vma->vm_file &&
> +		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
> +		== &shmem_backing_dev_info);
> +

I agree with Oleg,
	vma->vm_file && shmem_mapping(file_inode(vma->vm_file)->i_mapping);
would be better,

Hugh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 4/5] mm, shmem: Add shmem swap memory accounting
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-08-01  5:05     ` Hugh Dickins
  -1 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:05 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds get_mm_shswap(), which computes the size of swapped-out shmem. It
> does so by pagewalking the mm and using the new shmem_locate() function
> to get the physical location of shmem pages.
> The result is displayed in the new VmShSw line of /proc/<pid>/status.
> Use mm_walk and shmem_locate() to account paged-out shmem pages.
> 
> It significantly slows down /proc/<pid>/status access when
> there is a big shmem mapping. If that is an issue, we can drop this
> patch and only display this counter in the inherently slower
> /proc/<pid>/smaps file (cf. next patch).
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

Definite NAK to this one.  As you guessed yourself, it is always a
mistake to add one potentially very slow-to-gather number to a stats
file showing a group of quickly gathered numbers.

Is there anything you could do instead?  I don't know if it's worth
the (little) extra mm_struct storage and maintenance, but you could
add a VmShmSize, which shows that subset of VmSize (total_vm) which
is occupied by shmem mappings.

It's ambiguous what to deduce when VmShm is less than VmShmSize:
the difference might be swapped out, it might be holes in the sparse
object, it might be instantiated in the object but never faulted
into the mapping: in general it will be a mix of all of those.
So, sometimes useful info, but easy to be misled by it.

As I say, I don't know if VmShmSize would be worth adding, given its
deficiencies; and it could be worked out from /proc/<pid>/maps anyway.

Hugh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps
  2014-07-22 13:43   ` Jerome Marchand
@ 2014-08-01  5:06     ` Hugh Dickins
  -1 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:06 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
> /proc/<pid>/smaps for shmem mappings.
> 
> ShmOther: amount of memory that is currently resident in memory, not
> present in the page table of this process but present in the page
> table of another process.
> ShmOrphan: amount of memory that is currently resident in memory but
> not present in any process page table. This can happen when a process
> unmaps a shared mapping it has accessed before, or exits. Despite being
> resident, this memory is not currently accounted to any process.
> ShmSwapCache: amount of memory currently in the swap cache.
> ShmSwap: amount of memory that is paged out on disk.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

You will have to do a much better job of persuading me that these
numbers are of any interest.  Okay, maybe not me, I'm not that keen
on /proc/<pid>/smaps at the best of times.  But you will need to show
plausible cases where having these numbers available would have made
a real difference, and drum up support for their inclusion from
/proc/<pid>/smaps devotees.

Do you have a customer, who has underprovisioned with swap,
and wants these numbers to work out how much more is needed?

As it is, they appear to be numbers that you found you could provide,
and so you're adding them into /proc/<pid>/smaps, but having great
difficulty in finding good names to describe them - which is itself
an indicator that they're probably not the most useful statistics
a sysadmin is wanting.

(Google is a /proc/<pid>/smaps user: let's take a look to see if
we have been driven to add in stats of this kind: no, not at all.)

The more numbers we add to /proc/<pid>/smaps, the longer it will take to
print, the longer mmap_sem will be held, and the more it will interfere
with proper system operation - that's the concern I more often see.

> ---
>  Documentation/filesystems/proc.txt | 11 ++++++++
>  fs/proc/task_mmu.c                 | 56 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 1a15c56..a65ab59 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -422,6 +422,10 @@ Swap:                  0 kB
>  KernelPageSize:        4 kB
>  MMUPageSize:           4 kB
>  Locked:              374 kB
> +ShmOther:            124 kB
> +ShmOrphan:             0 kB
> +ShmSwapCache:         12 kB
> +ShmSwap:              36 kB
>  VmFlags: rd ex mr mw me de
>  
>  the first of these lines shows the same information as is displayed for the
> @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>  swap.
> +The ShmXXX lines only appear for shmem mappings. They show the amount of memory
> +from the mapping that is currently:
> + - resident in RAM, not present in the page table of this process but present
> + in the page table of another process (ShmOther)

We don't show that for files of any other filesystem, why for shmem?
Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs.

It is a very specialized statistic, and therefore hard to name: I don't
think ShmOther is a good name, but doubt any would do.  ShmOtherMapped?

> + - resident in RAM but not present in the page table of any process (ShmOrphan)

We don't show that for files of any other filesystem, why for shmem?

Orphan?  We do use the word "orphan" to describe pages which have been
truncated off a file, but somehow not yet removed from pagecache.  We
don't use the word "orphan" to describe pagecache pages which are
not mapped into userspace - they are known as "pagecache pages which
are not mapped into userspace".  ShmNotMapped?

> + - in swap cache (ShmSwapCache)

Is this interesting?  It's a transitional state: either memory pressure
has forced the page to swapcache, but not yet freed it from memory; or
swapin_readahead has brought this page back in when bringing in a nearby
page of swap.

I can understand that we might want better stats on the behaviour of
swapin_readahead; better stats on shmem objects and swap; better stats
on duplication between pagecache and swap; but I'm not convinced that
/proc/<pid>/smaps is the right place for those.

Against all that, of course, we do have mincore() showing these pages
as incore, where /proc/<pid>/smaps does not.  But I think that is
justified by mincore()'s mission to show what's incore.

> + - paged out on swap (ShmSwap).

This one has the best case for inclusion: we do show Swap for the anon
pages which are out on swap, but not for the shmem areas, where swap
entry does not go into page table.  But there is good reason for that:
this is shared memory, files, objects commonly shared between
processes, so it's a poor fit then to account them by processes.

(We have "df" and "du" showing the occupancy of mounted tmpfs
filesystems: it would be nice if we had something like those,
which showed also the swap occupancy, and for the non-user-mounts.)

I need much more convincing on this patch: I expect you will drop
some of the numbers, and provide an argument for others.

Hugh

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps
@ 2014-08-01  5:06     ` Hugh Dickins
  0 siblings, 0 replies; 30+ messages in thread
From: Hugh Dickins @ 2014-08-01  5:06 UTC (permalink / raw)
  To: Jerome Marchand
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc, Hugh Dickins,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On Tue, 22 Jul 2014, Jerome Marchand wrote:

> Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
> /proc/<pid>/smaps for shmem mappings.
> 
> ShmOther: amount of memory that is currently resident in memory, not
> present in the page table of this process but present in the page
> table of an other process.
> ShmOrphan: amount of memory that is currently resident in memory but
> not present in any process page table. This can happens when a process
> unmaps a shared mapping it has accessed before or exits. Despite being
> resident, this memory is not currently accounted to any process.
> ShmSwapcache: amount of memory currently in swap cache
> ShmSwap: amount of memory that is paged out on disk.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>

You will have to do a much better job of persuading me that these
numbers are of any interest.  Okay, maybe not me, I'm not that keen
on /proc/<pid>/smaps at the best of times.  But you will need to show
plausible cases where having these numbers available would have made
a real difference, and drum up support for their inclusion from
/proc/<pid>/smaps devotees.

Do you have a customer, who has underprovisioned with swap,
and wants these numbers to work out how much more is needed?

As it is, they appear to be numbers that you found you could provide,
and so you're adding them into /proc/<pid>/smaps, but having great
difficulty in finding good names to describe them - which is itself
an indicator that they're probably not the most useful statistics
a sysadmin is wanting.

(Google is a /proc/<pid>/smaps user: let's take a look to see if
we have been driven to add in stats of this kind: no, not at all.)

The more numbers we add to /proc/<pid>/smaps, the longer it will take to
print, the longer mmap_sem will be held, and the more it will interfere
with proper system operation - that's the concern I more often see.

> ---
>  Documentation/filesystems/proc.txt | 11 ++++++++
>  fs/proc/task_mmu.c                 | 56 +++++++++++++++++++++++++++++++++++++-
>  2 files changed, 66 insertions(+), 1 deletion(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 1a15c56..a65ab59 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -422,6 +422,10 @@ Swap:                  0 kB
>  KernelPageSize:        4 kB
>  MMUPageSize:           4 kB
>  Locked:              374 kB
> +ShmOther:            124 kB
> +ShmOrphan:             0 kB
> +ShmSwapCache:         12 kB
> +ShmSwap:              36 kB
>  VmFlags: rd ex mr mw me de
>  
>  the first of these lines shows the same information as is displayed for the
> @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>  swap.
> +The ShmXXX lines only appear for shmem mappings. They show the amount of memory
> +from the mapping that is currently:
> + - resident in RAM, not present in the page table of this process but present
> + in the page table of another process (ShmOther)

We don't show that for files of any other filesystem, why for shmem?
Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs.

It is a very specialized statistic, and therefore hard to name: I don't
think ShmOther is a good name, but doubt any would do.  ShmOtherMapped?

> + - resident in RAM but not present in the page table of any process (ShmOrphan)

We don't show that for files of any other filesystem, why for shmem?

Orphan?  We do use the word "orphan" to describe pages which have been
truncated off a file, but somehow not yet removed from pagecache.  We
don't use the word "orphan" to describe pagecache pages which are
not mapped into userspace - they are known as "pagecache pages which
are not mapped into userspace".  ShmNotMapped?

> + - in swap cache (ShmSwapCache)

Is this interesting?  It's a transitional state: either memory pressure
has forced the page to swapcache, but not yet freed it from memory; or
swapin_readahead has brought this page back in when bringing in a nearby
page of swap.

I can understand that we might want better stats on the behaviour of
swapin_readahead; better stats on shmem objects and swap; better stats
on duplication between pagecache and swap; but I'm not convinced that
/proc/<pid>/smaps is the right place for those.

Against all that, of course, we do have mincore() showing these pages
as incore, where /proc/<pid>/smaps does not.  But I think that is
justified by mincore()'s mission to show what's incore.

> + - paged out on swap (ShmSwap).

This one has the best case for inclusion: we do show Swap for the anon
pages which are out on swap, but not for the shmem areas, where swap
entry does not go into page table.  But there is good reason for that:
this is shared memory, files, objects commonly shared between
processes, so it's a poor fit then to account them by processes.

(We have "df" and "du" showing the occupancy of mounted tmpfs
filesystems: it would be nice if we had something like those,
which showed also the swap occupancy, and for the non-user-mounts.)

I need much more convincing on this patch: I expect you will drop
some of the numbers, and provide an argument for others.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/5] mm, shmem: Add shmem resident memory accounting
  2014-08-01  5:01     ` Hugh Dickins
  (?)
@ 2014-08-01 14:36     ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-08-01 14:36 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On 08/01/2014 07:01 AM, Hugh Dickins wrote:
> On Tue, 22 Jul 2014, Jerome Marchand wrote:
> 
>> Currently looking at /proc/<pid>/status or statm, there is no way to
>> distinguish shmem pages from pages mapped to a regular file (shmem
>> pages are mapped to /dev/zero), even though their implication in
>> actual memory use is quite different.
>> This patch adds MM_SHMEMPAGES counter to mm_rss_stat. It keeps track of
>> resident shmem memory size. Its value is exposed in the new VmShm line
>> of /proc/<pid>/status.
> 
> I like adding this info to /proc/<pid>/status - thank you -
> but I think you can make the patch much better in a couple of ways.
> 
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>> ---
>>  Documentation/filesystems/proc.txt |  2 ++
>>  arch/s390/mm/pgtable.c             |  2 +-
>>  fs/proc/task_mmu.c                 |  9 ++++++---
>>  include/linux/mm.h                 |  7 +++++++
>>  include/linux/mm_types.h           |  7 ++++---
>>  kernel/events/uprobes.c            |  2 +-
>>  mm/filemap_xip.c                   |  2 +-
>>  mm/memory.c                        | 37 +++++++++++++++++++++++++++++++------
>>  mm/rmap.c                          |  8 ++++----
>>  9 files changed, 57 insertions(+), 19 deletions(-)
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index ddc531a..1c49957 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -171,6 +171,7 @@ read the file /proc/PID/status:
>>    VmLib:      1412 kB
>>    VmPTE:        20 kb
>>    VmSwap:        0 kB
>> +  VmShm:         0 kB
>>    Threads:        1
>>    SigQ:   0/28578
>>    SigPnd: 0000000000000000
>> @@ -228,6 +229,7 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
>>   VmLib                       size of shared library code
>>   VmPTE                       size of page table entries
>>   VmSwap                      size of swap usage (the number of referred swapents)
>> + VmShm	                      size of resident shmem memory
> 
> Needs to say that includes mappings of tmpfs, and needs to say that
> it's a subset of VmRSS.  Better placed immediately after VmRSS...
> 
> ...but now that I look through what's in /proc/<pid>/status, it appears
> that we have to defer to /proc/<pid>/statm to see MM_FILEPAGES (third
> field) and MM_ANONPAGES (subtract third field from second field).
> 
> That's not a very friendly interface.  If you're going to help by
> exposing MM_SHMPAGES separately, please help even more by exposing
> VmFile and VmAnon here in /proc/<pid>/status too.
> 

Good point.

> VmRSS, VmAnon, VmShm, VmFile?  I'm not sure what's the best order:
> here I'm thinking that anon comes before file in /proc/meminfo, and
> shm should be halfway between anon and file.  You may have another idea.
> 
> And of course the VmFile count here should exclude VmShm: I think it
> will work out least confusingly if you account MM_FILEPAGES separately
> from MM_SHMPAGES, but add them together where needed e.g. for statm.

I chose not to change MM_FILEPAGES to avoid breaking anything, but it
might indeed look better not to have MM_SHMPAGES included in
MM_FILEPAGES. I'll look into it.

> 
>>   Threads                     number of threads
>>   SigQ                        number of signals queued/max. number for queue
>>   SigPnd                      bitmap of pending signals for the thread
>> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
>> index 37b8241..9fe31b0 100644
>> --- a/arch/s390/mm/pgtable.c
>> +++ b/arch/s390/mm/pgtable.c
>> @@ -612,7 +612,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>>  		if (PageAnon(page))
>>  			dec_mm_counter(mm, MM_ANONPAGES);
>>  		else
>> -			dec_mm_counter(mm, MM_FILEPAGES);
>> +			dec_mm_file_counters(mm, page);
>>  	}
> 
> That is a recurring pattern: please try putting
> 
> static inline int mm_counter(struct page *page)
> {
> 	if (PageAnon(page))
> 		return MM_ANONPAGES;
> 	if (PageSwapBacked(page))
> 		return MM_SHMPAGES;
> 	return MM_FILEPAGES;
> }
> 
> in include/linux/mm.h.
> 
> Then dec_mm_counter(mm, mm_counter(page)) here, and wherever you can,
> use mm_counter(page) to simplify the code throughout.
> 
> I say "try" because I think factoring out mm_counter() will simplify
> the most code, given the profusion of different accessors, particularly
> in mm/memory.c.  But I'm not sure how much bloat having it as an inline
> function will add, versus how much overhead it would add if not inline.

I'll look into that.

Jerome

> 
> Hugh
> 




* Re: [PATCH 3/5] mm, shmem: Add shmem_vma() helper
  2014-08-01  5:03     ` Hugh Dickins
  (?)
@ 2014-08-01 14:37     ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-08-01 14:37 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Oleg Nesterov, linux-mm, linux-kernel, linux-s390, linux-doc,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On 08/01/2014 07:03 AM, Hugh Dickins wrote:
> On Tue, 22 Jul 2014, Jerome Marchand wrote:
> 
>> Add a simple helper to check if a vm area belongs to shmem.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>> ---
>>  include/linux/mm.h | 6 ++++++
>>  mm/shmem.c         | 8 ++++++++
>>  2 files changed, 14 insertions(+)
>>
>> diff --git a/include/linux/mm.h b/include/linux/mm.h
>> index 34099fa..04a58d1 100644
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
>>  
>>  extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
>>  bool shmem_mapping(struct address_space *mapping);
>> +bool shmem_vma(struct vm_area_struct *vma);
>> +
>>  #else
>>  static inline bool shmem_mapping(struct address_space *mapping)
>>  {
>>  	return false;
>>  }
>> +static inline bool shmem_vma(struct vm_area_struct *vma)
>> +{
>> +	return false;
>> +}
>>  #endif
> 
> I would prefer include/linux/shmem_fs.h for this (and one of us clean
> up where the declarations of shmem_zero_setup and shmem_mapping live).
> 
> But if 4/5 goes away, then there will only be one user of shmem_vma(),
> so in that case better just declare it (using shmem_mapping()) there
> in task_mmu.c in the smaps patch.
> 
>>  
>>  extern int can_do_mlock(void);
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 8aa4892..7d16227 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -1483,6 +1483,14 @@ bool shmem_mapping(struct address_space *mapping)
>>  	return mapping->backing_dev_info == &shmem_backing_dev_info;
>>  }
>>  
>> +bool shmem_vma(struct vm_area_struct *vma)
>> +{
>> +	return (vma->vm_file &&
>> +		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
>> +		== &shmem_backing_dev_info);
>> +
> 
> I agree with Oleg,
> 	vma->vm_file && shmem_mapping(file_inode(vma->vm_file)->i_mapping);
> would be better,

Will do.

Jerome

> 
> Hugh
> 




* Re: [PATCH 4/5] mm, shmem: Add shmem swap memory accounting
  2014-08-01  5:05     ` Hugh Dickins
  (?)
@ 2014-08-01 14:44     ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-08-01 14:44 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On 08/01/2014 07:05 AM, Hugh Dickins wrote:
> On Tue, 22 Jul 2014, Jerome Marchand wrote:
> 
>> Adds get_mm_shswap() which computes the size of swapped-out shmem. It
>> does so by pagewalking the mm and using the new shmem_locate() function
>> to get the physical location of shmem pages.
>> The result is displayed in the new VmShSw line of /proc/<pid>/status.
>> Use mm_walk and shmem_locate() to account paged-out shmem pages.
>>
>> It significantly slows down /proc/<pid>/status access speed when
>> there is a big shmem mapping. If that is an issue, we can drop this
>> patch and only display this counter in the inherently slower
>> /proc/<pid>/smaps file (cf. next patch).
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> 
> Definite NAK to this one.  As you guessed yourself, it is always a
> mistake to add one potentially very slow-to-gather number to a stats
> file showing a group of quickly gathered numbers.

What I was going for is to have a counter for shared swap in the same
way I did for VmShm, but I never found a way to do it. The reason I
posted this patch is that I hope someone will have a better idea.

> 
> Is there anything you could do instead?  I don't know if it's worth
> the (little) extra mm_struct storage and maintenance, but you could
> add a VmShmSize, which shows that subset of VmSize (total_vm) which
> is occupied by shmem mappings.
> 
> It's ambiguous what to deduce when VmShm is less than VmShmSize:
> the difference might be swapped out, it might be holes in the sparse
> object, it might be instantiated in the object but never faulted
> into the mapping: in general it will be a mix of all of those.
> So, sometimes useful info, but easy to be misled by it.
> 
> As I say, I don't know if VmShmSize would be worth adding, given its
> deficiencies; and it could be worked out from /proc/<pid>/maps anyway.

I don't think that would be very useful. Sparse mappings are quite common.

Jerome

> 
> Hugh
> 




* Re: [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps
  2014-08-01  5:06     ` Hugh Dickins
  (?)
@ 2014-08-01 15:23     ` Jerome Marchand
  -1 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-08-01 15:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, linux-kernel, linux-s390, linux-doc,
	Arnaldo Carvalho de Melo, Ingo Molnar, Paul Mackerras,
	Peter Zijlstra, linux390, Heiko Carstens, Martin Schwidefsky,
	Randy Dunlap

On 08/01/2014 07:06 AM, Hugh Dickins wrote:
> On Tue, 22 Jul 2014, Jerome Marchand wrote:
> 
>> Adds ShmOther, ShmOrphan, ShmSwapCache and ShmSwap lines to
>> /proc/<pid>/smaps for shmem mappings.
>>
>> ShmOther: amount of memory that is currently resident in memory, not
>> present in the page table of this process but present in the page
>> table of another process.
>> ShmOrphan: amount of memory that is currently resident in memory but
>> not present in any process page table. This can happen when a process
>> unmaps a shared mapping it has accessed before or exits. Despite being
>> resident, this memory is not currently accounted to any process.
>> ShmSwapCache: amount of memory currently in swap cache
>> ShmSwap: amount of memory that is paged out on disk.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> 
> You will have to do a much better job of persuading me that these
> numbers are of any interest.  Okay, maybe not me, I'm not that keen
> on /proc/<pid>/smaps at the best of times.  But you will need to show
> plausible cases where having these numbers available would have made
> a real difference, and drum up support for their inclusion from
> /proc/<pid>/smaps devotees.
> 
> Do you have a customer, who has underprovisioned with swap,
> and wants these numbers to work out how much more is needed?

We have a customer who needs to know how much memory a process with big
shared anonymous mappings has in swap.

> 
> As it is, they appear to be numbers that you found you could provide,
> and so you're adding them into /proc/<pid>/smaps, but having great
> difficulty in finding good names to describe them - which is itself
> an indicator that they're probably not the most useful statistics
> a sysadmin is wanting.

ShmSwap is obviously the stat I needed for our customer. I also have a
use for the ill-named ShmOrphan (see below). I may have added the two
others because they were low-hanging fruit, or maybe because they were
useful to me for debugging. I will get rid of them.

> 
> (Google is a /proc/<pid>/smaps user: let's take a look to see if
> we have been driven to add in stats of this kind: no, not at all.)
> 
> The more numbers we add to /proc/<pid>/smaps, the longer it will take to
> print, the longer mmap_sem will be held, and the more it will interfere
> with proper system operation - that's the concern I more often see.
> 
>> ---
>>  Documentation/filesystems/proc.txt | 11 ++++++++
>>  fs/proc/task_mmu.c                 | 56 +++++++++++++++++++++++++++++++++++++-
>>  2 files changed, 66 insertions(+), 1 deletion(-)
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 1a15c56..a65ab59 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -422,6 +422,10 @@ Swap:                  0 kB
>>  KernelPageSize:        4 kB
>>  MMUPageSize:           4 kB
>>  Locked:              374 kB
>> +ShmOther:            124 kB
>> +ShmOrphan:             0 kB
>> +ShmSwapCache:         12 kB
>> +ShmSwap:              36 kB
>>  VmFlags: rd ex mr mw me de
>>  
>>  the first of these lines shows the same information as is displayed for the
>> @@ -437,6 +441,13 @@ a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>  and a page is modified, the file page is replaced by a private anonymous copy.
>>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>>  swap.
>> +The ShmXXX lines only appear for shmem mappings. They show the amount of memory
>> +from the mapping that is currently:
>> + - resident in RAM, not present in the page table of this process but present
>> + in the page table of another process (ShmOther)
> 
> We don't show that for files of any other filesystem, why for shmem?
> Perhaps you are too focussed on SysV SHM, and I am too focussed on tmpfs.

I must admit that I see all this from the SysV SHM / shared anon
mapping point of view.

> 
> It is a very specialized statistic, and therefore hard to name: I don't
> think ShmOther is a good name, but doubt any would do.  ShmOtherMapped?
> 
>> + - resident in RAM but not present in the page table of any process (ShmOrphan)
> 
> We don't show that for files of any other filesystem, why for shmem?

Because these pages cannot be discarded or written back to disk. Under
memory pressure, they need space in swap or have to stay in RAM.

> 
> Orphan?  We do use the word "orphan" to describe pages which have been
> truncated off a file, but somehow not yet removed from pagecache.

I was unaware of that.

>  We
> don't use the word "orphan" to describe pagecache pages which are
> not mapped into userspace - they are known as "pagecache pages which
> are not mapped into userspace".  ShmNotMapped?

I'm not sure about the terminology here. These pages are not mapped in
the sense that their map_count is zero, but they belong to a userspace
mapping.

> 
>> + - in swap cache (ShmSwapCache)
> 
> Is this interesting?  It's a transitional state: either memory pressure
> has forced the page to swapcache, but not yet freed it from memory; or
> swapin_readahead has brought this page back in when bringing in a nearby
> page of swap.
> 
> I can understand that we might want better stats on the behaviour of
> swapin_readahead; better stats on shmem objects and swap; better stats
> on duplication between pagecache and swap; but I'm not convinced that
> /proc/<pid>/smaps is the right place for those.
> 
> Against all that, of course, we do have mincore() showing these pages
> as incore, where /proc/<pid>/smaps does not.  But I think that is
> justified by mincore()'s mission to show what's incore.
> 
>> + - paged out on swap (ShmSwap).
> 
> This one has the best case for inclusion: we do show Swap for the anon
> pages which are out on swap, but not for the shmem areas, where swap
> entry does not go into page table.  But there is good reason for that:
> this is shared memory, files, objects commonly shared between
> processes, so it's a poor fit then to account them by processes.
> 
> (We have "df" and "du" showing the occupancy of mounted tmpfs
> filesystems: it would be nice if we had something like those,
> which showed also the swap occupancy, and for the non-user-mounts.)

I guess that works for tmpfs, but shared anon mappings are invisible to
these tools.

Jerome

> 
> I need much more convincing on this patch: I expect you will drop
> some of the numbers, and provide an argument for others.
> 
> Hugh
> 




* [PATCH 3/5] mm, shmem: Add shmem_vma() helper
  2014-07-01 13:01 [PATCH 0/5] mm, shmem: Enhance per-process accounting of shared memnory Jerome Marchand
@ 2014-07-01 13:01   ` Jerome Marchand
  0 siblings, 0 replies; 30+ messages in thread
From: Jerome Marchand @ 2014-07-01 13:01 UTC (permalink / raw)
  To: linux-mm; +Cc: linux-kernel, Hugh Dickins

Add a simple helper to check if a vm area belongs to shmem.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
---
 include/linux/mm.h | 6 ++++++
 mm/shmem.c         | 8 ++++++++
 2 files changed, 14 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 34099fa..04a58d1 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1074,11 +1074,17 @@ int shmem_zero_setup(struct vm_area_struct *);
 
 extern int shmem_locate(struct vm_area_struct *vma, pgoff_t pgoff, int *count);
 bool shmem_mapping(struct address_space *mapping);
+bool shmem_vma(struct vm_area_struct *vma);
+
 #else
 static inline bool shmem_mapping(struct address_space *mapping)
 {
 	return false;
 }
+static inline bool shmem_vma(struct vm_area_struct *vma)
+{
+	return false;
+}
 #endif
 
 extern int can_do_mlock(void);
diff --git a/mm/shmem.c b/mm/shmem.c
index 11b37a7..be87a20 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1447,6 +1447,14 @@ bool shmem_mapping(struct address_space *mapping)
 	return mapping->backing_dev_info == &shmem_backing_dev_info;
 }
 
+bool shmem_vma(struct vm_area_struct *vma)
+{
+	return (vma->vm_file &&
+		vma->vm_file->f_dentry->d_inode->i_mapping->backing_dev_info
+		== &shmem_backing_dev_info);
+
+}
+
 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
 static const struct inode_operations shmem_short_symlink_operations;
-- 
1.9.3



end of thread, other threads:[~2014-08-01 15:24 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-07-22 13:43 [PATCH RESEND 0/5] mm, shmem: Enhance per-process accounting of shared memory Jerome Marchand
2014-07-22 13:43 ` Jerome Marchand
2014-07-22 13:43 ` [PATCH 1/5] mm, shmem: Add shmem resident memory accounting Jerome Marchand
2014-07-22 13:43   ` Jerome Marchand
2014-08-01  5:01   ` Hugh Dickins
2014-08-01  5:01     ` Hugh Dickins
2014-08-01 14:36     ` Jerome Marchand
2014-07-22 13:43 ` [PATCH 2/5] mm, shmem: Add shmem_locate function Jerome Marchand
2014-07-22 13:43   ` Jerome Marchand
2014-08-01  5:01   ` Hugh Dickins
2014-08-01  5:01     ` Hugh Dickins
2014-07-22 13:43 ` [PATCH 3/5] mm, shmem: Add shmem_vma() helper Jerome Marchand
2014-07-22 13:43   ` Jerome Marchand
2014-07-24 19:53   ` Oleg Nesterov
2014-07-24 19:53     ` Oleg Nesterov
2014-08-01  5:03   ` Hugh Dickins
2014-08-01  5:03     ` Hugh Dickins
2014-08-01 14:37     ` Jerome Marchand
2014-07-22 13:43 ` [PATCH 4/5] mm, shmem: Add shmem swap memory accounting Jerome Marchand
2014-07-22 13:43   ` Jerome Marchand
2014-08-01  5:05   ` Hugh Dickins
2014-08-01  5:05     ` Hugh Dickins
2014-08-01 14:44     ` Jerome Marchand
2014-07-22 13:43 ` [PATCH 5/5] mm, shmem: Show location of non-resident shmem pages in smaps Jerome Marchand
2014-07-22 13:43   ` Jerome Marchand
2014-08-01  5:06   ` Hugh Dickins
2014-08-01  5:06     ` Hugh Dickins
2014-08-01 15:23     ` Jerome Marchand
  -- strict thread matches above, loose matches on Subject: below --
2014-07-01 13:01 [PATCH 0/5] mm, shmem: Enhance per-process accounting of shared memnory Jerome Marchand
2014-07-01 13:01 ` [PATCH 3/5] mm, shmem: Add shmem_vma() helper Jerome Marchand
2014-07-01 13:01   ` Jerome Marchand
