* [PATCH v2 0/4] enhance shmem process and swap accounting
From: Vlastimil Babka @ 2015-03-27 16:40 UTC
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov, Vlastimil Babka

Changes since v1:
o In Patch 2, rely on SHMEM_I(inode)->swapped if possible, and fall back to a
  radix tree iterator on partially mapped shmem objects, i.e. decouple shmem
  swap usage determination from the page walk, for performance reasons.
  Thanks to Jerome and Konstantin for the tips.
  The downside is that mm/shmem.c had to be touched.

This series is based on Jerome Marchand's [1] so let me quote the first
paragraph from there:

There are several shortcomings with the accounting of shared memory
(sysV shm, shared anonymous mapping, mapping to a tmpfs file). The
values in /proc/<pid>/status and statm don't allow one to distinguish
between shmem memory and a shared mapping to a regular file, even
though their implications for memory usage are quite different: at
reclaim, a file mapping can be dropped or written back to disk, while
shmem needs a place in swap. As for shmem pages that are swapped out or
in swap cache, they aren't accounted at all.

My original motivation is that a customer found it (IMHO rightfully)
confusing that e.g. top output for process swap usage is unreliable with
respect to swapped-out shmem pages, which are not accounted for.

The fundamental difference between private anonymous and shmem pages is that
when swapped out, the latter have their PTEs converted to pte_none, not to
swapents. As such, they are not counted in the number of swapents visible e.g.
in the /proc/pid/status VmSwap row. It might be theoretically possible to use
swapents when swapping out shmem (without extra cost, as one has to change all
mappers anyway), and on swap-in only convert the swapent for the faulting
process, leaving swapents in other processes until they also fault (so again
no extra cost). But I don't know how many assumptions this would break, and it
would be too disruptive a change for a relatively small benefit.
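
As a rough illustration of the paragraph above, here is a toy userspace model
(not kernel code; every toy_* name is made up for this sketch) of a page-table
walk that counts swap entries. It only assumes what is stated above: swapping
out a private anonymous page leaves a swapent in the PTE, while swapping out a
shmem page leaves pte_none.

```c
#include <stddef.h>

/* Toy stand-ins for the three PTE states that matter here (hypothetical). */
enum toy_pte { TOY_PTE_NONE, TOY_PTE_PRESENT, TOY_PTE_SWAPENT };

/* Swapping out a private anonymous page leaves a swap entry in the PTE. */
static void toy_swap_out_anon(enum toy_pte *pte)
{
	*pte = TOY_PTE_SWAPENT;
}

/*
 * Swapping out a shmem page clears the PTE; the swap entry is recorded
 * in the object's page cache (radix tree), not in the page table.
 */
static void toy_swap_out_shmem(enum toy_pte *pte)
{
	*pte = TOY_PTE_NONE;
}

/* What a page-table walk can see: only swap entries stored in PTEs. */
static size_t toy_count_swapents(const enum toy_pte *ptes, size_t n)
{
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		if (ptes[i] == TOY_PTE_SWAPENT)
			count++;
	return count;
}
```

In this model, swapping out one page of an anonymous range raises the count
by one, while swapping out one page of a shmem range leaves it unchanged,
which is the VmSwap blind spot described above.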

Instead, my approach is to document the limitation of VmSwap, and to provide a
means to determine the swap usage of shmem areas for those who are interested
and willing to pay the price, using /proc/pid/smaps. Outside of ipcs, I don't
think it's currently possible to determine the usage at all. The previous
patchset [1] introduced new shmem-specific fields into smaps output, and
functions to determine the values. I take a simpler approach, noting that
smaps output already has a "Swap: X kB" line, where currently X == 0 always
for shmem areas. I think we can just consider this a bug and provide the
proper value by consulting the radix tree, as e.g. mincore_page() does. In the
patch changelog I explain why this is also not perfect (and cannot be without
swapents), but still arguably much better than showing a 0.
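
For readers who want to consume the "Swap: X kB" lines from userspace, a
minimal sketch follows. It is not part of the series; the function name
sum_smaps_swap_kb() is hypothetical, and the only format assumption is the
"Swap: X kB" line mentioned above.

```c
#include <stdio.h>

/*
 * Sum the "Swap:" lines of smaps-formatted text read from f and return
 * the total in kB. Only lines that begin with "Swap:" are counted.
 */
static unsigned long sum_smaps_swap_kb(FILE *f)
{
	char line[512];
	unsigned long kb, total = 0;

	while (fgets(line, sizeof(line), f))
		if (sscanf(line, "Swap: %lu kB", &kb) == 1)
			total += kb;
	return total;
}
```

A caller would typically pass the result of fopen("/proc/<pid>/smaps", "r")
and check it for NULL first. With this series applied, the total would then
include the swapped-out part of mapped shmem areas.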

The last two patches are adapted from Jerome's patchset and provide a VmRSS
breakdown into VmAnon, VmFile and VmShm in /proc/pid/status. Hugh noted that
this is a welcome addition, and I agree that it might help e.g. with debugging
process memory usage, at an albeit non-zero, but still rather low cost of an
extra per-mm counter and some page flag checks. I updated these patches to
4.0-rc1, made them respect !CONFIG_SHMEM so that tiny systems don't pay the
cost, and optimized the page flag checking somewhat.
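
A userspace sketch of reading that breakdown is shown below. It is only an
illustration: the struct and function names are hypothetical, the field names
are the ones proposed in this series (a given kernel may spell them
differently), and the "kB" line layout is assumed to match the existing VmRSS
line.

```c
#include <stdio.h>

struct rss_breakdown {
	unsigned long rss_kb, anon_kb, file_kb, shm_kb;
};

/*
 * Parse status-formatted text from f. Returns how many of the four
 * fields (VmRSS, VmAnon, VmFile, VmShm) were found.
 */
static int parse_rss_breakdown(FILE *f, struct rss_breakdown *out)
{
	char line[256];
	int found = 0;

	while (fgets(line, sizeof(line), f)) {
		if (sscanf(line, "VmRSS: %lu kB", &out->rss_kb) == 1 ||
		    sscanf(line, "VmAnon: %lu kB", &out->anon_kb) == 1 ||
		    sscanf(line, "VmFile: %lu kB", &out->file_kb) == 1 ||
		    sscanf(line, "VmShm: %lu kB", &out->shm_kb) == 1)
			found++;
	}
	return found;
}
```

Since get_mm_rss() in patch 3 sums the file, shmem and anonymous counters, a
consistent snapshot should satisfy rss_kb == anon_kb + file_kb + shm_kb.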

[1] http://lwn.net/Articles/611966/

Jerome Marchand (2):
  mm, shmem: Add shmem resident memory accounting
  mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status

Vlastimil Babka (2):
  mm, documentation: clarify /proc/pid/status VmSwap limitations
  mm, proc: account for shmem swap in /proc/pid/smaps

 Documentation/filesystems/proc.txt | 15 +++++++++--
 arch/s390/mm/pgtable.c             |  5 +---
 fs/proc/task_mmu.c                 | 52 ++++++++++++++++++++++++++++++++++--
 include/linux/mm.h                 | 28 ++++++++++++++++++++
 include/linux/mm_types.h           |  9 ++++---
 include/linux/shmem_fs.h           |  6 +++++
 kernel/events/uprobes.c            |  2 +-
 mm/memory.c                        | 30 +++++++--------------
 mm/oom_kill.c                      |  5 ++--
 mm/rmap.c                          | 15 +++--------
 mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
 11 files changed, 176 insertions(+), 45 deletions(-)

-- 
2.1.4


* [PATCH v2 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Vlastimil Babka @ 2015-03-27 16:40 UTC
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov, Vlastimil Babka

The documentation for /proc/pid/status does not mention that the value of
VmSwap counts only swapped out anonymous private pages and not shmem. This is
not obvious, so document this limitation.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a07ba61..d4f56ec 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -231,6 +231,8 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  VmLib                       size of shared library code
  VmPTE                       size of page table entries
  VmSwap                      size of swap usage (the number of referred swapents)
+                             by anonymous private data (shmem swap usage is not
+                             included)
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
  SigPnd                      bitmap of pending signals for the thread
-- 
2.1.4


* [PATCH v2 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
From: Vlastimil Babka @ 2015-03-27 16:40 UTC
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov, Vlastimil Babka

Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped out.
This is because, unlike private anonymous mappings, shmem does not change the
pte to a swap entry but to pte_none when swapping the page out. In the smaps
page walk, such a page thus looks like it was never faulted in.

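This patch adds smaps_shmem_swap(), called from show_smap() after the page
walk, to determine the swap usage of shmem mappings. When the mapping covers
the whole object, it uses the object's swapped page count; otherwise it counts
swap entries in the mapped range of the radix tree, similarly to how
mincore_page() looks them up. Swapped out pages are thus accounted for.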

The accounting is arguably still not as precise as for private anonymous
mappings, since we will now also count pages that the process in question
never accessed, but that another process populated and then let become
swapped out. I believe it is still less confusing and subtle than not showing
any swap usage by shmem mappings at all. Also, swapped out pages only become a
performance issue for future accesses, and we cannot predict those for either
kind of mapping.
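
The decision made for each shmem mapping can be sketched with a toy userspace
model. This is not the kernel code from the diff below: the toy_* names, the
flag array standing in for the radix tree, and the fixed 4096-byte page size
are all assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096UL

/* Toy shmem object: one flag per page offset, true if that page is on swap. */
struct toy_shmem_object {
	const bool *on_swap;	/* stands in for swap entries in the radix tree */
	size_t nr_pages;	/* object size in pages */
	size_t swapped;		/* running total kept for the whole object */
};

/* Slow path: count swapped pages in [start, end) by checking each slot. */
static unsigned long toy_partial_swap_usage(const struct toy_shmem_object *obj,
					    size_t start, size_t end)
{
	unsigned long swapped = 0;
	size_t i;

	for (i = start; i < end && i < obj->nr_pages; i++)
		if (obj->on_swap[i])
			swapped++;
	return swapped * TOY_PAGE_SIZE;
}

/* Swap usage in bytes for a mapping of pages [start, end) of the object. */
static unsigned long toy_mapping_swap(const struct toy_shmem_object *obj,
				      size_t start, size_t end)
{
	if (obj->swapped == 0)
		return 0;	/* nothing on swap, no need to look further */
	if (end - start >= obj->nr_pages)
		return obj->swapped * TOY_PAGE_SIZE;	/* whole object mapped */
	return toy_partial_swap_usage(obj, start, end);
}
```

Note that the flags belong to the object, not to a process, so the result
includes pages that the reporting process never touched, as described above.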

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt |  3 ++-
 fs/proc/task_mmu.c                 | 38 +++++++++++++++++++++++++++
 include/linux/shmem_fs.h           |  6 +++++
 mm/shmem.c                         | 54 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 100 insertions(+), 1 deletion(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index d4f56ec..8b30543 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -437,7 +437,8 @@ indicates the amount of memory currently marked as referenced or accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
 
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6dee68d..1b271ec 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -13,6 +13,7 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
+#include <linux/shmem_fs.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -610,6 +611,41 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 	seq_putc(m, '\n');
 }
 
+#if defined(CONFIG_SHMEM) && defined(CONFIG_SWAP)
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+	unsigned long swapped;
+	pgoff_t start, end;
+
+	if (!vma->vm_file)
+		return 0;
+
+	inode = file_inode(vma->vm_file);
+
+	if (!shmem_mapping(inode->i_mapping))
+		return 0;
+
+	swapped = shmem_swap_usage(inode);
+
+	if (swapped == 0)
+		return 0;
+
+	if (vma->vm_end - vma->vm_start >= inode->i_size)
+		return swapped;
+
+	start = linear_page_index(vma, vma->vm_start);
+	end = linear_page_index(vma, vma->vm_end);
+
+	return shmem_partial_swap_usage(inode->i_mapping, start, end);
+}
+#else
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
@@ -624,6 +660,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);
 
+	mss.swap += smaps_shmem_swap(vma);
+
 	show_map_vma(m, vma, is_pid);
 
 	seq_printf(m,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 50777b5..12519e4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
+#ifdef CONFIG_SWAP
+extern unsigned long shmem_swap_usage(struct inode *inode);
+extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end);
+#endif
+
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index cf2d0ca..f8ebd23 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -357,6 +357,60 @@ static int shmem_free_swap(struct address_space *mapping,
 	return 0;
 }
 
+#ifdef CONFIG_SWAP
+unsigned long shmem_swap_usage(struct inode *inode)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	unsigned long swapped;
+
+	spin_lock(&info->lock);
+	swapped = info->swapped;
+	spin_unlock(&info->lock);
+
+	return swapped << PAGE_SHIFT;
+}
+
+unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	struct page *page;
+	unsigned long swapped = 0;
+
+	rcu_read_lock();
+
+restart:
+	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+		if (iter.index >= end)
+			break;
+
+		page = radix_tree_deref_slot(slot);
+
+		/*
+		 * This should only be possible to happen at index 0, so we
+		 * don't need to reset the counter, nor do we risk infinite
+		 * restarts.
+		 */
+		if (radix_tree_deref_retry(page))
+			goto restart;
+
+		if (radix_tree_exceptional_entry(page))
+			swapped++;
+
+		if (need_resched()) {
+			cond_resched_rcu();
+			start = iter.index + 1;
+			goto restart;
+		}
+	}
+
+	rcu_read_unlock();
+
+	return swapped << PAGE_SHIFT;
+}
+#endif
+
 /*
  * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
  */
-- 
2.1.4


* [PATCH v2 3/4] mm, shmem: Add shmem resident memory accounting
From: Vlastimil Babka @ 2015-03-27 16:40 UTC
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov, Vlastimil Babka

From: Jerome Marchand <jmarchan@redhat.com>

Currently, looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem
pages are mapped to /dev/zero), even though their implications for
actual memory use are quite different.
This patch adds an MM_SHMEMPAGES counter to mm_rss_stat to account for
shmem pages separately, instead of counting them in MM_FILEPAGES.
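
The classification that the new mm_counter() and mm_counter_file() helpers
perform can be modelled in userspace as follows. This is a sketch for the
CONFIG_SHMEM=y case only; the toy_* types and names are hypothetical, and the
two booleans merely stand in for the PageAnon() and PageSwapBacked() checks.

```c
#include <stdbool.h>

/* Toy copies of the counter indices (userspace model only). */
enum toy_mm_counter_id {
	TOY_MM_FILEPAGES,
	TOY_MM_ANONPAGES,
	TOY_MM_SWAPENTS,
	TOY_MM_SHMEMPAGES,
	TOY_NR_MM_COUNTERS
};

/* Toy page: just the two flags the classification looks at. */
struct toy_page {
	bool anon;		/* stands in for PageAnon() */
	bool swap_backed;	/* stands in for PageSwapBacked() */
};

/* Models mm_counter_file(): caller already knows the page is not anon. */
static enum toy_mm_counter_id toy_mm_counter_file(const struct toy_page *page)
{
	return page->swap_backed ? TOY_MM_SHMEMPAGES : TOY_MM_FILEPAGES;
}

/* Models mm_counter(): anon is checked first, then shmem vs. regular file. */
static enum toy_mm_counter_id toy_mm_counter(const struct toy_page *page)
{
	if (page->anon)
		return TOY_MM_ANONPAGES;
	return toy_mm_counter_file(page);
}
```

The order of the checks matters in this model: anonymous pages are also
swap-backed, so testing the anon flag first keeps them out of the shmem
counter.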

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 arch/s390/mm/pgtable.c   |  5 +----
 fs/proc/task_mmu.c       |  3 ++-
 include/linux/mm.h       | 28 ++++++++++++++++++++++++++++
 include/linux/mm_types.h |  9 ++++++---
 kernel/events/uprobes.c  |  2 +-
 mm/memory.c              | 30 ++++++++++--------------------
 mm/oom_kill.c            |  5 +++--
 mm/rmap.c                | 15 ++++-----------
 8 files changed, 55 insertions(+), 42 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index b2c1542..5bffd5d 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -617,10 +617,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
 	else if (is_migration_entry(entry)) {
 		struct page *page = migration_entry_to_page(entry);
 
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 1b271ec..e86e137 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -81,7 +81,8 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *shared, unsigned long *text,
 			 unsigned long *data, unsigned long *resident)
 {
-	*shared = get_mm_counter(mm, MM_FILEPAGES);
+	*shared = get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter(mm, MM_SHMEMPAGES);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 47a9392..adfbb5b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1364,6 +1364,16 @@ static inline unsigned long get_mm_counter(struct mm_struct *mm, int member)
 	return (unsigned long)val;
 }
 
+/* A wrapper for the CONFIG_SHMEM dependent counter */
+static inline unsigned long get_mm_counter_shmem(struct mm_struct *mm)
+{
+#ifdef CONFIG_SHMEM
+	return get_mm_counter(mm, MM_SHMEMPAGES);
+#else
+	return 0;
+#endif
+}
+
 static inline void add_mm_counter(struct mm_struct *mm, int member, long value)
 {
 	atomic_long_add(value, &mm->rss_stat.count[member]);
@@ -1379,9 +1389,27 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
 	atomic_long_dec(&mm->rss_stat.count[member]);
 }
 
+/* Optimized variant when page is already known not to be PageAnon */
+static inline int mm_counter_file(struct page *page)
+{
+#ifdef CONFIG_SHMEM
+	if (PageSwapBacked(page))
+		return MM_SHMEMPAGES;
+#endif
+	return MM_FILEPAGES;
+}
+
+static inline int mm_counter(struct page *page)
+{
+	if (PageAnon(page))
+		return MM_ANONPAGES;
+	return mm_counter_file(page);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
+		get_mm_counter_shmem(mm) +
 		get_mm_counter(mm, MM_ANONPAGES);
 }
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 199a03a..d3c2372 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -327,9 +327,12 @@ struct core_state {
 };
 
 enum {
-	MM_FILEPAGES,
-	MM_ANONPAGES,
-	MM_SWAPENTS,
+	MM_FILEPAGES,	/* Resident file mapping pages */
+	MM_ANONPAGES,	/* Resident anonymous pages */
+	MM_SWAPENTS,	/* Anonymous swap entries */
+#ifdef CONFIG_SHMEM
+	MM_SHMEMPAGES,	/* Resident shared memory pages */
+#endif
 	NR_MM_COUNTERS
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index cb346f2..0a08fdd 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -188,7 +188,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	lru_cache_add_active_or_unevictable(kpage, vma);
 
 	if (!PageAnon(page)) {
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 411144f..db66400 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		} else if (is_migration_entry(entry)) {
 			page = migration_entry_to_page(entry);
 
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]++;
-			else
-				rss[MM_FILEPAGES]++;
+			rss[mm_counter(page)]++;
 
 			if (is_write_migration_entry(entry) &&
 					is_cow_mapping(vm_flags)) {
@@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (page) {
 		get_page(page);
 		page_dup_rmap(page);
-		if (PageAnon(page))
-			rss[MM_ANONPAGES]++;
-		else
-			rss[MM_FILEPAGES]++;
+		rss[mm_counter(page)]++;
 	}
 
 out_set_pte:
@@ -1113,9 +1107,8 @@ again:
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else {
+
+			if (!PageAnon(page)) {
 				if (pte_dirty(ptent)) {
 					force_flush = 1;
 					set_page_dirty(page);
@@ -1123,8 +1116,8 @@ again:
 				if (pte_young(ptent) &&
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
-				rss[MM_FILEPAGES]--;
 			}
+			rss[mm_counter(page)]--;
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
@@ -1146,11 +1139,7 @@ again:
 			struct page *page;
 
 			page = migration_entry_to_page(entry);
-
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else
-				rss[MM_FILEPAGES]--;
+			rss[mm_counter(page)]--;
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
@@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 
 	/* Ok, finally just insert the thing.. */
 	get_page(page);
-	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
 	page_add_file_rmap(page);
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));
 
@@ -2174,7 +2163,8 @@ gotten:
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				dec_mm_counter_fast(mm,
+						mm_counter_file(old_page));
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
 		} else
@@ -2703,7 +2693,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
+		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page);
 	}
 	set_pte_at(vma->vm_mm, address, pte, entry);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 642f38c..a5ee3a2 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -573,10 +573,11 @@ void oom_kill_process(struct task_struct *p, gfp_t gfp_mask, int order,
 	/* mm cannot safely be dereferenced after task_unlock(victim) */
 	mm = victim->mm;
 	mark_tsk_oom_victim(victim);
-	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
-		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
+		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
+		K(get_mm_counter_shmem(victim->mm)));
 	task_unlock(victim);
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 5e3e090..e3c4392 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1216,12 +1216,8 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 	update_hiwater_rss(mm);
 
 	if (PageHWPoison(page) && !(flags & TTU_IGNORE_HWPOISON)) {
-		if (!PageHuge(page)) {
-			if (PageAnon(page))
-				dec_mm_counter(mm, MM_ANONPAGES);
-			else
-				dec_mm_counter(mm, MM_FILEPAGES);
-		}
+		if (!PageHuge(page))
+			dec_mm_counter(mm, mm_counter(page));
 		set_pte_at(mm, address, pte,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
 	} else if (pte_unused(pteval)) {
@@ -1230,10 +1226,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * interest anymore. Simply discard the pte, vmscan
 		 * will take care of the rest.
 		 */
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
@@ -1276,7 +1269,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		entry = make_migration_entry(page, pte_write(pteval));
 		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 
 	page_remove_rmap(page);
 	page_cache_release(page);
-- 
2.1.4
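
Editor's sketch (not part of the patch): the classification done by
mm_counter() and mm_counter_file() above can be modeled in userspace as
follows. struct model_page and the model_* names are hypothetical stand-ins
for struct page and the kernel helpers, and CONFIG_SHMEM is assumed to be
enabled. It is only meant to illustrate the order of the tests, not to
reproduce kernel behaviour.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Counter indices, mirroring the enum in the patch (CONFIG_SHMEM assumed). */
enum { MM_FILEPAGES, MM_ANONPAGES, MM_SWAPENTS, MM_SHMEMPAGES, NR_MM_COUNTERS };

/* Hypothetical stand-in for struct page: only the two flags the helpers test. */
struct model_page {
	bool anon;		/* models PageAnon() */
	bool swap_backed;	/* models PageSwapBacked() */
};

/* Variant for pages already known not to be anonymous. */
static int model_mm_counter_file(const struct model_page *page)
{
	assert(page != NULL);
	if (page->swap_backed)
		return MM_SHMEMPAGES;
	return MM_FILEPAGES;
}

/*
 * The anon test comes first: in this model an anonymous page may also be
 * swap-backed, and it must still be counted as MM_ANONPAGES.
 */
static int model_mm_counter(const struct model_page *page)
{
	assert(page != NULL);
	if (page->anon)
		return MM_ANONPAGES;
	return model_mm_counter_file(page);
}

/* Models get_mm_rss(): file + shmem + anon; swap entries are not resident. */
static long model_rss(const long rss[NR_MM_COUNTERS])
{
	return rss[MM_FILEPAGES] + rss[MM_SHMEMPAGES] + rss[MM_ANONPAGES];
}
```

Call sites in the patch that used to pick between two counters by hand then
reduce to a single indexed update, e.g. rss[model_mm_counter(&page)]++.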


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH v2 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
  2015-03-27 16:40 ` Vlastimil Babka
@ 2015-03-27 16:40   ` Vlastimil Babka
  -1 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2015-03-27 16:40 UTC (permalink / raw)
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov, Vlastimil Babka

From: Jerome Marchand <jmarchan@redhat.com>

It is currently inconvenient to retrieve the MM_ANONPAGES value from the
status and statm files, and there is no way to separate MM_FILEPAGES from
MM_SHMEMPAGES. Add VmAnon, VmFile and VmShm lines to /proc/<pid>/status
to solve both issues.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 Documentation/filesystems/proc.txt | 10 +++++++++-
 fs/proc/task_mmu.c                 | 13 +++++++++++--
 2 files changed, 20 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 8b30543..c777adb 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -168,6 +168,9 @@ read the file /proc/PID/status:
   VmLck:         0 kB
   VmHWM:       476 kB
   VmRSS:       476 kB
+  VmAnon:      352 kB
+  VmFile:      120 kB
+  VmShm:         4 kB
   VmData:      156 kB
   VmStk:        88 kB
   VmExe:        68 kB
@@ -224,7 +227,12 @@ Table 1-2: Contents of the status files (as of 2.6.30-rc7)
  VmSize                      total program size
  VmLck                       locked memory size
  VmHWM                       peak resident set size ("high water mark")
- VmRSS                       size of memory portions
+ VmRSS                       size of memory portions. It contains the three
+                             following parts (VmRSS = VmAnon + VmFile + VmShm)
+ VmAnon                      size of resident anonymous memory
+ VmFile                      size of resident file mappings
+ VmShm                       size of resident shmem memory (includes SysV shm,
+                             mapping of tmpfs and shared anonymous mappings)
  VmData                      size of data, stack, and text segments
  VmStk                       size of data, stack, and text segments
  VmExe                       size of text segment
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index e86e137..1eeacd3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -22,7 +22,7 @@
 
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap, ptes, pmds;
+	unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
 
 	/*
@@ -39,6 +39,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	if (hiwater_rss < mm->hiwater_rss)
 		hiwater_rss = mm->hiwater_rss;
 
+	anon = get_mm_counter(mm, MM_ANONPAGES);
+	file = get_mm_counter(mm, MM_FILEPAGES);
+	shmem = get_mm_counter_shmem(mm);
 	data = mm->total_vm - mm->shared_vm - mm->stack_vm;
 	text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK)) >> 10;
 	lib = (mm->exec_vm << (PAGE_SHIFT-10)) - text;
@@ -52,6 +55,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmPin:\t%8lu kB\n"
 		"VmHWM:\t%8lu kB\n"
 		"VmRSS:\t%8lu kB\n"
+		"VmAnon:\t%8lu kB\n"
+		"VmFile:\t%8lu kB\n"
+		"VmShm:\t%8lu kB\n"
 		"VmData:\t%8lu kB\n"
 		"VmStk:\t%8lu kB\n"
 		"VmExe:\t%8lu kB\n"
@@ -65,6 +71,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		mm->pinned_vm << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
+		anon << (PAGE_SHIFT-10),
+		file << (PAGE_SHIFT-10),
+		shmem << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
 		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
 		ptes >> 10,
@@ -82,7 +91,7 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *data, unsigned long *resident)
 {
 	*shared = get_mm_counter(mm, MM_FILEPAGES) +
-		get_mm_counter(mm, MM_SHMEMPAGES);
+		get_mm_counter_shmem(mm);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
-- 
2.1.4


^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] mm, shmem: Add shmem resident memory accounting
  2015-03-27 16:40   ` Vlastimil Babka
@ 2015-03-27 17:09     ` Konstantin Khlebnikov
  -1 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-03-27 17:09 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API

On 27.03.2015 19:40, Vlastimil Babka wrote:
> From: Jerome Marchand <jmarchan@redhat.com>
>
> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implication in
> actual memory use is quite different.
> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.
>
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---


> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -327,9 +327,12 @@ struct core_state {
>   };
>
>   enum {
> -	MM_FILEPAGES,
> -	MM_ANONPAGES,
> -	MM_SWAPENTS,
> +	MM_FILEPAGES,	/* Resident file mapping pages */
> +	MM_ANONPAGES,	/* Resident anonymous pages */
> +	MM_SWAPENTS,	/* Anonymous swap entries */
> +#ifdef CONFIG_SHMEM
> +	MM_SHMEMPAGES,	/* Resident shared memory pages */
> +#endif

I prefer to keep that counter unconditionally:
kernel has MM_SWAPENTS even without CONFIG_SWAP.

>   	NR_MM_COUNTERS
>   };
>
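
Editor's sketch (not part of the thread): the trade-off under discussion can
be seen in a small userspace model. MODEL_CONFIG_SHMEM is a hypothetical
stand-in for CONFIG_SHMEM, and struct model_rss_stat stands in for the
per-mm counter array. With the #ifdef, a build without shmem saves one slot
per mm but every reader of the counter has to go through a wrapper; keeping
the member unconditionally would make the wrapper unnecessary at the cost of
that slot.

```c
#include <assert.h>
#include <stddef.h>

/* Remove this define to model a build without shmem support. */
#define MODEL_CONFIG_SHMEM 1

enum {
	MM_FILEPAGES,
	MM_ANONPAGES,
	MM_SWAPENTS,
#ifdef MODEL_CONFIG_SHMEM
	MM_SHMEMPAGES,
#endif
	NR_MM_COUNTERS		/* 4 with the option, 3 without */
};

/* Hypothetical stand-in for the per-mm counter array. */
struct model_rss_stat {
	long count[NR_MM_COUNTERS];
};

/*
 * Wrapper in the style of get_mm_counter_shmem(): callers never name
 * MM_SHMEMPAGES directly, so they compile in both configurations.
 */
static long model_get_counter_shmem(const struct model_rss_stat *s)
{
	assert(s != NULL);
#ifdef MODEL_CONFIG_SHMEM
	return s->count[MM_SHMEMPAGES];
#else
	return 0;
#endif
}
```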

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 0/4] enhance shmem process and swap accounting
@ 2015-05-14 11:16   ` Vlastimil Babka
  0 siblings, 0 replies; 18+ messages in thread
From: Vlastimil Babka @ 2015-05-14 11:16 UTC (permalink / raw)
  To: linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On 03/27/2015 05:40 PM, Vlastimil Babka wrote:
> Changes since v1:
> o In Patch 2, rely on SHMEM_I(inode)->swapped if possible, and fallback to
>    radix tree iterator on partially mapped shmem objects, i.e. decouple shmem
>    swap usage determination from the page walk, for performance reasons.
>    Thanks to Jerome and Konstantin for the tips.
>    The downside is that mm/shmem.c had to be touched.
>

Ping? I've got only a minor suggestion from Konstantin and no more 
feedback. Hugh?

Thanks,
Vlastimil

^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH v2 3/4] mm, shmem: Add shmem resident memory accounting
  2015-03-27 17:09     ` Konstantin Khlebnikov
  (?)
@ 2015-05-14 11:17     ` Vlastimil Babka
  2015-05-14 13:31         ` Konstantin Khlebnikov
  -1 siblings, 1 reply; 18+ messages in thread
From: Vlastimil Babka @ 2015-05-14 11:17 UTC (permalink / raw)
  To: Konstantin Khlebnikov, linux-mm, Jerome Marchand
  Cc: linux-kernel, Andrew Morton, linux-doc, Hugh Dickins,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API

On 03/27/2015 06:09 PM, Konstantin Khlebnikov wrote:
> On 27.03.2015 19:40, Vlastimil Babka wrote:
>> From: Jerome Marchand <jmarchan@redhat.com>
>>
>> Currently looking at /proc/<pid>/status or statm, there is no way to
>> distinguish shmem pages from pages mapped to a regular file (shmem
>> pages are mapped to /dev/zero), even though their implication in
>> actual memory use is quite different.
>> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
>> shmem pages instead of MM_FILEPAGES.
>>
>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
>
>
>> --- a/include/linux/mm_types.h
>> +++ b/include/linux/mm_types.h
>> @@ -327,9 +327,12 @@ struct core_state {
>>    };
>>
>>    enum {
>> -	MM_FILEPAGES,
>> -	MM_ANONPAGES,
>> -	MM_SWAPENTS,
>> +	MM_FILEPAGES,	/* Resident file mapping pages */
>> +	MM_ANONPAGES,	/* Resident anonymous pages */
>> +	MM_SWAPENTS,	/* Anonymous swap entries */
>> +#ifdef CONFIG_SHMEM
>> +	MM_SHMEMPAGES,	/* Resident shared memory pages */
>> +#endif
>
> I prefer to keep that counter unconditionally:
> kernel has MM_SWAPENTS even without CONFIG_SWAP.

Hmm, so just for consistency? I don't see much reason to make life 
harder for tiny systems, especially when it's not too much effort.

>
>>    	NR_MM_COUNTERS
>>    };
>>
>


* Re: [PATCH v2 3/4] mm, shmem: Add shmem resident memory accounting
  2015-05-14 11:17     ` Vlastimil Babka
@ 2015-05-14 13:31         ` Konstantin Khlebnikov
  0 siblings, 0 replies; 18+ messages in thread
From: Konstantin Khlebnikov @ 2015-05-14 13:31 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Konstantin Khlebnikov, linux-mm, Jerome Marchand,
	Linux Kernel Mailing List, Andrew Morton, linux-doc,
	Hugh Dickins, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Oleg Nesterov, Linux API

On Thu, May 14, 2015 at 2:17 PM, Vlastimil Babka <vbabka@suse.cz> wrote:
> On 03/27/2015 06:09 PM, Konstantin Khlebnikov wrote:
>>
>> On 27.03.2015 19:40, Vlastimil Babka wrote:
>>>
>>> From: Jerome Marchand <jmarchan@redhat.com>
>>>
>>> Currently looking at /proc/<pid>/status or statm, there is no way to
>>> distinguish shmem pages from pages mapped to a regular file (shmem
>>> pages are mapped to /dev/zero), even though their implication in
>>> actual memory use is quite different.
>>> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
>>> shmem pages instead of MM_FILEPAGES.
>>>
>>> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
>>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>>> ---
>>
>>
>>
>>> --- a/include/linux/mm_types.h
>>> +++ b/include/linux/mm_types.h
>>> @@ -327,9 +327,12 @@ struct core_state {
>>>    };
>>>
>>>    enum {
>>> -       MM_FILEPAGES,
>>> -       MM_ANONPAGES,
>>> -       MM_SWAPENTS,
>>> +       MM_FILEPAGES,   /* Resident file mapping pages */
>>> +       MM_ANONPAGES,   /* Resident anonymous pages */
>>> +       MM_SWAPENTS,    /* Anonymous swap entries */
>>> +#ifdef CONFIG_SHMEM
>>> +       MM_SHMEMPAGES,  /* Resident shared memory pages */
>>> +#endif
>>
>>
>> I prefer to keep that counter unconditionally:
>> kernel has MM_SWAPENTS even without CONFIG_SWAP.
>
>
> Hmm, so just for consistency? I don't see much reason to make life harder
> for tiny systems, especially when it's not too much effort.

The benefit is unclear: I guess slab will round the size up to the next
cacheline or power of two anyway.
The conditional (non)existence of the counter just adds unneeded lines of code.
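
A minimal sketch of the rounding argument, under assumed sizes (8-byte
counters, 64-byte cache lines). The enum names and helper functions are
hypothetical and are not kernel code; in the kernel the counters are part of
a larger structure, so the real effect depends on that structure's total size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the counter enum without and with the new entry. */
enum counters_without_shmem { A_FILEPAGES, A_ANONPAGES, A_SWAPENTS, A_NR_COUNTERS };
enum counters_with_shmem { B_FILEPAGES, B_ANONPAGES, B_SWAPENTS, B_SHMEMPAGES, B_NR_COUNTERS };

/* Assumed sizes, for illustration only. */
#define COUNTER_BYTES   8u
#define CACHELINE_BYTES 64u

/* Round size up to the next multiple of align (align must be a power of two). */
static inline size_t round_up_pow2(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Bytes occupied by nr counters stored on their own, after cache-line rounding. */
static inline size_t counter_footprint(size_t nr)
{
	return round_up_pow2(nr * COUNTER_BYTES, CACHELINE_BYTES);
}
```

Under these assumptions, counter_footprint(A_NR_COUNTERS) and
counter_footprint(B_NR_COUNTERS) both come out as one 64-byte cache line, so
dropping the fourth counter would save nothing after rounding.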

>
>>
>>>         NR_MM_COUNTERS
>>>    };
>>>
>>
>

end of thread, other threads:[~2015-05-14 13:31 UTC | newest]

Thread overview: 18+ messages
2015-03-27 16:40 [PATCH v2 0/4] enhance shmem process and swap accounting Vlastimil Babka
2015-03-27 16:40 ` Vlastimil Babka
2015-03-27 16:40 ` [PATCH v2 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
2015-03-27 16:40   ` Vlastimil Babka
2015-03-27 16:40 ` [PATCH v2 2/4] mm, proc: account for shmem swap in /proc/pid/smaps Vlastimil Babka
2015-03-27 16:40   ` Vlastimil Babka
2015-03-27 16:40 ` [PATCH v2 3/4] mm, shmem: Add shmem resident memory accounting Vlastimil Babka
2015-03-27 16:40   ` Vlastimil Babka
2015-03-27 17:09   ` Konstantin Khlebnikov
2015-03-27 17:09     ` Konstantin Khlebnikov
2015-05-14 11:17     ` Vlastimil Babka
2015-05-14 13:31       ` Konstantin Khlebnikov
2015-05-14 13:31         ` Konstantin Khlebnikov
2015-03-27 16:40 ` [PATCH v2 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status Vlastimil Babka
2015-03-27 16:40   ` Vlastimil Babka
2015-05-14 11:16 ` [PATCH v2 0/4] enhance shmem process and swap accounting Vlastimil Babka
2015-05-14 11:16   ` Vlastimil Babka
2015-05-14 11:16   ` Vlastimil Babka
