* [PATCH v4 0/4] enhance shmem process and swap accounting
From: Vlastimil Babka @ 2015-10-02 13:35 UTC
  To: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Vlastimil Babka

Hugh, you commented on the earlier patches from Jerome; could you please take
a look so we can have this merged? In particular, please check whether the
changes to shmem.c in Patch 2 are indeed safe without extra locking. Thanks!

Changes since v3:
o Rebase on next-20151002
o Apply feedback and acks from Michal Hocko and Konstantin Khlebnikov (thanks!)
  - drop the CONFIG_SHMEM #ifdefs, as this was already the second such
    suggestion
  - add comments about not taking i_mutex in Patch 2
o Rename VmAnon/VmFile/VmShm to RssAnon/RssFile/RssShm to make it more
  obvious that they are a breakdown of VmRSS. Naming things is hard.

Changes since v2:
o Rebase on next-20150805.
o This means that /proc/pid/smaps has the proportional swap share (SwapPss:)
  field as per https://lkml.org/lkml/2015/6/15/274
  It's not clear what to do with shmem here, so it's 0 for now.
  - swapped-out shmem doesn't have swap entries, so we would have to look at
    who else has the shmem object (partially) mapped
  - to be more precise, we would also have to check whether their mapped
    range actually includes the offset in question, which could get rather
    involved
  - or is there some easy way I don't see?
o Konstantin suggested for Patch 3/4 that I drop the CONFIG_SHMEM #ifdefs.
  I didn't see the point in going against tinyfication when the work was
  already done, but I can do that if more people think it's better and
  keeping the #ifdefs would block the series.

Changes since v1:
o In Patch 2, rely on SHMEM_I(inode)->swapped where possible, and fall back
  to a radix tree iterator for partially mapped shmem objects, i.e. decouple
  the shmem swap usage determination from the page walk, for performance
  reasons. Thanks to Jerome and Konstantin for the tips.
  The downside is that mm/shmem.c had to be touched.

This series is based on Jerome Marchand's [1] so let me quote the first
paragraph from there:

There are several shortcomings in the accounting of shared memory (SysV shm,
shared anonymous mappings, mappings of tmpfs files). The values in
/proc/<pid>/status and statm don't allow one to distinguish between shmem
memory and a shared mapping to a regular file, even though their implications
for memory usage are quite different: at reclaim, a file mapping can be
dropped or written back to disk, while shmem needs a place in swap. As for
shmem pages that are swapped out or sitting in the swap cache, they aren't
accounted for at all.

My own original motivation is that a customer found it (IMHO rightfully)
confusing that e.g. top's per-process swap usage is unreliable with respect
to swapped-out shmem pages, which are not accounted for.

The fundamental difference between private anonymous and shmem pages is that
when the latter are swapped out, their PTEs are converted to pte_none, not to
swap entries (swapents). As such, they are not counted among the swapents
visible e.g. in the VmSwap row of /proc/pid/status. It might be theoretically
possible to use swapents when swapping out shmem (without extra cost, as one
has to change all mappers anyway) and, on swap-in, convert the swapent only
for the faulting process, leaving the swapents in other processes until they
also fault (so, again, no extra cost). But I don't know how many assumptions
this would break, and it would be too disruptive a change for a relatively
small benefit.
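
To illustrate, here is a simplified sketch, mirroring the logic of the smaps
page walk, of why swapped-out shmem is invisible to swapent-based accounting;
the helper name is made up for illustration and this is not the actual kernel
code:

  /* Hypothetical, simplified swap accounting for a single pte: */
  static void count_swap_pte(pte_t pte, unsigned long *swap)
  {
          if (pte_present(pte))
                  return;                 /* resident page, nothing in swap */
          if (is_swap_pte(pte))           /* private anon: pte is a swapent */
                  *swap += PAGE_SIZE;
          /*
           * Swapped-out shmem leaves pte_none(pte) behind, so neither
           * branch fires and the page looks like it was never faulted in.
           */
  }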

Instead, my approach is to document the limitation of VmSwap and to provide a
means of determining the swap usage of shmem areas, via /proc/pid/smaps, for
those who are interested and willing to pay the price. Outside of ipcs, I
don't think it's currently possible to determine this usage at all. The
previous patchset [1] introduced new shmem-specific fields into smaps output
and functions to determine their values. I take a simpler approach, noting
that smaps output already has a "Swap: X kB" line, where currently X == 0
always for shmem areas. I think we can just consider this a bug and provide
the proper value by consulting the radix tree, as e.g. mincore_page() does.
In the patch changelog I explain why this is also not perfect (and cannot be
without swapents), but it is still arguably much better than showing 0.
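
To see the effect from user space, a hypothetical demo (not part of the
series; the tmpfs path, size and the memory-pressure step are illustrative
assumptions) could look like this:

  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <sys/mman.h>
  #include <unistd.h>

  int main(void)
  {
          size_t len = 64UL << 20;        /* 64 MB of tmpfs-backed memory */
          int fd = open("/dev/shm/swap-demo", O_RDWR | O_CREAT, 0600);
          char line[256], *p;
          int in_vma = 0;
          FILE *f;

          if (fd < 0 || ftruncate(fd, len) < 0)
                  return 1;
          p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
          if (p == MAP_FAILED)
                  return 1;
          memset(p, 0xab, len);           /* fault all pages in */

          /* ...induce memory pressure here so the pages go to swap... */

          f = fopen("/proc/self/smaps", "r");
          while (f && fgets(line, sizeof(line), f)) {
                  if (strchr(line, '-'))  /* crude VMA header detection */
                          in_vma = strstr(line, "swap-demo") != NULL;
                  else if (in_vma && !strncmp(line, "Swap:", 5))
                          fputs(line, stdout); /* "0 kB" without this series */
          }
          return 0;
  }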

The last two patches are adapted from Jerome's patchset and provide a
breakdown of VmRSS into RssAnon, RssFile and RssShm in /proc/pid/status.
Hugh noted that this is a welcome addition, and I agree that it might help
e.g. when debugging process memory usage, at an albeit non-zero, but still
rather low, cost of an extra per-mm counter and some page flag checks.
I updated these patches to 4.0-rc1, made them respect !CONFIG_SHMEM so that
tiny systems don't pay the cost, and optimized the page flag checking
somewhat.
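
With the whole series applied, /proc/<pid>/status then gains lines like the
following (values are illustrative, taken from the documentation example in
Patch 4), where VmRSS = RssAnon + RssFile + RssShm:

  VmRSS:       476 kB
  RssAnon:     352 kB
  RssFile:     120 kB
  RssShm:        4 kB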

[1] http://lwn.net/Articles/611966/

Jerome Marchand (2):
  mm, shmem: Add shmem resident memory accounting
  mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status

Vlastimil Babka (2):
  mm, documentation: clarify /proc/pid/status VmSwap limitations
  mm, proc: account for shmem swap in /proc/pid/smaps

 Documentation/filesystems/proc.txt | 18 +++++++++--
 arch/s390/mm/pgtable.c             |  5 +--
 fs/proc/task_mmu.c                 | 65 ++++++++++++++++++++++++++++++++++++--
 include/linux/mm.h                 | 18 ++++++++++-
 include/linux/mm_types.h           |  7 ++--
 include/linux/shmem_fs.h           |  6 ++++
 kernel/events/uprobes.c            |  2 +-
 mm/memory.c                        | 30 ++++++------------
 mm/oom_kill.c                      |  5 +--
 mm/rmap.c                          | 12 ++-----
 mm/shmem.c                         | 61 +++++++++++++++++++++++++++++++++++
 11 files changed, 183 insertions(+), 46 deletions(-)

-- 
2.5.2



* [PATCH v4 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
From: Vlastimil Babka @ 2015-10-02 13:35 UTC
  To: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Vlastimil Babka

The documentation for /proc/pid/status does not mention that the VmSwap value
counts only swapped-out private anonymous pages, not shmem. This is not
obvious, so document the limitation.
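
For illustration, a process whose swapped-out memory is all shmem might show
(hypothetical pid and values):

  $ grep VmSwap /proc/1234/status
  VmSwap:        0 kB    <- swapped-out shmem pages are not included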

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.txt | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index a99b208..7ef50cb 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -239,6 +239,8 @@ Table 1-2: Contents of the status files (as of 4.1)
  VmPTE                       size of page table entries
  VmPMD                       size of second level page tables
  VmSwap                      size of swap usage (the number of referred swapents)
+                             by anonymous private data (shmem swap usage is not
+                             included)
  HugetlbPages                size of hugetlb memory portions
  Threads                     number of threads
  SigQ                        number of signals queued/max. number for queue
-- 
2.5.2



* [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
From: Vlastimil Babka @ 2015-10-02 13:35 UTC
  To: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Vlastimil Babka

Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
mappings, even if the mapped portion does contain pages that were swapped
out. This is because, unlike for private anonymous mappings, shmem does not
change the pte to a swap entry, but to pte_none, when swapping a page out.
In the smaps page walk, such a page thus looks like it was never faulted in.

This patch determines the swap status of such shmem mappings in show_smap(),
using shmem's own swap accounting and, for partially mapped objects, a radix
tree lookup similar to what mincore_page() does. Swapped-out pages are thus
accounted for.

The accounting is arguably still not as precise as for private anonymous
mappings, since we will now also count pages that the process in question
never accessed, but which another process populated and then let become
swapped out. I believe this is still less confusing and subtle than not
showing any swap usage for shmem mappings at all. Also, swapped-out pages
only become a performance issue on future accesses, and we cannot predict
those for either kind of mapping.

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
---
 Documentation/filesystems/proc.txt |  6 ++--
 fs/proc/task_mmu.c                 | 48 ++++++++++++++++++++++++++++++
 include/linux/shmem_fs.h           |  6 ++++
 mm/shmem.c                         | 61 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 119 insertions(+), 2 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 7ef50cb..82d3657 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -457,8 +457,10 @@ accessed.
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
-swap.
-"SwapPss" shows proportional swap share of this mapping.
+swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
+underlying shmem object is on swap.
+"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
+currently show 0 here.
 "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
 "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
 hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 04999b2..103457c 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -14,6 +14,7 @@
 #include <linux/swapops.h>
 #include <linux/mmu_notifier.h>
 #include <linux/page_idle.h>
+#include <linux/shmem_fs.h>
 
 #include <asm/elf.h>
 #include <asm/uaccess.h>
@@ -657,6 +658,51 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
 }
 #endif /* HUGETLB_PAGE */
 
+#ifdef CONFIG_SHMEM
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	struct inode *inode;
+	unsigned long swapped;
+	pgoff_t start, end;
+
+	if (!vma->vm_file)
+		return 0;
+
+	inode = file_inode(vma->vm_file);
+
+	if (!shmem_mapping(inode->i_mapping))
+		return 0;
+
+	/*
+	 * The easier cases are when the shmem object has nothing in swap, or
+	 * we have the whole object mapped. Then we can simply use the stats
+	 * that are already tracked by shmem.
+	 */
+	swapped = shmem_swap_usage(inode);
+
+	if (swapped == 0)
+		return 0;
+
+	if (vma->vm_end - vma->vm_start >= inode->i_size)
+		return swapped;
+
+	/*
+	 * Here we have to inspect individual pages in our mapped range to
+	 * determine how much of them are swapped out. Thanks to RCU, we don't
+	 * need i_mutex to protect against truncating or hole punching.
+	 */
+	start = linear_page_index(vma, vma->vm_start);
+	end = linear_page_index(vma, vma->vm_end);
+
+	return shmem_partial_swap_usage(inode->i_mapping, start, end);
+}
+#else
+static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
+{
+	return 0;
+}
+#endif
+
 static int show_smap(struct seq_file *m, void *v, int is_pid)
 {
 	struct vm_area_struct *vma = v;
@@ -674,6 +720,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
 	/* mmap_sem is held in m_start */
 	walk_page_vma(vma, &smaps_walk);
 
+	mss.swap += smaps_shmem_swap(vma);
+
 	show_map_vma(m, vma, is_pid);
 
 	seq_printf(m,
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index 50777b5..12519e4 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
+#ifdef CONFIG_SWAP
+extern unsigned long shmem_swap_usage(struct inode *inode);
+extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end);
+#endif
+
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)
 {
diff --git a/mm/shmem.c b/mm/shmem.c
index b543cc7..b0e9e30 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -360,6 +360,67 @@ static int shmem_free_swap(struct address_space *mapping,
 }
 
 /*
+ * Determine (in bytes) how much of the whole shmem object is swapped out.
+ */
+unsigned long shmem_swap_usage(struct inode *inode)
+{
+	struct shmem_inode_info *info = SHMEM_I(inode);
+	unsigned long swapped;
+
+	/* Mostly an overkill, but it's not atomic64_t */
+	spin_lock(&info->lock);
+	swapped = info->swapped;
+	spin_unlock(&info->lock);
+
+	return swapped << PAGE_SHIFT;
+}
+
+/*
+ * Determine (in bytes) how much of the given range is swapped out.
+ *
+ * Can be called without i_mutex or mapping->tree_lock thanks to RCU.
+ */
+unsigned long shmem_partial_swap_usage(struct address_space *mapping,
+						pgoff_t start, pgoff_t end)
+{
+	struct radix_tree_iter iter;
+	void **slot;
+	struct page *page;
+	unsigned long swapped = 0;
+
+	rcu_read_lock();
+
+restart:
+	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
+		if (iter.index >= end)
+			break;
+
+		page = radix_tree_deref_slot(slot);
+
+		/*
+		 * This should only be possible to happen at index 0, so we
+		 * don't need to reset the counter, nor do we risk infinite
+		 * restarts.
+		 */
+		if (radix_tree_deref_retry(page))
+			goto restart;
+
+		if (radix_tree_exceptional_entry(page))
+			swapped++;
+
+		if (need_resched()) {
+			cond_resched_rcu();
+			start = iter.index + 1;
+			goto restart;
+		}
+	}
+
+	rcu_read_unlock();
+
+	return swapped << PAGE_SHIFT;
+}
+
+/*
  * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
  */
 void shmem_unlock_mapping(struct address_space *mapping)
-- 
2.5.2



* [PATCH v4 3/4] mm, shmem: Add shmem resident memory accounting
From: Vlastimil Babka @ 2015-10-02 13:35 UTC
  To: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Vlastimil Babka

From: Jerome Marchand <jmarchan@redhat.com>

Currently, when looking at /proc/<pid>/status or statm, there is no way to
distinguish shmem pages from pages mapped to a regular file (shmem pages are
mapped to /dev/zero), even though their implications for actual memory use
are quite different.
This patch adds an MM_SHMEMPAGES counter to mm_rss_stat and accounts shmem
pages under it instead of under MM_FILEPAGES.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 arch/s390/mm/pgtable.c   |  5 +----
 fs/proc/task_mmu.c       |  3 ++-
 include/linux/mm.h       | 18 +++++++++++++++++-
 include/linux/mm_types.h |  7 ++++---
 kernel/events/uprobes.c  |  2 +-
 mm/memory.c              | 30 ++++++++++--------------------
 mm/oom_kill.c            |  5 +++--
 mm/rmap.c                | 12 +++---------
 8 files changed, 41 insertions(+), 41 deletions(-)

diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
index 54ef3bc..9816f25 100644
--- a/arch/s390/mm/pgtable.c
+++ b/arch/s390/mm/pgtable.c
@@ -603,10 +603,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
 	else if (is_migration_entry(entry)) {
 		struct page *page = migration_entry_to_page(entry);
 
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	}
 	free_swap_and_cache(entry);
 }
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 103457c..9b9708e 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -83,7 +83,8 @@ unsigned long task_statm(struct mm_struct *mm,
 			 unsigned long *shared, unsigned long *text,
 			 unsigned long *data, unsigned long *resident)
 {
-	*shared = get_mm_counter(mm, MM_FILEPAGES);
+	*shared = get_mm_counter(mm, MM_FILEPAGES) +
+			get_mm_counter(mm, MM_SHMEMPAGES);
 	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
 								>> PAGE_SHIFT;
 	*data = mm->total_vm - mm->shared_vm;
diff --git a/include/linux/mm.h b/include/linux/mm.h
index d30eea3..8be4efc 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1361,10 +1361,26 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
 	atomic_long_dec(&mm->rss_stat.count[member]);
 }
 
+/* Optimized variant when page is already known not to be PageAnon */
+static inline int mm_counter_file(struct page *page)
+{
+	if (PageSwapBacked(page))
+		return MM_SHMEMPAGES;
+	return MM_FILEPAGES;
+}
+
+static inline int mm_counter(struct page *page)
+{
+	if (PageAnon(page))
+		return MM_ANONPAGES;
+	return mm_counter_file(page);
+}
+
 static inline unsigned long get_mm_rss(struct mm_struct *mm)
 {
 	return get_mm_counter(mm, MM_FILEPAGES) +
-		get_mm_counter(mm, MM_ANONPAGES);
+		get_mm_counter(mm, MM_ANONPAGES) +
+		get_mm_counter(mm, MM_SHMEMPAGES);
 }
 
 static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f8d1492..207890b 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -369,9 +369,10 @@ struct core_state {
 };
 
 enum {
-	MM_FILEPAGES,
-	MM_ANONPAGES,
-	MM_SWAPENTS,
+	MM_FILEPAGES,	/* Resident file mapping pages */
+	MM_ANONPAGES,	/* Resident anonymous pages */
+	MM_SWAPENTS,	/* Anonymous swap entries */
+	MM_SHMEMPAGES,	/* Resident shared memory pages */
 	NR_MM_COUNTERS
 };
 
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
index 4e5e979..6288606 100644
--- a/kernel/events/uprobes.c
+++ b/kernel/events/uprobes.c
@@ -180,7 +180,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
 	lru_cache_add_active_or_unevictable(kpage, vma);
 
 	if (!PageAnon(page)) {
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 		inc_mm_counter(mm, MM_ANONPAGES);
 	}
 
diff --git a/mm/memory.c b/mm/memory.c
index 3bd465a..f10d458 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 		} else if (is_migration_entry(entry)) {
 			page = migration_entry_to_page(entry);
 
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]++;
-			else
-				rss[MM_FILEPAGES]++;
+			rss[mm_counter(page)]++;
 
 			if (is_write_migration_entry(entry) &&
 					is_cow_mapping(vm_flags)) {
@@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
 	if (page) {
 		get_page(page);
 		page_dup_rmap(page);
-		if (PageAnon(page))
-			rss[MM_ANONPAGES]++;
-		else
-			rss[MM_FILEPAGES]++;
+		rss[mm_counter(page)]++;
 	}
 
 out_set_pte:
@@ -1113,9 +1107,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			tlb_remove_tlb_entry(tlb, pte, addr);
 			if (unlikely(!page))
 				continue;
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else {
+
+			if (!PageAnon(page)) {
 				if (pte_dirty(ptent)) {
 					force_flush = 1;
 					set_page_dirty(page);
@@ -1123,8 +1116,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				if (pte_young(ptent) &&
 				    likely(!(vma->vm_flags & VM_SEQ_READ)))
 					mark_page_accessed(page);
-				rss[MM_FILEPAGES]--;
 			}
+			rss[mm_counter(page)]--;
 			page_remove_rmap(page);
 			if (unlikely(page_mapcount(page) < 0))
 				print_bad_pte(vma, addr, ptent, page);
@@ -1146,11 +1139,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = migration_entry_to_page(entry);
-
-			if (PageAnon(page))
-				rss[MM_ANONPAGES]--;
-			else
-				rss[MM_FILEPAGES]--;
+			rss[mm_counter(page)]--;
 		}
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
@@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
 
 	/* Ok, finally just insert the thing.. */
 	get_page(page);
-	inc_mm_counter_fast(mm, MM_FILEPAGES);
+	inc_mm_counter_fast(mm, mm_counter_file(page));
 	page_add_file_rmap(page);
 	set_pte_at(mm, addr, pte, mk_pte(page, prot));
 
@@ -2097,7 +2086,8 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (likely(pte_same(*page_table, orig_pte))) {
 		if (old_page) {
 			if (!PageAnon(old_page)) {
-				dec_mm_counter_fast(mm, MM_FILEPAGES);
+				dec_mm_counter_fast(mm,
+						mm_counter_file(old_page));
 				inc_mm_counter_fast(mm, MM_ANONPAGES);
 			}
 		} else {
@@ -2818,7 +2808,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
 		page_add_new_anon_rmap(page, vma, address);
 	} else {
-		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
+		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
 		page_add_file_rmap(page);
 	}
 	set_pte_at(vma->vm_mm, address, pte, entry);
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 4766e25..127e2d6 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -567,10 +567,11 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
 	 */
 	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
 	mark_oom_victim(victim);
-	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
+	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
 		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
 		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
-		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
+		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
+		K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
 	task_unlock(victim);
 
 	/*
diff --git a/mm/rmap.c b/mm/rmap.c
index 0ce371a..6c89356 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1383,10 +1383,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		if (PageHuge(page)) {
 			hugetlb_count_sub(1 << compound_order(page), mm);
 		} else {
-			if (PageAnon(page))
-				dec_mm_counter(mm, MM_ANONPAGES);
-			else
-				dec_mm_counter(mm, MM_FILEPAGES);
+			dec_mm_counter(mm, mm_counter(page));
 		}
 		set_pte_at(mm, address, pte,
 			   swp_entry_to_pte(make_hwpoison_entry(page)));
@@ -1396,10 +1393,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		 * interest anymore. Simply discard the pte, vmscan
 		 * will take care of the rest.
 		 */
-		if (PageAnon(page))
-			dec_mm_counter(mm, MM_ANONPAGES);
-		else
-			dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter(page));
 	} else if (PageAnon(page)) {
 		swp_entry_t entry = { .val = page_private(page) };
 		pte_t swp_pte;
@@ -1455,7 +1449,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 		entry = make_migration_entry(page, pte_write(pteval));
 		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
 	} else
-		dec_mm_counter(mm, MM_FILEPAGES);
+		dec_mm_counter(mm, mm_counter_file(page));
 
 discard:
 	page_remove_rmap(page);
-- 
2.5.2



* [PATCH v4 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
From: Vlastimil Babka @ 2015-10-02 13:35 UTC
  To: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov, Vlastimil Babka

From: Jerome Marchand <jmarchan@redhat.com>

It's currently inconvenient to retrieve the MM_ANONPAGES value from the
status and statm files, and there is no way to separate MM_FILEPAGES from
MM_SHMEMPAGES. Add RssAnon, RssFile and RssShm lines to /proc/<pid>/status
to solve these issues.

Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Acked-by: Michal Hocko <mhocko@suse.com>
---
 Documentation/filesystems/proc.txt | 10 +++++++++-
 fs/proc/task_mmu.c                 | 14 ++++++++++++--
 2 files changed, 21 insertions(+), 3 deletions(-)

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 82d3657..c887a42 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -169,6 +169,9 @@ For example, to get the status information of a process, all you have to do is
   VmLck:         0 kB
   VmHWM:       476 kB
   VmRSS:       476 kB
+  RssAnon:     352 kB
+  RssFile:     120 kB
+  RssShm:        4 kB
   VmData:      156 kB
   VmStk:        88 kB
   VmExe:        68 kB
@@ -231,7 +234,12 @@ Table 1-2: Contents of the status files (as of 4.1)
  VmSize                      total program size
  VmLck                       locked memory size
  VmHWM                       peak resident set size ("high water mark")
- VmRSS                       size of memory portions
+ VmRSS                       size of memory portions. It contains the three
+                             following parts (VmRSS = RssAnon + RssFile + RssShm)
+ RssAnon                     size of resident anonymous memory
+ RssFile                     size of resident file mappings
+ RssShm                      size of resident shmem memory (includes SysV shm,
+                             mapping of tmpfs and shared anonymous mappings)
  VmData                      size of data, stack, and text segments
  VmStk                       size of data, stack, and text segments
  VmExe                       size of text segment
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 9b9708e..7332afd 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -23,9 +23,13 @@
 
 void task_mem(struct seq_file *m, struct mm_struct *mm)
 {
-	unsigned long data, text, lib, swap, ptes, pmds;
+	unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem;
 	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
 
+	anon = get_mm_counter(mm, MM_ANONPAGES);
+	file = get_mm_counter(mm, MM_FILEPAGES);
+	shmem = get_mm_counter(mm, MM_SHMEMPAGES);
+
 	/*
 	 * Note: to minimize their overhead, mm maintains hiwater_vm and
 	 * hiwater_rss only when about to *lower* total_vm or rss.  Any
@@ -36,7 +40,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 	hiwater_vm = total_vm = mm->total_vm;
 	if (hiwater_vm < mm->hiwater_vm)
 		hiwater_vm = mm->hiwater_vm;
-	hiwater_rss = total_rss = get_mm_rss(mm);
+	hiwater_rss = total_rss = anon + file + shmem;
 	if (hiwater_rss < mm->hiwater_rss)
 		hiwater_rss = mm->hiwater_rss;
 
@@ -53,6 +57,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		"VmPin:\t%8lu kB\n"
 		"VmHWM:\t%8lu kB\n"
 		"VmRSS:\t%8lu kB\n"
+		"RssAnon:\t%8lu kB\n"
+		"RssFile:\t%8lu kB\n"
+		"RssShm:\t%8lu kB\n"
 		"VmData:\t%8lu kB\n"
 		"VmStk:\t%8lu kB\n"
 		"VmExe:\t%8lu kB\n"
@@ -66,6 +73,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
 		mm->pinned_vm << (PAGE_SHIFT-10),
 		hiwater_rss << (PAGE_SHIFT-10),
 		total_rss << (PAGE_SHIFT-10),
+		anon << (PAGE_SHIFT-10),
+		file << (PAGE_SHIFT-10),
+		shmem << (PAGE_SHIFT-10),
 		data << (PAGE_SHIFT-10),
 		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
 		ptes >> 10,
-- 
2.5.2



* Re: [PATCH v4 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
  2015-10-02 13:35   ` Vlastimil Babka
  (?)
@ 2015-10-02 14:56   ` Jerome Marchand
  -1 siblings, 0 replies; 40+ messages in thread
From: Jerome Marchand @ 2015-10-02 14:56 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

[-- Attachment #1: Type: text/plain, Size: 470 bytes --]

On 10/02/2015 03:35 PM, Vlastimil Babka wrote:
> The documentation for /proc/pid/status does not mention that the value of
> VmSwap counts only swapped out anonymous private pages and not shmem. This is
> not obvious, so document this limitation.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Acked-by: Michal Hocko <mhocko@suse.com>

Acked-by: Jerome Marchand <jmarchan@redhat.com>




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 13:35   ` Vlastimil Babka
  (?)
@ 2015-10-02 15:00   ` Jerome Marchand
  -1 siblings, 0 replies; 40+ messages in thread
From: Jerome Marchand @ 2015-10-02 15:00 UTC (permalink / raw)
  To: Vlastimil Babka, linux-mm, Andrew Morton, Hugh Dickins
  Cc: linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]

On 10/02/2015 03:35 PM, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will count also pages that the process in question never
> accessed, but only another process populated them and then let them become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.

Agreed, this is much better than the current situation. I don't think
there is such a thing as a perfect accounting of shared pages anyway.

> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Acked-by: Jerome Marchand <jmarchan@redhat.com>




[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 473 bytes --]

^ permalink raw reply	[flat|nested] 40+ messages in thread
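
To see patch 2's effect in practice, one can create a shmem mapping,
drive it to swap under memory pressure, and watch the vma's "Swap:"
line in smaps. A minimal sketch (the 64 MB size and fill pattern are
arbitrary):

    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
            size_t len = 64 << 20;  /* 64 MB of shared anonymous (shmem) memory */
            char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);

            if (p == MAP_FAILED)
                    return 1;
            memset(p, 0x5a, len);   /* fault every page in */
            /* once part of the range is swapped out, the "Swap:" field of
             * this vma in /proc/<pid>/smaps becomes non-zero with patch 2 */
            printf("pid %d, mapping at %p\n", getpid(), (void *)p);
            pause();                /* keep the mapping alive for inspection */
            return 0;
    }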

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-02 15:20     ` Michal Hocko
  -1 siblings, 0 replies; 40+ messages in thread
From: Michal Hocko @ 2015-10-02 15:20 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Oleg Nesterov, Linux API, Konstantin Khlebnikov

On Fri 02-10-15 15:35:49, Vlastimil Babka wrote:
> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will count also pages that the process in question never
> accessed, but only another process populated them and then let them become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Acked-by: Michal Hocko <mhocko@suse.com>

But I think comments explaining why i_mutex is not needed are
confusing and incomplete.
[...]
> +	/*
> +	 * Here we have to inspect individual pages in our mapped range to
> +	 * determine how much of them are swapped out. Thanks to RCU, we don't
> +	 * need i_mutex to protect against truncating or hole punching.
> +	 */
> +	start = linear_page_index(vma, vma->vm_start);
> +	end = linear_page_index(vma, vma->vm_end);
> +
> +	return shmem_partial_swap_usage(inode->i_mapping, start, end);
[...]
> +/*
> + * Determine (in bytes) how many pages within the given range are swapped out.
> + *
> + * Can be called without i_mutex or mapping->tree_lock thanks to RCU.
> + */
> +unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> +						pgoff_t start, pgoff_t end)

AFAIU RCU only helps to prevent accessing nodes which were freed from
the radix tree. The reason why we do not need to hold i_mutex is that
the radix tree iterator would break out of the loop if we entered a
node which backed a truncated range. At least this is my understanding;
I might be wrong here, of course.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 40+ messages in thread
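
A compressed, annotated sketch of the lockless lookup pattern Michal
describes above (names and flow taken from the patch hunk he quotes,
so this is a fragment, not standalone code):

    rcu_read_lock();        /* keeps radix tree nodes from being freed under us */
    radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
            if (iter.index >= end)
                    break;  /* a concurrently truncated range simply stops
                             * yielding slots, so no i_mutex is needed */
            page = radix_tree_deref_slot(slot);
            if (radix_tree_deref_retry(page))
                    goto restart;   /* root node changed under us; retry */
            if (radix_tree_exceptional_entry(page))
                    swapped++;      /* swap entries are exceptional entries */
    }
    rcu_read_unlock();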

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-02 22:37     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-10-02 22:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Hugh Dickins, linux-kernel, linux-doc,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will count also pages that the process in question never
> accessed, but only another process populated them and then let them become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
> 
> ...
>
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
>  
> +#ifdef CONFIG_SWAP
> +extern unsigned long shmem_swap_usage(struct inode *inode);
> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> +						pgoff_t start, pgoff_t end);
> +#endif

CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.

I'd just do

--- a/include/linux/shmem_fs.h~mm-proc-account-for-shmem-swap-in-proc-pid-smaps-fix
+++ a/include/linux/shmem_fs.h
@@ -60,11 +60,9 @@ extern struct page *shmem_read_mapping_p
 extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
 extern int shmem_unuse(swp_entry_t entry, struct page *page);
 
-#ifdef CONFIG_SWAP
 extern unsigned long shmem_swap_usage(struct inode *inode);
 extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
 						pgoff_t start, pgoff_t end);
-#endif
 
 static inline struct page *shmem_read_mapping_page(
 				struct address_space *mapping, pgoff_t index)


We don't need the ifdefs around declarations and they're a pain to
maintain and they'd add a *ton* of clutter if we even tried to do this
for real.

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] mm, shmem: Add shmem resident memory accounting
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-02 22:37     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-10-02 22:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Hugh Dickins, linux-kernel, linux-doc,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On Fri,  2 Oct 2015 15:35:50 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> From: Jerome Marchand <jmarchan@redhat.com>

Changelog is a bit weird.

> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implications for
> actual memory use are quite different.

OK, that's a bunch of stuff about the user interface.

> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.

And that has nothing to do with the user interface.

So now this little reader is all confused.  The patch doesn't actually
address the described problem at all, does it?  It's preparatory stuff
only?  No changes to the kernel's user interface?



^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-02 22:37     ` Andrew Morton
  -1 siblings, 0 replies; 40+ messages in thread
From: Andrew Morton @ 2015-10-02 22:37 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Hugh Dickins, linux-kernel, linux-doc,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On Fri,  2 Oct 2015 15:35:51 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:

> From: Jerome Marchand <jmarchan@redhat.com>
> 
> It's currently inconvenient to retrieve MM_ANONPAGES value from status
> and statm files and there is no way to separate MM_FILEPAGES and
> MM_SHMEMPAGES. Add RssAnon, RssFile and RssShm lines in /proc/<pid>/status
> to solve these issues.

This changelog is also head-spinning.

Why is it talking about MM_ANONPAGES and MM_FILEPAGES in the context of
procfs files?  Those terms are kernel-internal stuff and are
meaningless to end users.

So can we please start over with the changelogs?

- What is wrong with the current user interface?

- What changes are we proposing making?

- What new fields are added to the UI?  What is their meaning to users?

- Are any existing UI fields altered?  If so how and why and what
  impact will that have?

Extra points will be awarded for example procfs output.

This is the important stuff!  Once this is all clearly described,
understood and reviewed, then we can get into the
kernel-internal-microdetails like MM_ANONPAGES.


(And "What is wrong with the current user interface?" is important. 
What value does this patchset provide to end users?  Why does anyone
even want these changes?)


^ permalink raw reply	[flat|nested] 40+ messages in thread
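
For reference, the documentation hunk in patch 4 does carry example
output; the new lines sit directly under VmRSS in /proc/<pid>/status:

    VmHWM:       476 kB
    VmRSS:       476 kB
    RssAnon:     352 kB
    RssFile:     120 kB
    RssShm:        4 kB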

* Re: [PATCH v4 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-05  1:05     ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2015-10-05  1:05 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

On Fri, 2 Oct 2015, Vlastimil Babka wrote:

> The documentation for /proc/pid/status does not mention that the value of
> VmSwap counts only swapped out anonymous private pages and not shmem. This is
> not obvious, so document this limitation.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Acked-by: Michal Hocko <mhocko@suse.com>
> ---
>  Documentation/filesystems/proc.txt | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index a99b208..7ef50cb 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -239,6 +239,8 @@ Table 1-2: Contents of the status files (as of 4.1)
>   VmPTE                       size of page table entries
>   VmPMD                       size of second level page tables
>   VmSwap                      size of swap usage (the number of referred swapents)
> +                             by anonymous private data (shmem swap usage is not
> +                             included)

I have difficulty in reading "size of swap usage (the number of referred
swapents) by anonymous private data (shmem swap usage is not included)".

Luckily, VmSwap never was "the number of referred swapents", it's in kB.
So I suggest                    amount of swap used by anonymous private data
                                (shmem swap usage is not included)

for which you can assume Acked-by: Hugh Dickins <hughd@google.com>

Hugh

>   HugetlbPages                size of hugetlb memory portions
>   Threads                     number of threads
>   SigQ                        number of signals queued/max. number for queue
> -- 
> 2.5.2

^ permalink raw reply	[flat|nested] 40+ messages in thread
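
Folding Hugh's suggested wording back into the patch 1 hunk, the table
entry would read:

     VmSwap                      amount of swap used by anonymous private data
                                 (shmem swap usage is not included)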

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-05  3:01     ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2015-10-05  3:01 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

On Fri, 2 Oct 2015, Vlastimil Babka wrote:

> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
> mappings, even if the mapped portion does contain pages that were swapped out.
> This is because unlike private anonymous mappings, shmem does not change pte
> to swap entry, but pte_none when swapping the page out. In the smaps page
> walk, such page thus looks like it was never faulted in.
> 
> This patch changes smaps_pte_entry() to determine the swap status for such
> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
> Swapped out pages are thus accounted for.
> 
> The accounting is arguably still not as precise as for private anonymous
> mappings, since now we will count also pages that the process in question never
> accessed, but only another process populated them and then let them become
> swapped out. I believe it is still less confusing and subtle than not showing
> any swap usage by shmem mappings at all. Also, swapped out pages only become a
> performance issue for future accesses, and we cannot predict those for either
> kind of mapping.
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

Neither Ack nor Nack from me.

I don't want to stand in the way of this patch, if you and others
believe that it will help to diagnose problems in the field better
than what's shown at present; but to me it looks dangerously like
replacing no information by wrong information.

As you acknowledge in the commit message, if a file of 100 pages
were copied to tmpfs, and 100 tasks map its full extent, but they
all mess around with the first 50 pages and take no interest in
the last 50, then it's quite likely that that last 50 will get
swapped out; then with your patch, 100 tasks are each shown as
using 50 pages of swap, when none of them are actually using any.
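
To put numbers on that scenario (assuming 4 kB pages): only 50 pages,
200 kB, of swap are actually in use, yet summing the Swap: fields over
the 100 smaps files would report 100 * 50 * 4 kB = 20000 kB, a
hundredfold overstatement.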

It is rather as if we didn't bother to record Rss, and just put
Size in there instead: you are (for understandable reasons) treating
the virtual address space as if every page of it had been touched.

But I accept that there may well be a class of processes and problems
which would be better served by this fiction than the present: I expect
you have much more experience of helping out in such situations than I.

And perhaps you do balance it nicely by going to the opposite extreme
with SwapPss 0 for all (again for the eminently understandable reason,
that it would be a whole lot more new code to work out the right number).
Altogether, you're saying everyone's using more swap than they probably
are, but that's okay because it's infinitely shared.

I am not at all angling for you or anyone to make the changes necessary
to make those numbers accurate.  I think there's a point at which we
stop cluttering up the core kernel code, just for the sake of
maintaining numbers for a /proc file someone thought was a good idea
at the time.  But I am hoping that if this patch goes in, you will take
responsibility for batting away all the complaints that it doesn't work
as this or that person expected, rather than a long stream of patches
to refine it.

I think the root problem is that we're trying to use /proc/<pid>/smaps
for something that's independent of <pid> and its maps: a shmem object.
Would we be better served by a tmpfs-ish filesystem mounted somewhere,
which gives names to all the objects on the internal mount of tmpfs
(SysV SHM, memfds etc); and some fincore-ish syscalls which could be
used to interrogate how much swap any tmpfs file is using in any range?
(I am not volunteering to write this, not in the foreseeable future.)

I have no idea of the security implications of naming the hidden, it
may be a non-starter.  And my guess is, it would be nice if it already
existed, but you need a solution today to some problems that have been
wasting your time; and grafting it into smaps looks to be good enough.

Some comments on your implementation below.

> ---
>  Documentation/filesystems/proc.txt |  6 ++--
>  fs/proc/task_mmu.c                 | 48 ++++++++++++++++++++++++++++++
>  include/linux/shmem_fs.h           |  6 ++++
>  mm/shmem.c                         | 61 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 119 insertions(+), 2 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 7ef50cb..82d3657 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -457,8 +457,10 @@ accessed.
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
> -swap.
> -"SwapPss" shows proportional swap share of this mapping.
> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
> +underlying shmem object is on swap.

And for private mappings of tmpfs files?  I expected it to show an
indeterminate mixture of the two, but it looks like you treat the private
mapping just like a shared one, and take no notice of the COWed pages
out on swap which would have been reported before.  Oh, no, I think
I misread, and you add the two together?  I agree that's the easiest
thing to do, and therefore perhaps the best; but it doesn't fill me
with conviction that it's the right thing to do. 

> +"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
> +currently show 0 here.

Yes, my heart sank when I remembered SwapPss, and I wondered what you were
going to do with that.  I was imagining that the Swap number would go into
SwapPss, but no, I prefer your choice to show 0 there (but depressed to
see the word "currently", which hints at grand schemes to plumb in another
radix_tree of swap counts, or more rmap_walks to calculate, or something).

>  "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
>  "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
>  hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 04999b2..103457c 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -14,6 +14,7 @@
>  #include <linux/swapops.h>
>  #include <linux/mmu_notifier.h>
>  #include <linux/page_idle.h>
> +#include <linux/shmem_fs.h>
>  
>  #include <asm/elf.h>
>  #include <asm/uaccess.h>
> @@ -657,6 +658,51 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
>  }
>  #endif /* HUGETLB_PAGE */
>  
> +#ifdef CONFIG_SHMEM

Correct.

> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
> +{
> +	struct inode *inode;
> +	unsigned long swapped;
> +	pgoff_t start, end;
> +
> +	if (!vma->vm_file)
> +		return 0;
> +
> +	inode = file_inode(vma->vm_file);
> +
> +	if (!shmem_mapping(inode->i_mapping))
> +		return 0;

Someone somewhere may ask for an ops method,
but that someone will certainly not be me.

> +
> +	/*
> +	 * The easier cases are when the shmem object has nothing in swap, or
> +	 * we have the whole object mapped. Then we can simply use the stats
> +	 * that are already tracked by shmem.
> +	 */
> +	swapped = shmem_swap_usage(inode);
> +
> +	if (swapped == 0)
> +		return 0;

You are absolutely right to go for that optimization, but please
please do it all inside one call to shmem.c: all you need is one
shmem_swap_usage(inode, start, end)
or
shmem_swap_usage(vma).

> +
> +	if (vma->vm_end - vma->vm_start >= inode->i_size)

That must be wrong.  It's probably right for all normal processes,
and you may not be interested in the rest; but anyone can set up
a mapping from end of file onwards, which won't intersect with the
swap at all.  Just a little more thought on that test would be good.
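
One sketch of a tighter test (hypothetical, and it would live inside
the single shmem.c call suggested above) is to take the whole-object
shortcut only when the vma starts at file offset zero and spans at
least i_size:

    /* whole-object shortcut: the vma covers the entire file, so the
     * per-inode swapped count applies to this mapping as-is */
    if (!vma->vm_pgoff &&
        vma->vm_end - vma->vm_start >= i_size_read(inode))
            return swapped;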

> +		return swapped;
> +
> +	/*
> +	 * Here we have to inspect individual pages in our mapped range to
> +	 * determine how much of them are swapped out. Thanks to RCU, we don't
> +	 * need i_mutex to protect against truncating or hole punching.
> +	 */
> +	start = linear_page_index(vma, vma->vm_start);
> +	end = linear_page_index(vma, vma->vm_end);
> +
> +	return shmem_partial_swap_usage(inode->i_mapping, start, end);
> +}
> +#else
> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
> +{
> +	return 0;
> +}
> +#endif
> +
>  static int show_smap(struct seq_file *m, void *v, int is_pid)
>  {
>  	struct vm_area_struct *vma = v;
> @@ -674,6 +720,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  	/* mmap_sem is held in m_start */
>  	walk_page_vma(vma, &smaps_walk);
>  
> +	mss.swap += smaps_shmem_swap(vma);
> +

So, I think here you add the private swap to the object swap.

>  	show_map_vma(m, vma, is_pid);
>  
>  	seq_printf(m,
> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
> index 50777b5..12519e4 100644
> --- a/include/linux/shmem_fs.h
> +++ b/include/linux/shmem_fs.h
> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>  extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>  extern int shmem_unuse(swp_entry_t entry, struct page *page);
>  
> +#ifdef CONFIG_SWAP

As Andrew said, better just drop the #ifdef here.

> +extern unsigned long shmem_swap_usage(struct inode *inode);
> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> +						pgoff_t start, pgoff_t end);
> +#endif
> +
>  static inline struct page *shmem_read_mapping_page(
>  				struct address_space *mapping, pgoff_t index)
>  {
> diff --git a/mm/shmem.c b/mm/shmem.c
> index b543cc7..b0e9e30 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -360,6 +360,67 @@ static int shmem_free_swap(struct address_space *mapping,
>  }
>  
>  /*
> + * Determine (in bytes) how much of the whole shmem object is swapped out.
> + */
> +unsigned long shmem_swap_usage(struct inode *inode)
> +{
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	unsigned long swapped;
> +
> +	/* Mostly an overkill, but it's not atomic64_t */
> +	spin_lock(&info->lock);

Entirely overkill, what's atomic64_t got to do with it?
info->swapped is an unsigned long, 32-bit on 32-bit, 64-bit on 64-bit,
there are no atomicity issues.  READ_ONCE if you like, but I can't even
see where it would read twice, or what bad consequence could result.
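
That is, a sketch per this remark (info->swapped is a single word on
both 32-bit and 64-bit, so a plain read cannot tear):

    swapped = READ_ONCE(info->swapped);     /* no info->lock needed */
    return swapped << PAGE_SHIFT;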

> +	swapped = info->swapped;
> +	spin_unlock(&info->lock);
> +
> +	return swapped << PAGE_SHIFT;
> +}
> +
> +/*
> + * Determine (in bytes) how many pages within the given range are swapped out.
> + *
> + * Can be called without i_mutex or mapping->tree_lock thanks to RCU.

Correct.

> + */
> +unsigned long shmem_partial_swap_usage(struct address_space *mapping,
> +						pgoff_t start, pgoff_t end)
> +{
> +	struct radix_tree_iter iter;
> +	void **slot;
> +	struct page *page;
> +	unsigned long swapped = 0;
> +
> +	rcu_read_lock();
> +
> +restart:
> +	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
> +		if (iter.index >= end)
> +			break;
> +
> +		page = radix_tree_deref_slot(slot);
> +
> +		/*
> +		 * This should only be possible to happen at index 0, so we
> +		 * don't need to reset the counter, nor do we risk infinite
> +		 * restarts.
> +		 */
> +		if (radix_tree_deref_retry(page))
> +			goto restart;
> +
> +		if (radix_tree_exceptional_entry(page))
> +			swapped++;
> +
> +		if (need_resched()) {
> +			cond_resched_rcu();
> +			start = iter.index + 1;
> +			goto restart;
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +
> +	return swapped << PAGE_SHIFT;
> +}

This is what you most wanted me to look at, but it looks perfect to me
(aside from my wanting one call into shmem.c instead of two).

Hugh

> +
> +/*
>   * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
>   */
>  void shmem_unlock_mapping(struct address_space *mapping)
> -- 
> 2.5.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 3/4] mm, shmem: Add shmem resident memory accounting
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-05  4:28     ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2015-10-05  4:28 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

On Fri, 2 Oct 2015, Vlastimil Babka wrote:

> From: Jerome Marchand <jmarchan@redhat.com>
> 
> Currently looking at /proc/<pid>/status or statm, there is no way to
> distinguish shmem pages from pages mapped to a regular file (shmem
> pages are mapped to /dev/zero), even though their implications for
> actual memory use are quite different.
> This patch adds MM_SHMEMPAGES counter to mm_rss_stat to account for
> shmem pages instead of MM_FILEPAGES.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Acked-by: Michal Hocko <mhocko@suse.com>

Acked-by: Hugh Dickins <hughd@google.com>

Good, this one is long overdue; I've grown tired of writing those if/elses.
I'd probably have done without mm_counter_file(), but it's okay.

> ---
>  arch/s390/mm/pgtable.c   |  5 +----
>  fs/proc/task_mmu.c       |  3 ++-
>  include/linux/mm.h       | 18 +++++++++++++++++-
>  include/linux/mm_types.h |  7 ++++---
>  kernel/events/uprobes.c  |  2 +-
>  mm/memory.c              | 30 ++++++++++--------------------
>  mm/oom_kill.c            |  5 +++--
>  mm/rmap.c                | 12 +++---------
>  8 files changed, 41 insertions(+), 41 deletions(-)
> 
> diff --git a/arch/s390/mm/pgtable.c b/arch/s390/mm/pgtable.c
> index 54ef3bc..9816f25 100644
> --- a/arch/s390/mm/pgtable.c
> +++ b/arch/s390/mm/pgtable.c
> @@ -603,10 +603,7 @@ static void gmap_zap_swap_entry(swp_entry_t entry, struct mm_struct *mm)
>  	else if (is_migration_entry(entry)) {
>  		struct page *page = migration_entry_to_page(entry);
>  
> -		if (PageAnon(page))
> -			dec_mm_counter(mm, MM_ANONPAGES);
> -		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter(page));
>  	}
>  	free_swap_and_cache(entry);
>  }
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 103457c..9b9708e 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -83,7 +83,8 @@ unsigned long task_statm(struct mm_struct *mm,
>  			 unsigned long *shared, unsigned long *text,
>  			 unsigned long *data, unsigned long *resident)
>  {
> -	*shared = get_mm_counter(mm, MM_FILEPAGES);
> +	*shared = get_mm_counter(mm, MM_FILEPAGES) +
> +			get_mm_counter(mm, MM_SHMEMPAGES);
>  	*text = (PAGE_ALIGN(mm->end_code) - (mm->start_code & PAGE_MASK))
>  								>> PAGE_SHIFT;
>  	*data = mm->total_vm - mm->shared_vm;
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d30eea3..8be4efc 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1361,10 +1361,26 @@ static inline void dec_mm_counter(struct mm_struct *mm, int member)
>  	atomic_long_dec(&mm->rss_stat.count[member]);
>  }
>  
> +/* Optimized variant when page is already known not to be PageAnon */
> +static inline int mm_counter_file(struct page *page)
> +{
> +	if (PageSwapBacked(page))
> +		return MM_SHMEMPAGES;
> +	return MM_FILEPAGES;
> +}
> +
> +static inline int mm_counter(struct page *page)
> +{
> +	if (PageAnon(page))
> +		return MM_ANONPAGES;
> +	return mm_counter_file(page);
> +}
> +
>  static inline unsigned long get_mm_rss(struct mm_struct *mm)
>  {
>  	return get_mm_counter(mm, MM_FILEPAGES) +
> -		get_mm_counter(mm, MM_ANONPAGES);
> +		get_mm_counter(mm, MM_ANONPAGES) +
> +		get_mm_counter(mm, MM_SHMEMPAGES);
>  }
>  
>  static inline unsigned long get_mm_hiwater_rss(struct mm_struct *mm)
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index f8d1492..207890b 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -369,9 +369,10 @@ struct core_state {
>  };
>  
>  enum {
> -	MM_FILEPAGES,
> -	MM_ANONPAGES,
> -	MM_SWAPENTS,
> +	MM_FILEPAGES,	/* Resident file mapping pages */
> +	MM_ANONPAGES,	/* Resident anonymous pages */
> +	MM_SWAPENTS,	/* Anonymous swap entries */
> +	MM_SHMEMPAGES,	/* Resident shared memory pages */
>  	NR_MM_COUNTERS
>  };
>  
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 4e5e979..6288606 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -180,7 +180,7 @@ static int __replace_page(struct vm_area_struct *vma, unsigned long addr,
>  	lru_cache_add_active_or_unevictable(kpage, vma);
>  
>  	if (!PageAnon(page)) {
> -		dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter_file(page));
>  		inc_mm_counter(mm, MM_ANONPAGES);
>  	}
>  
> diff --git a/mm/memory.c b/mm/memory.c
> index 3bd465a..f10d458 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -832,10 +832,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  		} else if (is_migration_entry(entry)) {
>  			page = migration_entry_to_page(entry);
>  
> -			if (PageAnon(page))
> -				rss[MM_ANONPAGES]++;
> -			else
> -				rss[MM_FILEPAGES]++;
> +			rss[mm_counter(page)]++;
>  
>  			if (is_write_migration_entry(entry) &&
>  					is_cow_mapping(vm_flags)) {
> @@ -874,10 +871,7 @@ copy_one_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm,
>  	if (page) {
>  		get_page(page);
>  		page_dup_rmap(page);
> -		if (PageAnon(page))
> -			rss[MM_ANONPAGES]++;
> -		else
> -			rss[MM_FILEPAGES]++;
> +		rss[mm_counter(page)]++;
>  	}
>  
>  out_set_pte:
> @@ -1113,9 +1107,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			tlb_remove_tlb_entry(tlb, pte, addr);
>  			if (unlikely(!page))
>  				continue;
> -			if (PageAnon(page))
> -				rss[MM_ANONPAGES]--;
> -			else {
> +
> +			if (!PageAnon(page)) {
>  				if (pte_dirty(ptent)) {
>  					force_flush = 1;
>  					set_page_dirty(page);
> @@ -1123,8 +1116,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  				if (pte_young(ptent) &&
>  				    likely(!(vma->vm_flags & VM_SEQ_READ)))
>  					mark_page_accessed(page);
> -				rss[MM_FILEPAGES]--;
>  			}
> +			rss[mm_counter(page)]--;
>  			page_remove_rmap(page);
>  			if (unlikely(page_mapcount(page) < 0))
>  				print_bad_pte(vma, addr, ptent, page);
> @@ -1146,11 +1139,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
>  			struct page *page;
>  
>  			page = migration_entry_to_page(entry);
> -
> -			if (PageAnon(page))
> -				rss[MM_ANONPAGES]--;
> -			else
> -				rss[MM_FILEPAGES]--;
> +			rss[mm_counter(page)]--;
>  		}
>  		if (unlikely(!free_swap_and_cache(entry)))
>  			print_bad_pte(vma, addr, ptent, NULL);
> @@ -1460,7 +1449,7 @@ static int insert_page(struct vm_area_struct *vma, unsigned long addr,
>  
>  	/* Ok, finally just insert the thing.. */
>  	get_page(page);
> -	inc_mm_counter_fast(mm, MM_FILEPAGES);
> +	inc_mm_counter_fast(mm, mm_counter_file(page));
>  	page_add_file_rmap(page);
>  	set_pte_at(mm, addr, pte, mk_pte(page, prot));
>  
> @@ -2097,7 +2086,8 @@ static int wp_page_copy(struct mm_struct *mm, struct vm_area_struct *vma,
>  	if (likely(pte_same(*page_table, orig_pte))) {
>  		if (old_page) {
>  			if (!PageAnon(old_page)) {
> -				dec_mm_counter_fast(mm, MM_FILEPAGES);
> +				dec_mm_counter_fast(mm,
> +						mm_counter_file(old_page));
>  				inc_mm_counter_fast(mm, MM_ANONPAGES);
>  			}
>  		} else {
> @@ -2818,7 +2808,7 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
>  		inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
>  		page_add_new_anon_rmap(page, vma, address);
>  	} else {
> -		inc_mm_counter_fast(vma->vm_mm, MM_FILEPAGES);
> +		inc_mm_counter_fast(vma->vm_mm, mm_counter_file(page));
>  		page_add_file_rmap(page);
>  	}
>  	set_pte_at(vma->vm_mm, address, pte, entry);
> diff --git a/mm/oom_kill.c b/mm/oom_kill.c
> index 4766e25..127e2d6 100644
> --- a/mm/oom_kill.c
> +++ b/mm/oom_kill.c
> @@ -567,10 +567,11 @@ void oom_kill_process(struct oom_control *oc, struct task_struct *p,
>  	 */
>  	do_send_sig_info(SIGKILL, SEND_SIG_FORCED, victim, true);
>  	mark_oom_victim(victim);
> -	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB\n",
> +	pr_err("Killed process %d (%s) total-vm:%lukB, anon-rss:%lukB, file-rss:%lukB, shmem-rss:%lukB\n",
>  		task_pid_nr(victim), victim->comm, K(victim->mm->total_vm),
>  		K(get_mm_counter(victim->mm, MM_ANONPAGES)),
> -		K(get_mm_counter(victim->mm, MM_FILEPAGES)));
> +		K(get_mm_counter(victim->mm, MM_FILEPAGES)),
> +		K(get_mm_counter(victim->mm, MM_SHMEMPAGES)));
>  	task_unlock(victim);
>  
>  	/*
> diff --git a/mm/rmap.c b/mm/rmap.c
> index 0ce371a..6c89356 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -1383,10 +1383,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		if (PageHuge(page)) {
>  			hugetlb_count_sub(1 << compound_order(page), mm);
>  		} else {
> -			if (PageAnon(page))
> -				dec_mm_counter(mm, MM_ANONPAGES);
> -			else
> -				dec_mm_counter(mm, MM_FILEPAGES);
> +			dec_mm_counter(mm, mm_counter(page));
>  		}
>  		set_pte_at(mm, address, pte,
>  			   swp_entry_to_pte(make_hwpoison_entry(page)));
> @@ -1396,10 +1393,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		 * interest anymore. Simply discard the pte, vmscan
>  		 * will take care of the rest.
>  		 */
> -		if (PageAnon(page))
> -			dec_mm_counter(mm, MM_ANONPAGES);
> -		else
> -			dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter(page));
>  	} else if (PageAnon(page)) {
>  		swp_entry_t entry = { .val = page_private(page) };
>  		pte_t swp_pte;
> @@ -1455,7 +1449,7 @@ static int try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
>  		entry = make_migration_entry(page, pte_write(pteval));
>  		set_pte_at(mm, address, pte, swp_entry_to_pte(entry));
>  	} else
> -		dec_mm_counter(mm, MM_FILEPAGES);
> +		dec_mm_counter(mm, mm_counter_file(page));
>  
>  discard:
>  	page_remove_rmap(page);
> -- 
> 2.5.2
> 
> 

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status
@ 2015-10-05  4:55     ` Hugh Dickins
  0 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2015-10-05  4:55 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

On Fri, 2 Oct 2015, Vlastimil Babka wrote:

> From: Jerome Marchand <jmarchan@redhat.com>
> 
> It's currently inconvenient to retrieve the MM_ANONPAGES value from status
> and statm files, and there is no way to separate MM_FILEPAGES and
> MM_SHMEMPAGES. Add RssAnon, RssFile and RssShm lines in /proc/<pid>/status
> to solve these issues.
> 
> Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> Acked-by: Michal Hocko <mhocko@suse.com>

Mostly
Acked-by: Hugh Dickins <hughd@google.com>
but I loathe the alignment...

> ---
>  Documentation/filesystems/proc.txt | 10 +++++++++-
>  fs/proc/task_mmu.c                 | 14 ++++++++++++--
>  2 files changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index 82d3657..c887a42 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -169,6 +169,9 @@ For example, to get the status information of a process, all you have to do is
>    VmLck:         0 kB
>    VmHWM:       476 kB
>    VmRSS:       476 kB
> +  RssAnon:     352 kB
> +  RssFile:     120 kB
> +  RssShm:        4 kB

That looks nice...

>    VmData:      156 kB
>    VmStk:        88 kB
>    VmExe:        68 kB
> @@ -231,7 +234,12 @@ Table 1-2: Contents of the status files (as of 4.1)
>   VmSize                      total program size
>   VmLck                       locked memory size
>   VmHWM                       peak resident set size ("high water mark")
> - VmRSS                       size of memory portions
> + VmRSS                       size of memory portions. It contains the three
> +                             following parts (VmRSS = RssAnon + RssFile + RssShm)
> + RssAnon                     size of resident anonymous memory
> + RssFile                     size of resident file mappings
> + RssShm                      size of resident shmem memory (includes SysV shm,
> +                             mapping of tmpfs and shared anonymous mappings)
>   VmData                      size of data, stack, and text segments
>   VmStk                       size of data, stack, and text segments
>   VmExe                       size of text segment
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 9b9708e..7332afd 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -23,9 +23,13 @@
>  
>  void task_mem(struct seq_file *m, struct mm_struct *mm)
>  {
> -	unsigned long data, text, lib, swap, ptes, pmds;
> +	unsigned long data, text, lib, swap, ptes, pmds, anon, file, shmem;
>  	unsigned long hiwater_vm, total_vm, hiwater_rss, total_rss;
>  
> +	anon = get_mm_counter(mm, MM_ANONPAGES);
> +	file = get_mm_counter(mm, MM_FILEPAGES);
> +	shmem = get_mm_counter(mm, MM_SHMEMPAGES);
> +
>  	/*
>  	 * Note: to minimize their overhead, mm maintains hiwater_vm and
>  	 * hiwater_rss only when about to *lower* total_vm or rss.  Any
> @@ -36,7 +40,7 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  	hiwater_vm = total_vm = mm->total_vm;
>  	if (hiwater_vm < mm->hiwater_vm)
>  		hiwater_vm = mm->hiwater_vm;
> -	hiwater_rss = total_rss = get_mm_rss(mm);
> +	hiwater_rss = total_rss = anon + file + shmem;
>  	if (hiwater_rss < mm->hiwater_rss)
>  		hiwater_rss = mm->hiwater_rss;
>  
> @@ -53,6 +57,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		"VmPin:\t%8lu kB\n"
>  		"VmHWM:\t%8lu kB\n"
>  		"VmRSS:\t%8lu kB\n"
> +		"RssAnon:\t%8lu kB\n"
> +		"RssFile:\t%8lu kB\n"
> +		"RssShm:\t%8lu kB\n"

... but on my terminal that comes out as

VmPeak:     4584 kB
VmSize:     4584 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      1264 kB
VmRSS:      1264 kB
RssAnon:              84 kB
RssFile:            1180 kB
RssShm:        0 kB
VmData:      184 kB
VmStk:       136 kB
VmExe:        48 kB
VmLib:      1808 kB
VmPTE:        32 kB
VmPMD:        12 kB
VmSwap:        0 kB
HugetlbPages:          0 kB

Notice anything ugly about that?  Of course, what's really wrong is
the years-ago choice of absurdly short names, with a tab after them.
Ugh.  The HugetlbPages line probably can't be helped (even HugeTLB:
would be too long).  But for your three, I hope we can do better: I can
understand why Rss instead of Vm, sure; Vm on the front contributes
nothing but incorrectness, and it wasn't a bad idea to group them as
contributors to VmRSS.

I suggest either indenting them with spaces to keep the alignment,

"  Anon:\t%8lu kB\n"
"  File:\t%8lu kB\n"
" Shmem:\t%8lu kB\n"

or keeping your Rss prefix but misaligning the three together,

"RssAnon:\t%8lu kB\n"
"RssFile:\t%8lu kB\n"
"RssShmem:\t%8lu kB\n"

I somewhat prefer "Shmem" to "Shm" because "Shmem" is what
/proc/meminfo already says, and "Shm" makes me think of SysV SHM only.
But I'd happily settle for Shm if it helped in the alignment.

I realize that /proc/<pid>/status is not universally loved for its
aesthetic charm, and I may be the only one who feels this way...

Hugh

>  		"VmData:\t%8lu kB\n"
>  		"VmStk:\t%8lu kB\n"
>  		"VmExe:\t%8lu kB\n"
> @@ -66,6 +73,9 @@ void task_mem(struct seq_file *m, struct mm_struct *mm)
>  		mm->pinned_vm << (PAGE_SHIFT-10),
>  		hiwater_rss << (PAGE_SHIFT-10),
>  		total_rss << (PAGE_SHIFT-10),
> +		anon << (PAGE_SHIFT-10),
> +		file << (PAGE_SHIFT-10),
> +		shmem << (PAGE_SHIFT-10),
>  		data << (PAGE_SHIFT-10),
>  		mm->stack_vm << (PAGE_SHIFT-10), text, lib,
>  		ptes >> 10,
> -- 
> 2.5.2

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 13:35   ` Vlastimil Babka
@ 2015-10-05  7:53     ` Peter Zijlstra
  -1 siblings, 0 replies; 40+ messages in thread
From: Peter Zijlstra @ 2015-10-05  7:53 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: linux-mm, Jerome Marchand, Andrew Morton, Hugh Dickins,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Paul Mackerras, Arnaldo Carvalho de Melo,
	Oleg Nesterov, Linux API, Konstantin Khlebnikov

On Fri, Oct 02, 2015 at 03:35:49PM +0200, Vlastimil Babka wrote:
> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
> +{
> +	struct inode *inode;
> +	unsigned long swapped;
> +	pgoff_t start, end;
> +
> +	if (!vma->vm_file)
> +		return 0;
> +
> +	inode = file_inode(vma->vm_file);
> +
> +	if (!shmem_mapping(inode->i_mapping))
> +		return 0;
> +
> +	/*
> +	 * The easier cases are when the shmem object has nothing in swap, or
> +	 * we have the whole object mapped. Then we can simply use the stats
> +	 * that are already tracked by shmem.
> +	 */
> +	swapped = shmem_swap_usage(inode);
> +
> +	if (swapped == 0)
> +		return 0;
> +
> +	if (vma->vm_end - vma->vm_start >= inode->i_size)
> +		return swapped;
> +
> +	/*
> +	 * Here we have to inspect individual pages in our mapped range to
> +	 * determine how many of them are swapped out. Thanks to RCU, we don't
> +	 * need i_mutex to protect against truncating or hole punching.
> +	 */

At the very least put in an assertion that we hold the RCU read lock;
otherwise RCU doesn't guarantee anything, and it's not obvious it is held
here.
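
E.g. something like (sketch; rcu_read_lock_held() is the usual
lockdep-checked helper for this):

	/* documents the requirement, and verifies it under lockdep */
	WARN_ON_ONCE(!rcu_read_lock_held());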

> +	start = linear_page_index(vma, vma->vm_start);
> +	end = linear_page_index(vma, vma->vm_end);
> +
> +	return shmem_partial_swap_usage(inode->i_mapping, start, end);
> +}

> + * Determine (in bytes) how much of the whole shmem object is swapped out.
> + */
> +unsigned long shmem_swap_usage(struct inode *inode)
> +{
> +	struct shmem_inode_info *info = SHMEM_I(inode);
> +	unsigned long swapped;
> +
> +	/* Mostly an overkill, but it's not atomic64_t */

Yeah, that doesn't make any kind of sense.

> +	spin_lock(&info->lock);
> +	swapped = info->swapped;
> +	spin_unlock(&info->lock);
> +
> +	return swapped << PAGE_SHIFT;
> +}

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations
  2015-10-05  1:05     ` Hugh Dickins
@ 2015-10-06  7:05       ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-10-06  7:05 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Jerome Marchand, Andrew Morton, linux-kernel,
	linux-doc, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Oleg Nesterov, Linux API, Konstantin Khlebnikov

On 10/05/2015 03:05 AM, Hugh Dickins wrote:
> On Fri, 2 Oct 2015, Vlastimil Babka wrote:
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -239,6 +239,8 @@ Table 1-2: Contents of the status files (as of 4.1)
>>    VmPTE                       size of page table entries
>>    VmPMD                       size of second level page tables
>>    VmSwap                      size of swap usage (the number of referred swapents)
>> +                             by anonymous private data (shmem swap usage is not
>> +                             included)
>
> I have difficulty in reading "size of swap usage (the number of referred
> swapents) by anonymous private data (shmem swap usage is not included)".
>
> Luckily, VmSwap never was "the number of referred swapents", it's in kB.
> So I suggest                    amount of swap used by anonymous private data
>                                  (shmem swap usage is not included)

Good point, thanks!

> for which you can assume Acked-by: Hugh Dickins <hughd@google.com>
>
> Hugh
>

^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-02 22:37     ` Andrew Morton
@ 2015-10-06  7:08       ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-10-06  7:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Jerome Marchand, Hugh Dickins, linux-kernel, linux-doc,
	Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap,
	linux-s390, Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On 10/03/2015 12:37 AM, Andrew Morton wrote:
> On Fri,  2 Oct 2015 15:35:49 +0200 Vlastimil Babka <vbabka@suse.cz> wrote:
>
>>
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>>   extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>>   extern int shmem_unuse(swp_entry_t entry, struct page *page);
>>
>> +#ifdef CONFIG_SWAP
>> +extern unsigned long shmem_swap_usage(struct inode *inode);
>> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>> +						pgoff_t start, pgoff_t end);
>> +#endif
>
> CONFIG_SWAP is wrong, isn't it?  It should be CONFIG_SHMEM if anything.

Yeah, I overlooked this while removing the other ifdefs. Thanks.
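
I.e. the declarations would simply lose the guard (sketch of the
trivial fix):

	extern unsigned long shmem_swap_usage(struct inode *inode);
	extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
							pgoff_t start, pgoff_t end);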


^ permalink raw reply	[flat|nested] 40+ messages in thread

* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-05  3:01     ` Hugh Dickins
@ 2015-10-21 14:39       ` Vlastimil Babka
  -1 siblings, 0 replies; 40+ messages in thread
From: Vlastimil Babka @ 2015-10-21 14:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: linux-mm, Jerome Marchand, Andrew Morton, linux-kernel,
	linux-doc, Michal Hocko, Kirill A. Shutemov, Cyrill Gorcunov,
	Randy Dunlap, linux-s390, Martin Schwidefsky, Heiko Carstens,
	Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Oleg Nesterov, Linux API, Konstantin Khlebnikov

On 10/05/2015 05:01 AM, Hugh Dickins wrote:
> On Fri, 2 Oct 2015, Vlastimil Babka wrote:
>
>> Currently, /proc/pid/smaps will always show "Swap: 0 kB" for shmem-backed
>> mappings, even if the mapped portion does contain pages that were swapped out.
>> This is because unlike private anonymous mappings, shmem does not change pte
>> to swap entry, but pte_none when swapping the page out. In the smaps page
>> walk, such page thus looks like it was never faulted in.
>>
>> This patch changes smaps_pte_entry() to determine the swap status for such
>> pte_none entries for shmem mappings, similarly to how mincore_page() does it.
>> Swapped out pages are thus accounted for.
>>
>> The accounting is arguably still not as precise as for private anonymous
>> mappings, since now we will also count pages that the process in question
>> never accessed, but that another process populated and then let become
>> swapped out. I believe it is still less confusing and subtle than not showing
>> any swap usage by shmem mappings at all. Also, swapped out pages only become a
>> performance issue on future accesses, and we cannot predict those for either
>> kind of mapping.
>>
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>
> Neither Ack nor Nack from me.
>
> I don't want to stand in the way of this patch, if you and others
> believe that it will help to diagnose problems in the field better
> than what's shown at present; but to me it looks dangerously like
> replacing no information by wrong information.
>
> As you acknowledge in the commit message, if a file of 100 pages
> were copied to tmpfs, and 100 tasks map its full extent, but they
> all mess around with the first 50 pages and take no interest in
> the last 50, then it's quite likely that that last 50 will get
> swapped out; then with your patch, 100 tasks are each shown as
> using 50 pages of swap, when none of them are actually using any.

Yeah, but isn't it the same with private memory that was swapped out at
some point, where we also don't know whether it will be touched again? The
difference is that in the private case we know the process touched it at
least once, but that can also mean nothing for the future (or maybe it
just mmapped with MAP_POPULATE and didn't care about half of it).

That's basically what I was trying to say in the changelog. I interpret
the Swap: value as the amount of swap-in potential if the process were
to access it, which is what the particular customer also expects (see
below). In that case showing zero is IMHO wrong, and inconsistent with
anonymous private mappings.

> It is rather as if we didn't bother to record Rss, and just put
> Size in there instead: you are (for understandable reasons) treating
> the virtual address space as if every page of it had been touched.
>
> But I accept that there may well be a class of processes and problems
> which would be better served by this fiction than the present: I expect
> you have much more experience of helping out in such situations than I.

Well, the customers driving this change would ideally want shmem swap
accounted continuously, e.g. visible immediately in the top output.
Fixing the smaps output is (IMHO) the next best thing. The use case
here is an application that really doesn't like page faults, and has a
background thread that checks and prefaults such areas when they are
expected to be used soon. So they would like to identify these areas.

> And perhaps you do balance it nicely by going to the opposite extreme
> with SwapPss 0 for all (again for the eminently understandable reason,
> that it would be a whole lot more new code to work out the right number).
> Altogether, you're saying everyone's using more swap than they probably
> are, but that's okay because it's infinitely shared.
>
> I am not at all angling for you or anyone to make the changes necessary
> to make those numbers accurate.  I think there's a point at which we
> stop cluttering up the core kernel code, just for the sake of
> maintaining numbers for a /proc file someone thought was a good idea
> at the time.  But I am hoping that if this patch goes in, you will take
> responsibility for batting away all the complaints that it doesn't work
> as this or that person expected, rather than a long stream of patches
> to refine it.

I don't plan to fix SwapPss, so I can drop the "currently".

> I think the root problem is that we're trying to use /proc/<pid>/smaps
> for something that's independent of <pid> and its maps: a shmem object.
> Would we be better served by a tmpfs-ish filesystem mounted somewhere,
> which gives names to all the objects on the internal mount of tmpfs
> (SysV SHM, memfds etc); and some fincore-ish syscalls which could be
> used to interrogate how much swap any tmpfs file is using in any range?
> (I am not volunteering to write this, not in the foreseeable future.)
>
> I have no idea of the security implications of naming the hidden, it
> may be a non-starter.  And my guess is, it would be nice if it already
> existed, but you need a solution today to some problems that have been
> wasting your time; and grafting it into smaps looks to be good enough.
>
> Some comments on your implementation below.
>
>> ---
>>   Documentation/filesystems/proc.txt |  6 ++--
>>   fs/proc/task_mmu.c                 | 48 ++++++++++++++++++++++++++++++
>>   include/linux/shmem_fs.h           |  6 ++++
>>   mm/shmem.c                         | 61 ++++++++++++++++++++++++++++++++++++++
>>   4 files changed, 119 insertions(+), 2 deletions(-)
>>
>> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
>> index 7ef50cb..82d3657 100644
>> --- a/Documentation/filesystems/proc.txt
>> +++ b/Documentation/filesystems/proc.txt
>> @@ -457,8 +457,10 @@ accessed.
>>   a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>>   and a page is modified, the file page is replaced by a private anonymous copy.
>>   "Swap" shows how much would-be-anonymous memory is also used, but out on
>> -swap.
>> -"SwapPss" shows proportional swap share of this mapping.
>> +swap. For shmem mappings, "Swap" shows how much of the mapped portion of the
>> +underlying shmem object is on swap.
>
> And for private mappings of tmpfs files?  I expected it to show an
> indeterminate mixture of the two, but it looks like you treat the private
> mapping just like a shared one, and take no notice of the COWed pages
> out on swap which would have been reported before.  Oh, no, I think
> I misread, and you add the two together?  I agree that's the easiest
> thing to do, and therefore perhaps the best; but it doesn't fill me
> with conviction that it's the right thing to do.

Thanks for pointing this out, I totally missed this possibility! Well,
the current patch is certainly not the right thing to do, as it can
over-account. The most correct solution would have to be implemented in
the page walk itself, checking the shmem radix tree only for individual
pages that were not COWed. Michal Hocko suggested I try that, and
although it does add some overhead (the complexity is n*log(n) AFAICT),
it's not that bad from preliminary checks. Another advantage is that no
new shmem code is needed, as we can use the generic find_get_entry().
Unless we want to really limit the extra complexity to the special
private mapping case with non-zero swap usage of the shmem object etc...
I'll repost the series with that approach.
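
Roughly, the walk side would become something along these lines
(untested sketch; check_shmem_swap is a provisional name for a new flag
in mem_size_stats, set when the vma is backed by a shmem object with
nonzero swap usage):

	/* new branch in smaps_pte_entry(), after the pte_present() and
	 * is_swap_pte() cases; a swapped-out shmem page is kept in the
	 * radix tree as an exceptional (swap) entry, while a COWed page
	 * has its own pte and thus never gets here with pte_none() */
	} else if (unlikely(IS_ENABLED(CONFIG_SHMEM)
				&& mss->check_shmem_swap
				&& pte_none(*pte))) {
		page = find_get_entry(vma->vm_file->f_mapping,
					linear_page_index(vma, addr));
		if (!page)
			return;

		if (radix_tree_exceptional_entry(page))
			mss->swap += PAGE_SIZE;
		else
			page_cache_release(page);

		return;
	}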

Other non-perfect solutions that come to mind:

1) For private mappings, count only the swapents. "Swap:" is no longer
showing full swap-in potential though.
2) For private mappings, do not count swapents. Ditto.
3) Provide two separate counters. The user won't know how much they
overlap, though.

From these I would be inclined towards 3) as being more universal,
although then it's no longer a simple "we're fixing a Swap: 0 value
which is wrong", but closer to Jerome's original versions, which IIRC
introduced several shmem-specific counters.

Well at least now I do understand why you don't particularly like this 
approach...

>> +"SwapPss" shows proportional swap share of this mapping. Shmem mappings will
>> +currently show 0 here.
>
> Yes, my heart sank when I remembered SwapPss, and I wondered what you were
> going to do with that.  I was imagining that the Swap number would go into
> SwapPss, but no, I prefer your choice to show 0 there (but depressed to
> see the word "currently", which hints at grand schemes to plumb in another
> radix_tree of swap counts, or more rmap_walks to calculate, or something).
>
>>   "AnonHugePages" shows the ammount of memory backed by transparent hugepage.
>>   "Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by
>>   hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical
>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>> index 04999b2..103457c 100644
>> --- a/fs/proc/task_mmu.c
>> +++ b/fs/proc/task_mmu.c
>> @@ -14,6 +14,7 @@
>>   #include <linux/swapops.h>
>>   #include <linux/mmu_notifier.h>
>>   #include <linux/page_idle.h>
>> +#include <linux/shmem_fs.h>
>>
>>   #include <asm/elf.h>
>>   #include <asm/uaccess.h>
>> @@ -657,6 +658,51 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
>>   }
>>   #endif /* HUGETLB_PAGE */
>>
>> +#ifdef CONFIG_SHMEM
>
> Correct.
>
>> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
>> +{
>> +	struct inode *inode;
>> +	unsigned long swapped;
>> +	pgoff_t start, end;
>> +
>> +	if (!vma->vm_file)
>> +		return 0;
>> +
>> +	inode = file_inode(vma->vm_file);
>> +
>> +	if (!shmem_mapping(inode->i_mapping))
>> +		return 0;
>
> Someone somewhere may ask for an ops method,
> but that someone will certainly not be me.
>
>> +
>> +	/*
>> +	 * The easier cases are when the shmem object has nothing in swap, or
>> +	 * we have the whole object mapped. Then we can simply use the stats
>> +	 * that are already tracked by shmem.
>> +	 */
>> +	swapped = shmem_swap_usage(inode);
>> +
>> +	if (swapped == 0)
>> +		return 0;
>
> You are absolutely right to go for that optimization, but please
> please do it all inside one call to shmem.c: all you need is one
> shmem_swap_usage(inode, start, end)
> or
> shmem_swap_usage(vma).

OK.

>> +
>> +	if (vma->vm_end - vma->vm_start >= inode->i_size)
>
> That must be wrong.  It's probably right for all normal processes,
> and you may not be interested in the rest; but anyone can set up
> a mapping from end of file onwards, which won't intersect with the
> swap at all.  Just a little more thought on that test would be good.

Uh, right, thanks for pointing that out.
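
For v5 I'd then fold both of your points into a single helper, roughly
(untested sketch; requiring vm_pgoff == 0 should take care of mappings
set up beyond EOF):

unsigned long shmem_swap_usage(struct vm_area_struct *vma)
{
	struct inode *inode = file_inode(vma->vm_file);
	struct shmem_inode_info *info = SHMEM_I(inode);
	unsigned long swapped;

	/* no locking needed, it's a single unsigned long (see below) */
	swapped = READ_ONCE(info->swapped);

	if (!swapped)
		return 0;

	/*
	 * The cheap whole-object answer is only valid when the vma
	 * really covers the whole file; a mapping that starts beyond
	 * EOF intersects no swap at all.
	 */
	if (!vma->vm_pgoff && vma->vm_end - vma->vm_start >= inode->i_size)
		return swapped << PAGE_SHIFT;

	/* otherwise count swap entries in the mapped range only */
	return shmem_partial_swap_usage(inode->i_mapping,
			linear_page_index(vma, vma->vm_start),
			linear_page_index(vma, vma->vm_end));
}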

>> +		return swapped;
>> +
>> +	/*
>> +	 * Here we have to inspect individual pages in our mapped range to
>> +	 * determine how much of them are swapped out. Thanks to RCU, we don't
>> +	 * need i_mutex to protect against truncating or hole punching.
>> +	 */
>> +	start = linear_page_index(vma, vma->vm_start);
>> +	end = linear_page_index(vma, vma->vm_end);
>> +
>> +	return shmem_partial_swap_usage(inode->i_mapping, start, end);
>> +}
>> +#else
>> +static unsigned long smaps_shmem_swap(struct vm_area_struct *vma)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>   static int show_smap(struct seq_file *m, void *v, int is_pid)
>>   {
>>   	struct vm_area_struct *vma = v;
>> @@ -674,6 +720,8 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>>   	/* mmap_sem is held in m_start */
>>   	walk_page_vma(vma, &smaps_walk);
>>
>> +	mss.swap += smaps_shmem_swap(vma);
>> +
>
> So, I think here you add the private swap to the object swap.

Yes, through ignorance.

>>   	show_map_vma(m, vma, is_pid);
>>
>>   	seq_printf(m,
>> diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
>> index 50777b5..12519e4 100644
>> --- a/include/linux/shmem_fs.h
>> +++ b/include/linux/shmem_fs.h
>> @@ -60,6 +60,12 @@ extern struct page *shmem_read_mapping_page_gfp(struct address_space *mapping,
>>   extern void shmem_truncate_range(struct inode *inode, loff_t start, loff_t end);
>>   extern int shmem_unuse(swp_entry_t entry, struct page *page);
>>
>> +#ifdef CONFIG_SWAP
>
> As Andrew said, better just drop the #ifdef here.
>
>> +extern unsigned long shmem_swap_usage(struct inode *inode);
>> +extern unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>> +						pgoff_t start, pgoff_t end);
>> +#endif
>> +
>>   static inline struct page *shmem_read_mapping_page(
>>   				struct address_space *mapping, pgoff_t index)
>>   {
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index b543cc7..b0e9e30 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -360,6 +360,67 @@ static int shmem_free_swap(struct address_space *mapping,
>>   }
>>
>>   /*
>> + * Determine (in bytes) how much of the whole shmem object is swapped out.
>> + */
>> +unsigned long shmem_swap_usage(struct inode *inode)
>> +{
>> +	struct shmem_inode_info *info = SHMEM_I(inode);
>> +	unsigned long swapped;
>> +
>> +	/* Mostly an overkill, but it's not atomic64_t */
>> +	spin_lock(&info->lock);
>
> Entirely overkill, what's atomic64_t got to do with it?
> info->swapped is an unsigned long, 32-bit on 32-bit, 64-bit on 64-bit,
> there are no atomicity issues.  READ_ONCE if you like, but I can't even
> see where it would read twice, or what bad consequence could result.

Right, I was wrongly thinking that the counter would be 64-bit even on
32-bit machines and thus not atomically readable...

>> +	swapped = info->swapped;
>> +	spin_unlock(&info->lock);
>> +
>> +	return swapped << PAGE_SHIFT;
>> +}
>> +
>> +/*
>> + * Determine (in bytes) how many pages within the given range are swapped out.
>> + *
>> + * Can be called without i_mutex or mapping->tree_lock thanks to RCU.
>
> Correct.
>
>> + */
>> +unsigned long shmem_partial_swap_usage(struct address_space *mapping,
>> +						pgoff_t start, pgoff_t end)
>> +{
>> +	struct radix_tree_iter iter;
>> +	void **slot;
>> +	struct page *page;
>> +	unsigned long swapped = 0;
>> +
>> +	rcu_read_lock();
>> +
>> +restart:
>> +	radix_tree_for_each_slot(slot, &mapping->page_tree, &iter, start) {
>> +		if (iter.index >= end)
>> +			break;
>> +
>> +		page = radix_tree_deref_slot(slot);
>> +
>> +		/*
>> +		 * This should only be possible to happen at index 0, so we
>> +		 * don't need to reset the counter, nor do we risk infinite
>> +		 * restarts.
>> +		 */
>> +		if (radix_tree_deref_retry(page))
>> +			goto restart;
>> +
>> +		if (radix_tree_exceptional_entry(page))
>> +			swapped++;
>> +
>> +		if (need_resched()) {
>> +			cond_resched_rcu();
>> +			start = iter.index + 1;
>> +			goto restart;
>> +		}
>> +	}
>> +
>> +	rcu_read_unlock();
>> +
>> +	return swapped << PAGE_SHIFT;
>> +}
>
> This is what you most wanted me to look at, but it looks perfect to me
> (aside from my wanting one call into shmem.c instead of two).

Thanks!

> Hugh
>
>> +
>> +/*
>>    * SysV IPC SHM_UNLOCK restore Unevictable pages to their evictable lists.
>>    */
>>   void shmem_unlock_mapping(struct address_space *mapping)
>> --
>> 2.5.2


^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-21 14:39       ` Vlastimil Babka
@ 2015-10-21 22:38         ` Hugh Dickins
  -1 siblings, 0 replies; 40+ messages in thread
From: Hugh Dickins @ 2015-10-21 22:38 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Hugh Dickins, linux-mm, Jerome Marchand, Andrew Morton,
	linux-kernel, linux-doc, Michal Hocko, Kirill A. Shutemov,
	Cyrill Gorcunov, Randy Dunlap, linux-s390, Martin Schwidefsky,
	Heiko Carstens, Peter Zijlstra, Paul Mackerras,
	Arnaldo Carvalho de Melo, Oleg Nesterov, Linux API,
	Konstantin Khlebnikov

On Wed, 21 Oct 2015, Vlastimil Babka wrote:
> On 10/05/2015 05:01 AM, Hugh Dickins wrote:
> > On Fri, 2 Oct 2015, Vlastimil Babka wrote:
> > 
> > > Currently, /proc/pid/smaps will always show "Swap: 0 kB" for
> > > shmem-backed mappings, even if the mapped portion does contain pages
> > > that were swapped out. This is because unlike private anonymous
> > > mappings, shmem does not change the pte to a swap entry, but to
> > > pte_none when swapping the page out. In the smaps page walk, such a
> > > page thus looks like it was never faulted in.
> > > 
> > > This patch changes smaps_pte_entry() to determine the swap status
> > > for such pte_none entries for shmem mappings, similarly to how
> > > mincore_page() does it. Swapped out pages are thus accounted for.
> > > 
> > > The accounting is arguably still not as precise as for private
> > > anonymous mappings, since now we will also count pages that the
> > > process in question never accessed, but only another process
> > > populated them and then let them become swapped out. I believe it is
> > > still less confusing and subtle than not showing any swap usage by
> > > shmem mappings at all. Also, swapped out pages only become a
> > > performance issue for future accesses, and we cannot predict those
> > > for either kind of mapping.
> > > 
> > > Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> > > Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
> > 
> > Neither Ack nor Nack from me.
> > 
> > I don't want to stand in the way of this patch, if you and others
> > believe that it will help to diagnose problems in the field better
> > than what's shown at present; but to me it looks dangerously like
> > replacing no information by wrong information.
> > 
> > As you acknowledge in the commit message, if a file of 100 pages
> > were copied to tmpfs, and 100 tasks map its full extent, but they
> > all mess around with the first 50 pages and take no interest in
> > the last 50, then it's quite likely that that last 50 will get
> > swapped out; then with your patch, 100 tasks are each shown as
> > using 50 pages of swap, when none of them are actually using any.
> 
> Yeah, but isn't it the same with private memory which was swapped out at some
> point and we don't know if it will be touched or not? The
> difference is that in the private case we know the process touched it at least
> once, but that can also mean nothing for the future (or maybe it just
> mmapped with MAP_POPULATE and didn't care about half of it).

I see that as quite different myself; but agree that neither way
predicts the future.  Now, if you can make a patch to predict the future...

FWIW, I do seem to be looking at it more from a point of view of how
much swap the process is using, whereas you're looking at it more
from a point of view of what delays would be incurred in accessing.

> 
> That's basically what I was trying to say in the changelog. I interpret
> the Swap: value as the amount of swap-in potential, if the process was
> going to access it, which is what the particular customer also expects (see
> below). In that case showing zero is IMHO wrong and inconsistent with the
> anonymous private mappings.

Yes, your changelog is honest about the difference, I don't dispute that.
As I said, neither Ack nor Nack from me: I just don't feel in a position
to judge whether changing the output of smaps to please this customer is
likely to displease another customer or not.

> 
> > It is rather as if we didn't bother to record Rss, and just put
> > Size in there instead: you are (for understandable reasons) treating
> > the virtual address space as if every page of it had been touched.
> > 
> > But I accept that there may well be a class of processes and problems
> > which would be better served by this fiction than the present: I expect
> > you have much more experience of helping out in such situations than I.
> 
> Well, the customers driving this change would in the best case want to
> see the shmem swap accounted continuously and e.g. see it immediately in the
> top output. Fixing (IMHO) the smaps output is the next best thing. The use
> case here is a application that really doesn't like page faults, and has
> background thread that checks and prefaults such areas when they are expected
> to be used soon. So they would like to identify these areas.

And I guess I won't be able to sell mlock(2) to you :)

Still neither Ack nor Nack from me: while your number is more information
(or misinformation) than always 0, it's still not clear to me that it will
give them what they need.

...
> > 
> > And for private mappings of tmpfs files?  I expected it to show an
> > indeterminate mixture of the two, but it looks like you treat the private
> > mapping just like a shared one, and take no notice of the COWed pages
> > out on swap which would have been reported before.  Oh, no, I think
> > I misread, and you add the two together?  I agree that's the easiest
> > thing to do, and therefore perhaps the best; but it doesn't fill me
> > with conviction that it's the right thing to do.
> 
> Thanks for pointing this out, I totally missed this possibility! Well
> the current patch is certainly not the right thing to do, as it can
> over-account. The most correct solution would have to be implemented into the
> page walk and only check into shmem radix tree for individual pages that were
> not COWed. Michal Hocko suggested I try that, and although it does add some
> overhead (the complexity is n*log(n) AFAICT), it's not that bad from
> preliminary checks. Another advantage is that no new shmem code is needed, as
> we can use the generic find_get_entry(). Unless we want to really limit the
> extra complexity only to the special private mapping case with non-zero swap
> usage of the shmem object etc... I'll repost the series with that approach.
> 
> Other non-perfect solutions that come to mind:
> 
> 1) For private mappings, count only the swapents. "Swap:" is no longer
> showing full swap-in potential though.
> 2) For private mappings, do not count swapents. Ditto.
> 3) Provide two separate counters. The user won't know how much they
> overlap, though.
> 
> From these I would be inclined towards 3) as being more universal, although
> then it's no longer a simple "we're fixing a Swap: 0 value which is wrong",
> but closer to Jerome's original versions, which IIRC introduced several
> shmem-specific counters.
> 
> Well at least now I do understand why you don't particularly like this
> approach...

Have you considered extending mincore(2) for them?

It was always intended that more info could be added into its byte array
later - the man page I'm looking at says "The settings of the other bits
[than the least significant] in each byte are undefined; these bits are
reserved for possible later use."

That way your customers could get a precise picture of the status of
each page: without ambiguity as to whether it's anon, shmem, file, anon
swap, shmem swap, whatever; without ambiguity as to where 40kB of 80kB
lies in the region, the unused half or the vital half etc.
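
Even with just today's bit 0, the probing side would be trivial for
them; a sketch:

#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

/* count resident pages in [addr, addr+len) with mincore(2); for a
 * shmem mapping a clear bit 0 currently means "hole or out on swap",
 * which is exactly the ambiguity new bits could resolve */
static long count_resident(void *addr, size_t len)
{
	long psize = sysconf(_SC_PAGESIZE);
	size_t i, pages = (len + psize - 1) / psize;
	unsigned char *vec = malloc(pages);
	long resident = 0;

	if (!vec)
		return -1;
	if (mincore(addr, len, vec) < 0) {
		free(vec);
		return -1;
	}
	for (i = 0; i < pages; i++)
		resident += vec[i] & 1;
	free(vec);
	return resident;
}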

Or forget passing back the info: just offer an madvise(,, MADV_POPULATE)?

Hugh

^ permalink raw reply	[flat|nested] 40+ messages in thread


* Re: [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps
  2015-10-21 14:39       ` Vlastimil Babka
@ 2015-10-26 11:22       ` Jerome Marchand
  -1 siblings, 0 replies; 40+ messages in thread
From: Jerome Marchand @ 2015-10-26 11:22 UTC (permalink / raw)
  To: Vlastimil Babka, Hugh Dickins
  Cc: linux-mm, Andrew Morton, linux-kernel, linux-doc, Michal Hocko,
	Kirill A. Shutemov, Cyrill Gorcunov, Randy Dunlap, linux-s390,
	Martin Schwidefsky, Heiko Carstens, Peter Zijlstra,
	Paul Mackerras, Arnaldo Carvalho de Melo, Oleg Nesterov,
	Linux API, Konstantin Khlebnikov

On 10/21/2015 04:39 PM, Vlastimil Babka wrote:
> On 10/05/2015 05:01 AM, Hugh Dickins wrote:
>> On Fri, 2 Oct 2015, Vlastimil Babka wrote:

>> As you acknowledge in the commit message, if a file of 100 pages
>> were copied to tmpfs, and 100 tasks map its full extent, but they
>> all mess around with the first 50 pages and take no interest in
>> the last 50, then it's quite likely that that last 50 will get
>> swapped out; then with your patch, 100 tasks are each shown as
>> using 50 pages of swap, when none of them are actually using any.
> 
> Yeah, but isn't it the same with private memory which was swapped out at
> some point and we don't know if it will be touched or not? The
> difference is that in the private case we know the process touched it at least
> once, but that can also mean nothing for the future (or maybe it just
> mmapped with MAP_POPULATE and didn't care about half of it).
> 
> That's basically what I was trying to say in the changelog. I interpret
> the Swap: value as the amount of swap-in potential, if the process was
> going to access it, which is what the particular customer also expects
> (see below). In that case showing zero is IMHO wrong and inconsistent
> with the anonymous private mappings.

I didn't understand the changelog that way, and IMHO it's a pretty
specific interpretation. I've always understood memory accounting as
being primarily the answer to the question: how many resources does a
process use? I guess its meaning has been overloaded with corollaries
that are only true in the most simple non-shared cases, such as yours
or "how much memory would be freed if this process goes away?", but I
don't think it should ever be used as a definition.

I suppose the reason I didn't understand the changelog the way you
intended is that I think it's sometimes correct to blame a process for
pages it never accessed (and I also believe that over-accounting is
better than under-accounting, but I must admit that is a quite
arbitrary point of view). For instance, what if a process has a shared
anonymous mapping that includes pages it never accessed, but which have
been populated by another process that has already exited or munmapped
the range? That process is not to blame for the appearance of these
pages, but it's the only reason why they stay.
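
A toy example of that situation (sketch, error checking omitted):

#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 64 << 20;
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	if (fork() == 0) {	/* child inherits the mapping, but */
		pause();	/* never touches a single page of it */
		return 0;
	}
	memset(p, 1, len);	/* parent populates all 64MB... */
	return 0;		/* ...and exits: now the child is the only
				 * reason the pages (and any swap they end
				 * up in) stay around */
}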

I'll offer a lollipop to anyone who comes up with a simple consistent
model on how to account shmem pages for all the possible cases, a
"Grand Unified Theory of Memory Accounting" so to speak.

> Other non-perfect solutions that come to mind:
> 
> 1) For private mappings, count only the swapents. "Swap:" is no longer
> showing full swap-in potential though.
> 2) For private mappings, do not count swapents. Ditto.
> 3) Provide two separate counters. The user won't know how much they
> overlap, though.
> 
> From these I would be inclined towards 3) as being more universal,
> although then it's no longer a simple "we're fixing a Swap: 0 value
> which is wrong", but closer to Jerome's original versions, which IIRC
> introduced several shmem-specific counters.

You remember correctly. Given all the controversy around shmem
accounting, maybe it would indeed be better to leave the existing
counters, which are relatively well defined and understood, untouched,
and instead add new shmem-specific counters to mess around with.

Thanks,
Jerome



^ permalink raw reply	[flat|nested] 40+ messages in thread

end of thread, other threads:[~2015-10-26 11:23 UTC | newest]

Thread overview: 40+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-02 13:35 [PATCH v4 0/4] enhance shmem process and swap accounting Vlastimil Babka
2015-10-02 13:35 ` Vlastimil Babka
2015-10-02 13:35 ` [PATCH v4 1/4] mm, documentation: clarify /proc/pid/status VmSwap limitations Vlastimil Babka
2015-10-02 13:35   ` Vlastimil Babka
2015-10-02 14:56   ` Jerome Marchand
2015-10-05  1:05   ` Hugh Dickins
2015-10-05  1:05     ` Hugh Dickins
2015-10-06  7:05     ` Vlastimil Babka
2015-10-06  7:05       ` Vlastimil Babka
2015-10-02 13:35 ` [PATCH v4 2/4] mm, proc: account for shmem swap in /proc/pid/smaps Vlastimil Babka
2015-10-02 13:35   ` Vlastimil Babka
2015-10-02 15:00   ` Jerome Marchand
2015-10-02 15:20   ` Michal Hocko
2015-10-02 15:20     ` Michal Hocko
2015-10-02 22:37   ` Andrew Morton
2015-10-02 22:37     ` Andrew Morton
2015-10-06  7:08     ` Vlastimil Babka
2015-10-06  7:08       ` Vlastimil Babka
2015-10-05  3:01   ` Hugh Dickins
2015-10-05  3:01     ` Hugh Dickins
2015-10-21 14:39     ` Vlastimil Babka
2015-10-21 14:39       ` Vlastimil Babka
2015-10-21 22:38       ` Hugh Dickins
2015-10-21 22:38         ` Hugh Dickins
2015-10-26 11:22       ` Jerome Marchand
2015-10-05  7:53   ` Peter Zijlstra
2015-10-05  7:53     ` Peter Zijlstra
2015-10-02 13:35 ` [PATCH v4 3/4] mm, shmem: Add shmem resident memory accounting Vlastimil Babka
2015-10-02 13:35   ` Vlastimil Babka
2015-10-02 22:37   ` Andrew Morton
2015-10-02 22:37     ` Andrew Morton
2015-10-05  4:28   ` Hugh Dickins
2015-10-05  4:28     ` Hugh Dickins
2015-10-02 13:35 ` [PATCH v4 4/4] mm, procfs: Display VmAnon, VmFile and VmShm in /proc/pid/status Vlastimil Babka
2015-10-02 13:35   ` Vlastimil Babka
2015-10-02 22:37   ` Andrew Morton
2015-10-02 22:37     ` Andrew Morton
2015-10-05  4:55   ` Hugh Dickins
2015-10-05  4:55     ` Hugh Dickins
2015-10-05  4:55     ` Hugh Dickins
