[v2] mm: show proportional swap share of the mapping

Message ID 1434373614-1041-1-git-send-email-minchan@kernel.org
State New, archived

Commit Message

Minchan Kim June 15, 2015, 1:06 p.m. UTC
We want to know the per-process working set size for smart memory
management in userland, and we use swap (e.g., zram) heavily to maximize
memory efficiency, so the working set includes swap as well as RSS.

On such a system, if there are lots of shared anonymous pages (e.g.,
Android), it is really hard to figure out exactly how much memory
(i.e., rss + swap) each process consumes.

This patch introduces a SwapPss field in /proc/<pid>/smaps so we can get
a more exact working set size per process.
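
As a rough illustration of what the new field accumulates, the sketch below
models the per-page accounting in plain userspace C. It is not kernel code:
model_account_swap_page(), model_swap_pss_kb(), MODEL_PAGE_SIZE and
MODEL_PSS_SHIFT are hypothetical names, and the 4 KiB page size and the shift
of 12 are assumptions for the example rather than values stated in this
message.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only: 4 KiB pages and a 12-bit
 * fixed-point shift for the fractional part of each page's share. */
#define MODEL_PAGE_SIZE 4096ULL
#define MODEL_PSS_SHIFT 12

/* Add one swapped-out page to the running total. A swap slot with two
 * or more references is divided among them; anything else is charged
 * in full. */
static void model_account_swap_page(uint64_t *swap_pss, int swapcount)
{
	uint64_t delta = MODEL_PAGE_SIZE << MODEL_PSS_SHIFT;

	assert(swapcount >= 0);
	if (swapcount >= 2)
		delta /= (uint64_t)swapcount;
	*swap_pss += delta;
}

/* Convert the fixed-point byte total to kB for display. */
static unsigned long model_swap_pss_kb(uint64_t swap_pss)
{
	return (unsigned long)(swap_pss >> (10 + MODEL_PSS_SHIFT));
}
```

Under these assumptions, 1000 swapped pages with a single reference plus 1000
pages with two references add up to 6000 kB, whereas a plain per-mapping count
of the same 2000 pages would give 8000 kB.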

Bongkyu tested it. The results are below.

1. 50M used swap
SwapTotal: 461976 kB
SwapFree: 411192 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
48236
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
141184

2. 240M used swap
SwapTotal: 461976 kB
SwapFree: 216808 kB

$ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
230315
$ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
1387744
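
The figures above can be cross-checked against the system-wide counters. The
sketch below only redoes that arithmetic on the numbers quoted in this
message; the struct and function names are hypothetical and exist purely for
the example.

```c
#include <assert.h>

/* Figures copied from the two test runs above, all in kB. */
struct swap_sample {
	unsigned long swap_total;
	unsigned long swap_free;
	unsigned long sum_swap_pss;	/* sum of SwapPss: over all processes */
	unsigned long sum_swap;		/* sum of Swap: over all processes */
};

static const struct swap_sample run_50m  = { 461976, 411192,  48236,  141184 };
static const struct swap_sample run_240m = { 461976, 216808, 230315, 1387744 };

/* Swap actually in use according to SwapTotal and SwapFree. */
static unsigned long used_swap_kb(const struct swap_sample *s)
{
	assert(s->swap_total >= s->swap_free);
	return s->swap_total - s->swap_free;
}

/* A per-process sum expressed as a whole percentage of the swap in use. */
static unsigned long percent_of_used(unsigned long sum_kb,
				     const struct swap_sample *s)
{
	return sum_kb * 100 / used_swap_kb(s);
}
```

With these inputs the swap in use is 50784 kB and 245168 kB. The summed
SwapPss stays within a few percent of that in both runs, while the summed Swap
overshoots it by a factor of roughly 2.8 in the first run and 5.7 in the
second.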

* from v1
  * add more description - Andrew
  * swp_swapcount fix on !CONFIG_SWAP - Sergey
  * add what PSS is to proc.txt - Andrew
    * Bring quote from lwn.net - Corbet
      * http://lwn.net/Articles/230975/

Cc: Hugh Dickins <hughd@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Report-and-Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 Documentation/filesystems/proc.txt | 18 +++++++++++-----
 fs/proc/task_mmu.c                 | 18 ++++++++++++++--
 include/linux/swap.h               |  6 ++++++
 mm/swapfile.c                      | 42 ++++++++++++++++++++++++++++++++++++++
 4 files changed, 77 insertions(+), 7 deletions(-)

Comments

Minchan Kim July 7, 2015, 1:47 p.m. UTC | #1
It seems the merge window is closed, so I'm bumping this up.

On Mon, Jun 15, 2015 at 10:06:54PM +0900, Minchan Kim wrote:
> We want to know per-process workingset size for smart memory management
> on userland and we use swap(ex, zram) heavily to maximize memory efficiency
> so workingset includes swap as well as RSS.
> 
> On such system, if there are lots of shared anonymous pages, it's
> really hard to figure out exactly how many each process consumes
> memory(ie, rss + wap) if the system has lots of shared anonymous
> memory(e.g, android).
> 
> This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
> more exact workingset size per process.
> 
> Bongkyu tested it. Result is below.
> 
> 1. 50M used swap
> SwapTotal: 461976 kB
> SwapFree: 411192 kB
> 
> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> 48236
> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> 141184
> 
> 2. 240M used swap
> SwapTotal: 461976 kB
> SwapFree: 216808 kB
> 
> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> 230315
> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> 1387744
> 
> * from v1
>   * add more description - Andrew
>   * swp_swacount fix on !CONFIG_SWP - Sergey
>   * add what PSS is to proc.txt - Andrew
>     * Bring quote from lwn.net - Corbet
>       * http://lwn.net/Articles/230975/
> 
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
> Cc: Jonathan Corbet <corbet@lwn.net>
> Report-and-Tested-by: Bongkyu Kim <bongkyu.kim@lge.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  Documentation/filesystems/proc.txt | 18 +++++++++++-----
>  fs/proc/task_mmu.c                 | 18 ++++++++++++++--
>  include/linux/swap.h               |  6 ++++++
>  mm/swapfile.c                      | 42 ++++++++++++++++++++++++++++++++++++++
>  4 files changed, 77 insertions(+), 7 deletions(-)
> 
> diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
> index c3b6b301d8b0..cfc765e6cfa6 100644
> --- a/Documentation/filesystems/proc.txt
> +++ b/Documentation/filesystems/proc.txt
> @@ -423,6 +423,7 @@ Private_Dirty:         0 kB
>  Referenced:          892 kB
>  Anonymous:             0 kB
>  Swap:                  0 kB
> +SwapPss:               0 kB
>  KernelPageSize:        4 kB
>  MMUPageSize:           4 kB
>  Locked:              374 kB
> @@ -432,16 +433,23 @@ the first of these lines shows the same information as is displayed for the
>  mapping in /proc/PID/maps.  The remaining lines show the size of the mapping
>  (size), the amount of the mapping that is currently resident in RAM (RSS), the
>  process' proportional share of this mapping (PSS), the number of clean and
> -dirty private pages in the mapping.  Note that even a page which is part of a
> -MAP_SHARED mapping, but has only a single pte mapped, i.e.  is currently used
> -by only one process, is accounted as private and not as shared.  "Referenced"
> -indicates the amount of memory currently marked as referenced or accessed.
> +dirty private pages in the mapping.
> +
> +The "proportional set size" (PSS) of a process is the count of pages it has
> +in memory, where each page is divided by the number of processes sharing it.
> +So if a process has 1000 pages all to itself, and 1000 shared with one other
> +process, its PSS will be 1500.
> +Note that even a page which is part of a MAP_SHARED mapping, but has only
> +a single pte mapped, i.e.  is currently used by only one process, is accounted
> +as private and not as shared.
> +"Referenced" indicates the amount of memory currently marked as referenced or
> +accessed.
>  "Anonymous" shows the amount of memory that does not belong to any file.  Even
>  a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
>  and a page is modified, the file page is replaced by a private anonymous copy.
>  "Swap" shows how much would-be-anonymous memory is also used, but out on
>  swap.
> -
> +"SwapPss" shows proportional swap share of this mapping.
>  "VmFlags" field deserves a separate description. This member represents the kernel
>  flags associated with the particular virtual memory area in two letter encoded
>  manner. The codes are the following:
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 6dee68d013ff..d537899f4b25 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -446,6 +446,7 @@ struct mem_size_stats {
>  	unsigned long anonymous_thp;
>  	unsigned long swap;
>  	u64 pss;
> +	u64 swap_pss;
>  };
>  
>  static void smaps_account(struct mem_size_stats *mss, struct page *page,
> @@ -492,9 +493,20 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  	} else if (is_swap_pte(*pte)) {
>  		swp_entry_t swpent = pte_to_swp_entry(*pte);
>  
> -		if (!non_swap_entry(swpent))
> +		if (!non_swap_entry(swpent)) {
> +			int mapcount;
> +
>  			mss->swap += PAGE_SIZE;
> -		else if (is_migration_entry(swpent))
> +			mapcount = swp_swapcount(swpent);
> +			if (mapcount >= 2) {
> +				u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
> +
> +				do_div(pss_delta, mapcount);
> +				mss->swap_pss += pss_delta;
> +			} else {
> +				mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
> +			}
> +		} else if (is_migration_entry(swpent))
>  			page = migration_entry_to_page(swpent);
>  	}
>  
> @@ -638,6 +650,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "Anonymous:      %8lu kB\n"
>  		   "AnonHugePages:  %8lu kB\n"
>  		   "Swap:           %8lu kB\n"
> +		   "SwapPss:        %8lu kB\n"
>  		   "KernelPageSize: %8lu kB\n"
>  		   "MMUPageSize:    %8lu kB\n"
>  		   "Locked:         %8lu kB\n",
> @@ -652,6 +665,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   mss.anonymous >> 10,
>  		   mss.anonymous_thp >> 10,
>  		   mss.swap >> 10,
> +		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
>  		   vma_kernel_pagesize(vma) >> 10,
>  		   vma_mmu_pagesize(vma) >> 10,
>  		   (vma->vm_flags & VM_LOCKED) ?
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cee108cbe2d5..afc9eb3cba48 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -432,6 +432,7 @@ extern unsigned int count_swap_pages(int, int);
>  extern sector_t map_swap_page(struct page *, struct block_device **);
>  extern sector_t swapdev_block(int, pgoff_t);
>  extern int page_swapcount(struct page *);
> +extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
>  extern int reuse_swap_page(struct page *);
>  extern int try_to_free_swap(struct page *);
> @@ -523,6 +524,11 @@ static inline int page_swapcount(struct page *page)
>  	return 0;
>  }
>  
> +static inline int swp_swapcount(swp_entry_t entry)
> +{
> +	return 0;
> +}
> +
>  #define reuse_swap_page(page)	(page_mapcount(page) == 1)
>  
>  static inline int try_to_free_swap(struct page *page)
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index a7e72103f23b..7a6bd1e5a8e9 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -875,6 +875,48 @@ int page_swapcount(struct page *page)
>  }
>  
>  /*
> + * How many references to @entry are currently swapped out?
> + * This considers COUNT_CONTINUED so it returns exact answer.
> + */
> +int swp_swapcount(swp_entry_t entry)
> +{
> +	int count, tmp_count, n;
> +	struct swap_info_struct *p;
> +	struct page *page;
> +	pgoff_t offset;
> +	unsigned char *map;
> +
> +	p = swap_info_get(entry);
> +	if (!p)
> +		return 0;
> +
> +	count = swap_count(p->swap_map[swp_offset(entry)]);
> +	if (!(count & COUNT_CONTINUED))
> +		goto out;
> +
> +	count &= ~COUNT_CONTINUED;
> +	n = SWAP_MAP_MAX + 1;
> +
> +	offset = swp_offset(entry);
> +	page = vmalloc_to_page(p->swap_map + offset);
> +	offset &= ~PAGE_MASK;
> +	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> +
> +	do {
> +		page = list_entry(page->lru.next, struct page, lru);
> +		map = kmap_atomic(page) + offset;
> +		tmp_count = *map;
> +		kunmap_atomic(map);
> +
> +		count += (tmp_count & ~COUNT_CONTINUED) * n;
> +		n *= (SWAP_CONT_MAX + 1);
> +	} while (tmp_count & COUNT_CONTINUED);
> +out:
> +	spin_unlock(&p->lock);
> +	return count;
> +}
> +
> +/*
>   * We can write to an anon page without COW if there are no other references
>   * to it.  And as a side-effect, free up its swap: because the old content
>   * on disk will never be read, and seeking back there to write new content
> -- 
> 1.9.1
>
Andrew Morton July 14, 2015, 9:07 p.m. UTC | #2
On Mon, 15 Jun 2015 22:06:54 +0900 Minchan Kim <minchan@kernel.org> wrote:

> We want to know per-process workingset size for smart memory management
> on userland and we use swap(ex, zram) heavily to maximize memory efficiency
> so workingset includes swap as well as RSS.
> 
> On such system, if there are lots of shared anonymous pages, it's
> really hard to figure out exactly how many each process consumes
> memory(ie, rss + wap) if the system has lots of shared anonymous
> memory(e.g, android).
> 
> This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
> more exact workingset size per process.
> 
> ...
>
> +int swp_swapcount(swp_entry_t entry)
> +{
> +	int count, tmp_count, n;
> +	struct swap_info_struct *p;
> +	struct page *page;
> +	pgoff_t offset;
> +	unsigned char *map;
> +
> +	p = swap_info_get(entry);
> +	if (!p)
> +		return 0;
> +
> +	count = swap_count(p->swap_map[swp_offset(entry)]);
> +	if (!(count & COUNT_CONTINUED))
> +		goto out;
> +
> +	count &= ~COUNT_CONTINUED;
> +	n = SWAP_MAP_MAX + 1;
> +
> +	offset = swp_offset(entry);
> +	page = vmalloc_to_page(p->swap_map + offset);
> +	offset &= ~PAGE_MASK;
> +	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> +
> +	do {
> +		page = list_entry(page->lru.next, struct page, lru);
> +		map = kmap_atomic(page) + offset;
> +		tmp_count = *map;
> +		kunmap_atomic(map);

A little thing: I've never liked the way that kunmap_atomic() accepts
any address within the page.  It's weird, and it makes the reviewer
have to scramble around to make sure the offset can never be >=
PAGE_SIZE.

We can easily avoid doing it here:

--- a/mm/swapfile.c~mm-show-proportional-swap-share-of-the-mapping-fix
+++ a/mm/swapfile.c
@@ -904,8 +904,8 @@ int swp_swapcount(swp_entry_t entry)
 
 	do {
 		page = list_entry(page->lru.next, struct page, lru);
-		map = kmap_atomic(page) + offset;
-		tmp_count = *map;
+		map = kmap_atomic(page);
+		tmp_count = map[offset];
 		kunmap_atomic(map);
 
 		count += (tmp_count & ~COUNT_CONTINUED) * n;
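
The concern can be illustrated with a small userspace model. It assumes the
unmap side simply rounds whatever address it receives down to a page boundary;
that behaviour, the 4 KiB page size, and every name below are assumptions made
for the example, not taken from the kernel sources.

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096UL
#define MODEL_PAGE_MASK (~(MODEL_PAGE_SIZE - 1))

/* Stand-in for the rounding an unmap routine performs when it accepts
 * any address inside the mapped page. */
static uintptr_t model_page_base(uintptr_t addr)
{
	return addr & MODEL_PAGE_MASK;
}

/* True when base + offset still rounds down to base, i.e. when
 * unmapping through the offset pointer hits the page that was mapped. */
static int model_unmap_hits_same_page(uintptr_t base, unsigned long offset)
{
	assert((base & ~MODEL_PAGE_MASK) == 0);	/* base must be page aligned */
	return model_page_base(base + offset) == base;
}
```

In this model an offset of PAGE_SIZE - 1 still resolves to the mapped page,
while an offset of PAGE_SIZE resolves to the next one. Passing the unmodified
pointer back, as the fix does, removes the need to check that bound.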

> +		count += (tmp_count & ~COUNT_CONTINUED) * n;
> +		n *= (SWAP_CONT_MAX + 1);
> +	} while (tmp_count & COUNT_CONTINUED);
> +out:
> +	spin_unlock(&p->lock);
> +	return count;
> +}
>
> ...
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Minchan Kim July 15, 2015, 11:49 p.m. UTC | #3
On Tue, Jul 14, 2015 at 02:07:08PM -0700, Andrew Morton wrote:
> On Mon, 15 Jun 2015 22:06:54 +0900 Minchan Kim <minchan@kernel.org> wrote:
> 
> > We want to know per-process workingset size for smart memory management
> > on userland and we use swap(ex, zram) heavily to maximize memory efficiency
> > so workingset includes swap as well as RSS.
> > 
> > On such system, if there are lots of shared anonymous pages, it's
> > really hard to figure out exactly how many each process consumes
> > memory(ie, rss + wap) if the system has lots of shared anonymous
> > memory(e.g, android).
> > 
> > This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
> > more exact workingset size per process.
> > 
> > ...
> >
> > +int swp_swapcount(swp_entry_t entry)
> > +{
> > +	int count, tmp_count, n;
> > +	struct swap_info_struct *p;
> > +	struct page *page;
> > +	pgoff_t offset;
> > +	unsigned char *map;
> > +
> > +	p = swap_info_get(entry);
> > +	if (!p)
> > +		return 0;
> > +
> > +	count = swap_count(p->swap_map[swp_offset(entry)]);
> > +	if (!(count & COUNT_CONTINUED))
> > +		goto out;
> > +
> > +	count &= ~COUNT_CONTINUED;
> > +	n = SWAP_MAP_MAX + 1;
> > +
> > +	offset = swp_offset(entry);
> > +	page = vmalloc_to_page(p->swap_map + offset);
> > +	offset &= ~PAGE_MASK;
> > +	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> > +
> > +	do {
> > +		page = list_entry(page->lru.next, struct page, lru);
> > +		map = kmap_atomic(page) + offset;
> > +		tmp_count = *map;
> > +		kunmap_atomic(map);
> 
> A little thing: I've never liked the way that kunmap_atomic() accepts
> any address within the page.  It's weird, and it makes the reviewer
> have to scramble around to make sure the offset can never be >=
> PAGE_SIZE.

Very true. I was bitten by that.
Thanks for the cleanup.

> 
> We can easily avoid doing it here:
> 
> --- a/mm/swapfile.c~mm-show-proportional-swap-share-of-the-mapping-fix
> +++ a/mm/swapfile.c
> @@ -904,8 +904,8 @@ int swp_swapcount(swp_entry_t entry)
>  
>  	do {
>  		page = list_entry(page->lru.next, struct page, lru);
> -		map = kmap_atomic(page) + offset;
> -		tmp_count = *map;
> +		map = kmap_atomic(page);
> +		tmp_count = map[offset];
>  		kunmap_atomic(map);
>  
>  		count += (tmp_count & ~COUNT_CONTINUED) * n;
> 
> > +		count += (tmp_count & ~COUNT_CONTINUED) * n;
> > +		n *= (SWAP_CONT_MAX + 1);
> > +	} while (tmp_count & COUNT_CONTINUED);
> > +out:
> > +	spin_unlock(&p->lock);
> > +	return count;
> > +}
> >

Jerome Marchand July 29, 2015, 8:33 a.m. UTC | #4
On 06/15/2015 03:06 PM, Minchan Kim wrote:
> We want to know per-process workingset size for smart memory management
> on userland and we use swap(ex, zram) heavily to maximize memory efficiency
> so workingset includes swap as well as RSS.
> 
> On such system, if there are lots of shared anonymous pages, it's
> really hard to figure out exactly how many each process consumes
> memory(ie, rss + wap) if the system has lots of shared anonymous
> memory(e.g, android).
> 
> This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
> more exact workingset size per process.
> 
> Bongkyu tested it. Result is below.
> 
> 1. 50M used swap
> SwapTotal: 461976 kB
> SwapFree: 411192 kB
> 
> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> 48236
> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> 141184

Hi Minchan,

I just found out about this patch. What kind of shared memory is that?
Since it's android, I'm inclined to think something specific like
ashmem. I'm asking because this patch won't help for more common type of
shared memory. See my comment below.

> 
> 2. 240M used swap
> SwapTotal: 461976 kB
> SwapFree: 216808 kB
> 
> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> 230315
> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> 1387744
> 
snip
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 6dee68d013ff..d537899f4b25 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -446,6 +446,7 @@ struct mem_size_stats {
>  	unsigned long anonymous_thp;
>  	unsigned long swap;
>  	u64 pss;
> +	u64 swap_pss;
>  };
>  
>  static void smaps_account(struct mem_size_stats *mss, struct page *page,
> @@ -492,9 +493,20 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>  	} else if (is_swap_pte(*pte)) {

This won't work for sysV shm, tmpfs and MAP_SHARED | MAP_ANONYMOUS
mapping pages which are pte_none when paged out. They're currently not
accounted at all when in swap.

Jerome

>  		swp_entry_t swpent = pte_to_swp_entry(*pte);
>  
> -		if (!non_swap_entry(swpent))
> +		if (!non_swap_entry(swpent)) {
> +			int mapcount;
> +
>  			mss->swap += PAGE_SIZE;
> -		else if (is_migration_entry(swpent))
> +			mapcount = swp_swapcount(swpent);
> +			if (mapcount >= 2) {
> +				u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
> +
> +				do_div(pss_delta, mapcount);
> +				mss->swap_pss += pss_delta;
> +			} else {
> +				mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
> +			}
> +		} else if (is_migration_entry(swpent))
>  			page = migration_entry_to_page(swpent);
>  	}
>  
> @@ -638,6 +650,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   "Anonymous:      %8lu kB\n"
>  		   "AnonHugePages:  %8lu kB\n"
>  		   "Swap:           %8lu kB\n"
> +		   "SwapPss:        %8lu kB\n"
>  		   "KernelPageSize: %8lu kB\n"
>  		   "MMUPageSize:    %8lu kB\n"
>  		   "Locked:         %8lu kB\n",
> @@ -652,6 +665,7 @@ static int show_smap(struct seq_file *m, void *v, int is_pid)
>  		   mss.anonymous >> 10,
>  		   mss.anonymous_thp >> 10,
>  		   mss.swap >> 10,
> +		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
>  		   vma_kernel_pagesize(vma) >> 10,
>  		   vma_mmu_pagesize(vma) >> 10,
>  		   (vma->vm_flags & VM_LOCKED) ?
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cee108cbe2d5..afc9eb3cba48 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -432,6 +432,7 @@ extern unsigned int count_swap_pages(int, int);
>  extern sector_t map_swap_page(struct page *, struct block_device **);
>  extern sector_t swapdev_block(int, pgoff_t);
>  extern int page_swapcount(struct page *);
> +extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
>  extern int reuse_swap_page(struct page *);
>  extern int try_to_free_swap(struct page *);
> @@ -523,6 +524,11 @@ static inline int page_swapcount(struct page *page)
>  	return 0;
>  }
>  
> +static inline int swp_swapcount(swp_entry_t entry)
> +{
> +	return 0;
> +}
> +
>  #define reuse_swap_page(page)	(page_mapcount(page) == 1)
>  
>  static inline int try_to_free_swap(struct page *page)
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index a7e72103f23b..7a6bd1e5a8e9 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -875,6 +875,48 @@ int page_swapcount(struct page *page)
>  }
>  
>  /*
> + * How many references to @entry are currently swapped out?
> + * This considers COUNT_CONTINUED so it returns exact answer.
> + */
> +int swp_swapcount(swp_entry_t entry)
> +{
> +	int count, tmp_count, n;
> +	struct swap_info_struct *p;
> +	struct page *page;
> +	pgoff_t offset;
> +	unsigned char *map;
> +
> +	p = swap_info_get(entry);
> +	if (!p)
> +		return 0;
> +
> +	count = swap_count(p->swap_map[swp_offset(entry)]);
> +	if (!(count & COUNT_CONTINUED))
> +		goto out;
> +
> +	count &= ~COUNT_CONTINUED;
> +	n = SWAP_MAP_MAX + 1;
> +
> +	offset = swp_offset(entry);
> +	page = vmalloc_to_page(p->swap_map + offset);
> +	offset &= ~PAGE_MASK;
> +	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
> +
> +	do {
> +		page = list_entry(page->lru.next, struct page, lru);
> +		map = kmap_atomic(page) + offset;
> +		tmp_count = *map;
> +		kunmap_atomic(map);
> +
> +		count += (tmp_count & ~COUNT_CONTINUED) * n;
> +		n *= (SWAP_CONT_MAX + 1);
> +	} while (tmp_count & COUNT_CONTINUED);
> +out:
> +	spin_unlock(&p->lock);
> +	return count;
> +}
> +
> +/*
>   * We can write to an anon page without COW if there are no other references
>   * to it.  And as a side-effect, free up its swap: because the old content
>   * on disk will never be read, and seeking back there to write new content
>
Minchan Kim July 29, 2015, 10:30 a.m. UTC | #5
Hi Jerome,

On Wed, Jul 29, 2015 at 10:33:53AM +0200, Jerome Marchand wrote:
> On 06/15/2015 03:06 PM, Minchan Kim wrote:
> > We want to know per-process workingset size for smart memory management
> > on userland and we use swap(ex, zram) heavily to maximize memory efficiency
> > so workingset includes swap as well as RSS.
> > 
> > On such system, if there are lots of shared anonymous pages, it's
> > really hard to figure out exactly how many each process consumes
> > memory(ie, rss + wap) if the system has lots of shared anonymous
> > memory(e.g, android).
> > 
> > This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
> > more exact workingset size per process.
> > 
> > Bongkyu tested it. Result is below.
> > 
> > 1. 50M used swap
> > SwapTotal: 461976 kB
> > SwapFree: 411192 kB
> > 
> > $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> > 48236
> > $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> > 141184
> 
> Hi Minchan,
> 
> I just found out about this patch. What kind of shared memory is that?
> Since it's android, I'm inclined to think something specific like
> ashmem. I'm asking because this patch won't help for more common type of
> shared memory. See my comment below.

It's the normal heap of the parent (IOW, MAP_ANON|MAP_PRIVATE memory which is
shared by child processes).

> 
> > 
> > 2. 240M used swap
> > SwapTotal: 461976 kB
> > SwapFree: 216808 kB
> > 
> > $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
> > 230315
> > $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
> > 1387744
> > 
> snip
> > diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> > index 6dee68d013ff..d537899f4b25 100644
> > --- a/fs/proc/task_mmu.c
> > +++ b/fs/proc/task_mmu.c
> > @@ -446,6 +446,7 @@ struct mem_size_stats {
> >  	unsigned long anonymous_thp;
> >  	unsigned long swap;
> >  	u64 pss;
> > +	u64 swap_pss;
> >  };
> >  
> >  static void smaps_account(struct mem_size_stats *mss, struct page *page,
> > @@ -492,9 +493,20 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
> >  	} else if (is_swap_pte(*pte)) {
> 
> This won't work for sysV shm, tmpfs and MAP_SHARED | MAP_ANONYMOUS
> mapping pages which are pte_none when paged out. They're currently not
> accounted at all when in swap.

This patch doesn't handle those pages because we haven't supported
those pages so far. IMHO, if someone needs it, it should be another patch,
and they can contribute it in the future.

Thanks.

Jerome Marchand July 29, 2015, 11:56 a.m. UTC | #6
On 07/29/2015 12:30 PM, Minchan Kim wrote:
> Hi Jerome,
> 
> On Wed, Jul 29, 2015 at 10:33:53AM +0200, Jerome Marchand wrote:
>> On 06/15/2015 03:06 PM, Minchan Kim wrote:
>>> We want to know per-process workingset size for smart memory management
>>> on userland and we use swap(ex, zram) heavily to maximize memory efficiency
>>> so workingset includes swap as well as RSS.
>>>
>>> On such system, if there are lots of shared anonymous pages, it's
>>> really hard to figure out exactly how many each process consumes
>>> memory(ie, rss + wap) if the system has lots of shared anonymous
>>> memory(e.g, android).
>>>
>>> This patch introduces SwapPss field on /proc/<pid>/smaps so we can get
>>> more exact workingset size per process.
>>>
>>> Bongkyu tested it. Result is below.
>>>
>>> 1. 50M used swap
>>> SwapTotal: 461976 kB
>>> SwapFree: 411192 kB
>>>
>>> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
>>> 48236
>>> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
>>> 141184
>>
>> Hi Minchan,
>>
>> I just found out about this patch. What kind of shared memory is that?
>> Since it's android, I'm inclined to think something specific like
>> ashmem. I'm asking because this patch won't help for more common type of
>> shared memory. See my comment below.
> 
> It's normal heap of parent(IOW, MAP_ANON|MAP_PRIVATE memory which is share
>  by child processes).

Ok. I didn't imagine CoW pages would represent such a big share of
swapped out pages.

> 
>>
>>>
>>> 2. 240M used swap
>>> SwapTotal: 461976 kB
>>> SwapFree: 216808 kB
>>>
>>> $ adb shell cat /proc/*/smaps | grep "SwapPss:" | awk '{sum += $2} END {print sum}';
>>> 230315
>>> $ adb shell cat /proc/*/smaps | grep "Swap:" | awk '{sum += $2} END {print sum}';
>>> 1387744
>>>
>> snip
>>> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
>>> index 6dee68d013ff..d537899f4b25 100644
>>> --- a/fs/proc/task_mmu.c
>>> +++ b/fs/proc/task_mmu.c
>>> @@ -446,6 +446,7 @@ struct mem_size_stats {
>>>  	unsigned long anonymous_thp;
>>>  	unsigned long swap;
>>>  	u64 pss;
>>> +	u64 swap_pss;
>>>  };
>>>  
>>>  static void smaps_account(struct mem_size_stats *mss, struct page *page,
>>> @@ -492,9 +493,20 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>>>  	} else if (is_swap_pte(*pte)) {
>>
>> This won't work for sysV shm, tmpfs and MAP_SHARED | MAP_ANONYMOUS
>> mapping pages which are pte_none when paged out. They're currently not
>> accounted at all when in swap.
> 
> This patch doesn't handle those pages because we don't have supported
> thoses pages. IMHO, if someone need it, it should be another patch and
> he can contribute it in future.

Sure.

> 
> Thanks.
>
Vlastimil Babka Aug. 5, 2015, 11:38 a.m. UTC | #7
On 07/29/2015 12:30 PM, Minchan Kim wrote:
>> This won't work for sysV shm, tmpfs and MAP_SHARED | MAP_ANONYMOUS
>> mapping pages which are pte_none when paged out. They're currently not
>> accounted at all when in swap.
>
> This patch doesn't handle those pages because we don't have supported
> thoses pages. IMHO, if someone need it, it should be another patch and
> he can contribute it in future.

OK, time to try again...


Patch

diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index c3b6b301d8b0..cfc765e6cfa6 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -423,6 +423,7 @@  Private_Dirty:         0 kB
 Referenced:          892 kB
 Anonymous:             0 kB
 Swap:                  0 kB
+SwapPss:               0 kB
 KernelPageSize:        4 kB
 MMUPageSize:           4 kB
 Locked:              374 kB
@@ -432,16 +433,23 @@  the first of these lines shows the same information as is displayed for the
 mapping in /proc/PID/maps.  The remaining lines show the size of the mapping
 (size), the amount of the mapping that is currently resident in RAM (RSS), the
 process' proportional share of this mapping (PSS), the number of clean and
-dirty private pages in the mapping.  Note that even a page which is part of a
-MAP_SHARED mapping, but has only a single pte mapped, i.e.  is currently used
-by only one process, is accounted as private and not as shared.  "Referenced"
-indicates the amount of memory currently marked as referenced or accessed.
+dirty private pages in the mapping.
+
+The "proportional set size" (PSS) of a process is the count of pages it has
+in memory, where each page is divided by the number of processes sharing it.
+So if a process has 1000 pages all to itself, and 1000 shared with one other
+process, its PSS will be 1500.
+Note that even a page which is part of a MAP_SHARED mapping, but has only
+a single pte mapped, i.e.  is currently used by only one process, is accounted
+as private and not as shared.
+"Referenced" indicates the amount of memory currently marked as referenced or
+accessed.
 "Anonymous" shows the amount of memory that does not belong to any file.  Even
 a mapping associated with a file may contain anonymous pages: when MAP_PRIVATE
 and a page is modified, the file page is replaced by a private anonymous copy.
 "Swap" shows how much would-be-anonymous memory is also used, but out on
 swap.
-
+"SwapPss" shows proportional swap share of this mapping.
 "VmFlags" field deserves a separate description. This member represents the kernel
 flags associated with the particular virtual memory area in two letter encoded
 manner. The codes are the following:
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 6dee68d013ff..d537899f4b25 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -446,6 +446,7 @@  struct mem_size_stats {
 	unsigned long anonymous_thp;
 	unsigned long swap;
 	u64 pss;
+	u64 swap_pss;
 };
 
 static void smaps_account(struct mem_size_stats *mss, struct page *page,
@@ -492,9 +493,20 @@  static void smaps_pte_entry(pte_t *pte, unsigned long addr,
 	} else if (is_swap_pte(*pte)) {
 		swp_entry_t swpent = pte_to_swp_entry(*pte);
 
-		if (!non_swap_entry(swpent))
+		if (!non_swap_entry(swpent)) {
+			int mapcount;
+
 			mss->swap += PAGE_SIZE;
-		else if (is_migration_entry(swpent))
+			mapcount = swp_swapcount(swpent);
+			if (mapcount >= 2) {
+				u64 pss_delta = (u64)PAGE_SIZE << PSS_SHIFT;
+
+				do_div(pss_delta, mapcount);
+				mss->swap_pss += pss_delta;
+			} else {
+				mss->swap_pss += (u64)PAGE_SIZE << PSS_SHIFT;
+			}
+		} else if (is_migration_entry(swpent))
 			page = migration_entry_to_page(swpent);
 	}
 
@@ -638,6 +650,7 @@  static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   "Anonymous:      %8lu kB\n"
 		   "AnonHugePages:  %8lu kB\n"
 		   "Swap:           %8lu kB\n"
+		   "SwapPss:        %8lu kB\n"
 		   "KernelPageSize: %8lu kB\n"
 		   "MMUPageSize:    %8lu kB\n"
 		   "Locked:         %8lu kB\n",
@@ -652,6 +665,7 @@  static int show_smap(struct seq_file *m, void *v, int is_pid)
 		   mss.anonymous >> 10,
 		   mss.anonymous_thp >> 10,
 		   mss.swap >> 10,
+		   (unsigned long)(mss.swap_pss >> (10 + PSS_SHIFT)),
 		   vma_kernel_pagesize(vma) >> 10,
 		   vma_mmu_pagesize(vma) >> 10,
 		   (vma->vm_flags & VM_LOCKED) ?
diff --git a/include/linux/swap.h b/include/linux/swap.h
index cee108cbe2d5..afc9eb3cba48 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -432,6 +432,7 @@  extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
+extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
 extern int reuse_swap_page(struct page *);
 extern int try_to_free_swap(struct page *);
@@ -523,6 +524,11 @@  static inline int page_swapcount(struct page *page)
 	return 0;
 }
 
+static inline int swp_swapcount(swp_entry_t entry)
+{
+	return 0;
+}
+
 #define reuse_swap_page(page)	(page_mapcount(page) == 1)
 
 static inline int try_to_free_swap(struct page *page)
diff --git a/mm/swapfile.c b/mm/swapfile.c
index a7e72103f23b..7a6bd1e5a8e9 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -875,6 +875,48 @@  int page_swapcount(struct page *page)
 }
 
 /*
+ * How many references to @entry are currently swapped out?
+ * This considers COUNT_CONTINUED so it returns exact answer.
+ */
+int swp_swapcount(swp_entry_t entry)
+{
+	int count, tmp_count, n;
+	struct swap_info_struct *p;
+	struct page *page;
+	pgoff_t offset;
+	unsigned char *map;
+
+	p = swap_info_get(entry);
+	if (!p)
+		return 0;
+
+	count = swap_count(p->swap_map[swp_offset(entry)]);
+	if (!(count & COUNT_CONTINUED))
+		goto out;
+
+	count &= ~COUNT_CONTINUED;
+	n = SWAP_MAP_MAX + 1;
+
+	offset = swp_offset(entry);
+	page = vmalloc_to_page(p->swap_map + offset);
+	offset &= ~PAGE_MASK;
+	VM_BUG_ON(page_private(page) != SWP_CONTINUED);
+
+	do {
+		page = list_entry(page->lru.next, struct page, lru);
+		map = kmap_atomic(page) + offset;
+		tmp_count = *map;
+		kunmap_atomic(map);
+
+		count += (tmp_count & ~COUNT_CONTINUED) * n;
+		n *= (SWAP_CONT_MAX + 1);
+	} while (tmp_count & COUNT_CONTINUED);
+out:
+	spin_unlock(&p->lock);
+	return count;
+}
+
+/*
  * We can write to an anon page without COW if there are no other references
  * to it.  And as a side-effect, free up its swap: because the old content
  * on disk will never be read, and seeking back there to write new content
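
The continuation walk in swp_swapcount() amounts to decoding a mixed-radix
number: the swap_map byte holds the low digit and each continuation page
contributes one higher digit. The sketch below mirrors that loop in userspace
C over a flat byte array. model_swp_swapcount() and the MODEL_* constants are
hypothetical, and the constant values are assumed to match
include/linux/swap.h at the time of the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed to match include/linux/swap.h; treat as illustrative values. */
#define MODEL_SWAP_MAP_MAX	0x3e
#define MODEL_SWAP_CONT_MAX	0x7f
#define MODEL_COUNT_CONTINUED	0x80

/*
 * first: the swap_map byte with the swap-cache flag already stripped.
 * cont:  the bytes found at the same offset in successive continuation
 *        pages, lowest digit first.
 */
static int model_swp_swapcount(unsigned char first,
			       const unsigned char *cont, size_t ncont)
{
	int count = first, n = MODEL_SWAP_MAP_MAX + 1;
	size_t i = 0;
	unsigned char tmp;

	if (!(count & MODEL_COUNT_CONTINUED))
		return count;
	count &= ~MODEL_COUNT_CONTINUED;

	do {
		assert(i < ncont);	/* the chain must end inside the array */
		tmp = cont[i++];
		count += (tmp & ~MODEL_COUNT_CONTINUED) * n;
		n *= MODEL_SWAP_CONT_MAX + 1;
	} while (tmp & MODEL_COUNT_CONTINUED);

	return count;
}
```

With these constants the first continuation digit is weighted by 63 and the
second by 63 * 128, so a low digit of 5 followed by a single continuation
byte of 2 decodes to 5 + 2 * 63 = 131.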