* [PATCH RFC] mm/page_idle: simple idle page tracking for virtual memory
From: Konstantin Khlebnikov @ 2019-07-23 11:54 UTC (permalink / raw)
  To: Minchan Kim, linux-kernel, Michal Hocko, linux-mm,
	Joel Fernandes (Google),
	Andrew Morton

The page_idle tracking feature currently requires looking up the pagemap
for a process followed by interacting with /sys/kernel/mm/page_idle.
This is quite cumbersome and can be error-prone too: if something changes
with the page between accessing the per-PID pagemap and the global
page_idle bitmap, the information is no longer accurate. Moreover, looking
up the PFN from pagemap on Android devices is not supported for
unprivileged processes: it requires CAP_SYS_ADMIN and otherwise gives 0
for the PFN.

This patch adds a simplified interface which works only with mapped pages:
Run "echo 6 > /proc/pid/clear_refs" to mark all mapped pages as idle.
Pages that are still idle are marked with bit 57 in /proc/pid/pagemap.
The total size of idle pages is shown in /proc/pid/smaps (and smaps_rollup).

Part of this changelog is borrowed from Joel Fernandes <joel@joelfernandes.org>.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Link: https://lore.kernel.org/lkml/20190722213205.140845-1-joel@joelfernandes.org/
---
 Documentation/admin-guide/mm/pagemap.rst |    3 ++-
 Documentation/filesystems/proc.txt       |    3 +++
 fs/proc/task_mmu.c                       |   33 ++++++++++++++++++++++++++++--
 3 files changed, 36 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/pagemap.rst b/Documentation/admin-guide/mm/pagemap.rst
index 340a5aee9b80..d7ee60287584 100644
--- a/Documentation/admin-guide/mm/pagemap.rst
+++ b/Documentation/admin-guide/mm/pagemap.rst
@@ -21,7 +21,8 @@ There are four components to pagemap:
     * Bit  55    pte is soft-dirty (see
       :ref:`Documentation/admin-guide/mm/soft-dirty.rst <soft_dirty>`)
     * Bit  56    page exclusively mapped (since 4.2)
-    * Bits 57-60 zero
+    * Bit  57    page is idle
+    * Bits 58-60 zero
     * Bit  61    page is file-page or shared-anon (since 3.5)
     * Bit  62    page swapped
     * Bit  63    page present
diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt
index 99ca040e3f90..d222be8b4eb9 100644
--- a/Documentation/filesystems/proc.txt
+++ b/Documentation/filesystems/proc.txt
@@ -574,6 +574,9 @@ To reset the peak resident set size ("high water mark") to the process's
 current value:
     > echo 5 > /proc/PID/clear_refs
 
+To mark all mapped pages as idle:
+    > echo 6 > /proc/PID/clear_refs
+
 Any other value written to /proc/PID/clear_refs will have no effect.
 
 The /proc/pid/pagemap gives the PFN, which can be used to find the pageflags
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 731642e0f5a0..6da952574a1f 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -413,6 +413,7 @@ struct mem_size_stats {
 	unsigned long private_clean;
 	unsigned long private_dirty;
 	unsigned long referenced;
+	unsigned long idle;
 	unsigned long anonymous;
 	unsigned long lazyfree;
 	unsigned long anonymous_thp;
@@ -479,6 +480,10 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
 	if (young || page_is_young(page) || PageReferenced(page))
 		mss->referenced += size;
 
+	/* Not accessed and still idle. */
+	if (!young && page_is_idle(page))
+		mss->idle += size;
+
 	/*
 	 * Then accumulate quantities that may depend on sharing, or that may
 	 * differ page-by-page.
@@ -799,6 +804,9 @@ static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	SEQ_PUT_DEC(" kB\nPrivate_Clean:  ", mss->private_clean);
 	SEQ_PUT_DEC(" kB\nPrivate_Dirty:  ", mss->private_dirty);
 	SEQ_PUT_DEC(" kB\nReferenced:     ", mss->referenced);
+#ifdef CONFIG_IDLE_PAGE_TRACKING
+	SEQ_PUT_DEC(" kB\nIdle:           ", mss->idle);
+#endif
 	SEQ_PUT_DEC(" kB\nAnonymous:      ", mss->anonymous);
 	SEQ_PUT_DEC(" kB\nLazyFree:       ", mss->lazyfree);
 	SEQ_PUT_DEC(" kB\nAnonHugePages:  ", mss->anonymous_thp);
@@ -969,6 +977,7 @@ enum clear_refs_types {
 	CLEAR_REFS_MAPPED,
 	CLEAR_REFS_SOFT_DIRTY,
 	CLEAR_REFS_MM_HIWATER_RSS,
+	CLEAR_REFS_SOFT_ACCESS,
 	CLEAR_REFS_LAST,
 };
 
@@ -1045,6 +1054,7 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *pte, ptent;
 	spinlock_t *ptl;
 	struct page *page;
+	int young;
 
 	ptl = pmd_trans_huge_lock(pmd, vma);
 	if (ptl) {
@@ -1058,8 +1068,16 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 
 		page = pmd_page(*pmd);
 
+		young = pmdp_test_and_clear_young(vma, addr, pmd);
+
+		if (cp->type == CLEAR_REFS_SOFT_ACCESS) {
+			if (young)
+				set_page_young(page);
+			set_page_idle(page);
+			goto out;
+		}
+
 		/* Clear accessed and referenced bits. */
-		pmdp_test_and_clear_young(vma, addr, pmd);
 		test_and_clear_page_young(page);
 		ClearPageReferenced(page);
 out:
@@ -1086,8 +1104,16 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!page)
 			continue;
 
+		young = ptep_test_and_clear_young(vma, addr, pte);
+
+		if (cp->type == CLEAR_REFS_SOFT_ACCESS) {
+			if (young)
+				set_page_young(page);
+			set_page_idle(page);
+			continue;
+		}
+
 		/* Clear accessed and referenced bits. */
-		ptep_test_and_clear_young(vma, addr, pte);
 		test_and_clear_page_young(page);
 		ClearPageReferenced(page);
 	}
@@ -1253,6 +1279,7 @@ struct pagemapread {
 #define PM_PFRAME_MASK		GENMASK_ULL(PM_PFRAME_BITS - 1, 0)
 #define PM_SOFT_DIRTY		BIT_ULL(55)
 #define PM_MMAP_EXCLUSIVE	BIT_ULL(56)
+#define PM_IDLE			BIT_ULL(57)
 #define PM_FILE			BIT_ULL(61)
 #define PM_SWAP			BIT_ULL(62)
 #define PM_PRESENT		BIT_ULL(63)
@@ -1326,6 +1353,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 		page = vm_normal_page(vma, addr, pte);
 		if (pte_soft_dirty(pte))
 			flags |= PM_SOFT_DIRTY;
+		if (!pte_young(pte) && page && page_is_idle(page))
+			flags |= PM_IDLE;
 	} else if (is_swap_pte(pte)) {
 		swp_entry_t entry;
 		if (pte_swp_soft_dirty(pte))
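
[ Not part of the patch - purely for illustration, a minimal userspace sketch
  of checking the proposed bit 57 for a single mapped address. Error handling
  is trimmed and the program name is made up. ]

/* idlebit.c: print the present (bit 63) and idle (bit 57) pagemap bits */
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

int main(int argc, char **argv)
{
        unsigned long vaddr;
        uint64_t entry;
        char path[64];
        int fd;

        if (argc != 3)          /* usage: ./idlebit <pid> <vaddr in hex> */
                return 1;
        vaddr = strtoul(argv[2], NULL, 16);
        snprintf(path, sizeof(path), "/proc/%s/pagemap", argv[1]);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return 1;
        /* pagemap holds one 64-bit entry per virtual page */
        if (pread(fd, &entry, sizeof(entry),
                  vaddr / sysconf(_SC_PAGESIZE) * sizeof(entry)) != sizeof(entry))
                return 1;
        printf("present=%d idle=%d\n",
               (int)(entry >> 63 & 1), (int)(entry >> 57 & 1));
        close(fd);
        return 0;
}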



* Re: [PATCH RFC] mm/page_idle: simple idle page tracking for virtual memory
From: Joel Fernandes @ 2019-07-23 13:46 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Minchan Kim, linux-kernel, Michal Hocko, linux-mm, Andrew Morton

On Tue, Jul 23, 2019 at 02:54:26PM +0300, Konstantin Khlebnikov wrote:
> The page_idle tracking feature currently requires looking up the pagemap
> for a process followed by interacting with /sys/kernel/mm/page_idle.
> This is quite cumbersome and can be error-prone too: if something changes
> with the page between accessing the per-PID pagemap and the global
> page_idle bitmap, the information is no longer accurate. Moreover, looking
> up the PFN from pagemap on Android devices is not supported for
> unprivileged processes: it requires CAP_SYS_ADMIN and otherwise gives 0
> for the PFN.
> 
> This patch adds a simplified interface which works only with mapped pages:
> Run "echo 6 > /proc/pid/clear_refs" to mark all mapped pages as idle.
> Pages that are still idle are marked with bit 57 in /proc/pid/pagemap.
> The total size of idle pages is shown in /proc/pid/smaps (and smaps_rollup).
> 
> Part of this changelog is borrowed from Joel Fernandes <joel@joelfernandes.org>.

This will not work well for the problem at hand: the heap profiler
(heapprofd) only wants to clear the idle flag for the heap memory area which
is what it is profiling. There is no reason to do it for all mapped pages.
Using the /proc/pid/page_idle interface from my patch, it can be done
selectively for particular memory areas.

I had previously thought of having an interface that accepts an address
range to set the idle flag; however, that also adds more complexity.

thanks,

 - Joel



* Re: [PATCH RFC] mm/page_idle: simple idle page tracking for virtual memory
From: Konstantin Khlebnikov @ 2019-07-23 13:59 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Minchan Kim, linux-kernel, Michal Hocko, linux-mm, Andrew Morton



On 23.07.2019 16:46, Joel Fernandes wrote:
> On Tue, Jul 23, 2019 at 02:54:26PM +0300, Konstantin Khlebnikov wrote:
>> The page_idle tracking feature currently requires looking up the pagemap
>> for a process followed by interacting with /sys/kernel/mm/page_idle.
>> This is quite cumbersome and can be error-prone too: if something changes
>> with the page between accessing the per-PID pagemap and the global
>> page_idle bitmap, the information is no longer accurate. Moreover, looking
>> up the PFN from pagemap on Android devices is not supported for
>> unprivileged processes: it requires CAP_SYS_ADMIN and otherwise gives 0
>> for the PFN.
>>
>> This patch adds a simplified interface which works only with mapped pages:
>> Run "echo 6 > /proc/pid/clear_refs" to mark all mapped pages as idle.
>> Pages that are still idle are marked with bit 57 in /proc/pid/pagemap.
>> The total size of idle pages is shown in /proc/pid/smaps (and smaps_rollup).
>>
>> Part of this changelog is borrowed from Joel Fernandes <joel@joelfernandes.org>.
> 
> This will not work well for the problem at hand: the heap profiler
> (heapprofd) only wants to clear the idle flag for the heap memory area which
> is what it is profiling. There is no reason to do it for all mapped pages.
> Using the /proc/pid/page_idle interface from my patch, it can be done
> selectively for particular memory areas.
> 
> I had previously thought of having an interface that accepts an address
> range to set the idle flag; however, that also adds more complexity.

The profiler could look at the particular area in /proc/pid/smaps
or count idle pages via /proc/pid/pagemap (a rough sketch of the latter
follows below).

Selective /proc/pid/clear_refs is not so hard to add.
Something like echo "6 561214d03000-561214d29000" > /proc/pid/clear_refs
might be useful for all the other operations too.
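
(For illustration only, not part of either patch: a rough sketch of how a
profiler could count still-idle pages over one heap VMA by scanning
/proc/pid/pagemap. The function name and the sample range are hypothetical.)

#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <fcntl.h>

/* Count present and still-idle (bit 57) pages in [start, end) of one process. */
static void count_idle_pages(const char *pid, unsigned long start, unsigned long end)
{
        long psize = sysconf(_SC_PAGESIZE);
        unsigned long present = 0, idle = 0, addr;
        char path[64];
        uint64_t entry;
        int fd;

        snprintf(path, sizeof(path), "/proc/%s/pagemap", pid);
        fd = open(path, O_RDONLY);
        if (fd < 0)
                return;
        for (addr = start; addr < end; addr += psize) {
                if (pread(fd, &entry, sizeof(entry),
                          addr / psize * sizeof(entry)) != sizeof(entry))
                        break;
                if (entry >> 63 & 1) {          /* page present */
                        present++;
                        if (entry >> 57 & 1)    /* page still idle */
                                idle++;
                }
        }
        close(fd);
        printf("%lu of %lu present pages stayed idle\n", idle, present);
}

int main(void)
{
        /* sample range; a real tool would read the VMA from /proc/pid/maps */
        count_idle_pages("self", 0x561214d03000UL, 0x561214d29000UL);
        return 0;
}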


* Re: [PATCH RFC] mm/page_idle: simple idle page tracking for virtual memory
From: Joel Fernandes @ 2019-07-23 14:25 UTC (permalink / raw)
  To: Konstantin Khlebnikov
  Cc: Minchan Kim, linux-kernel, Michal Hocko, linux-mm, Andrew Morton

On Tue, Jul 23, 2019 at 04:59:07PM +0300, Konstantin Khlebnikov wrote:
> 
> 
> On 23.07.2019 16:46, Joel Fernandes wrote:
> > On Tue, Jul 23, 2019 at 02:54:26PM +0300, Konstantin Khlebnikov wrote:
> > > The page_idle tracking feature currently requires looking up the pagemap
> > > for a process followed by interacting with /sys/kernel/mm/page_idle.
> > > This is quite cumbersome and can be error-prone too: if something changes
> > > with the page between accessing the per-PID pagemap and the global
> > > page_idle bitmap, the information is no longer accurate. Moreover, looking
> > > up the PFN from pagemap on Android devices is not supported for
> > > unprivileged processes: it requires CAP_SYS_ADMIN and otherwise gives 0
> > > for the PFN.
> > > 
> > > This patch adds a simplified interface which works only with mapped pages:
> > > Run "echo 6 > /proc/pid/clear_refs" to mark all mapped pages as idle.
> > > Pages that are still idle are marked with bit 57 in /proc/pid/pagemap.
> > > The total size of idle pages is shown in /proc/pid/smaps (and smaps_rollup).
> > > 
> > > Part of this changelog is borrowed from Joel Fernandes <joel@joelfernandes.org>.
> > 
> > This will not work well for the problem at hand: the heap profiler
> > (heapprofd) only wants to clear the idle flag for the heap memory area which
> > is what it is profiling. There is no reason to do it for all mapped pages.
> > Using the /proc/pid/page_idle interface from my patch, it can be done
> > selectively for particular memory areas.
> > 
> > I had previously thought of having an interface that accepts an address
> > range to set the idle flag; however, that also adds more complexity.
> 
> The profiler could look at the particular area in /proc/pid/smaps
> or count idle pages via /proc/pid/pagemap.
> 
> Selective /proc/pid/clear_refs is not so hard to add.
> Something like echo "6 561214d03000-561214d29000" > /proc/pid/clear_refs
> might be useful for all the other operations too.

This seems like a really odd interface. Also, I don't see how you can avoid
looking up reverse maps to determine whether a page is really idle.

What is also odd is that traditionally clear_refs does interfere with
reclaim due to clearing of the accessed bit. Now you have one clear_refs
interface that does not interfere with reclaim, which makes it very
inconsistent. Also, in this patch you have 2 interfaces to solve this,
whereas my patch added a single clean interface that is easy to use and
does not need parsing of address ranges.

All in all, I don't see much point in this, honestly. But thanks for
poking at it.

thanks,

 - Joel



* Re: [PATCH RFC] mm/page_idle: simple idle page tracking for virtual memory
From: Konstantin Khlebnikov @ 2019-07-23 15:08 UTC (permalink / raw)
  To: Joel Fernandes
  Cc: Minchan Kim, linux-kernel, Michal Hocko, linux-mm, Andrew Morton

On 23.07.2019 17:25, Joel Fernandes wrote:
> On Tue, Jul 23, 2019 at 04:59:07PM +0300, Konstantin Khlebnikov wrote:
>>
>>
>> On 23.07.2019 16:46, Joel Fernandes wrote:
>>> On Tue, Jul 23, 2019 at 02:54:26PM +0300, Konstantin Khlebnikov wrote:
>>>> The page_idle tracking feature currently requires looking up the pagemap
>>>> for a process followed by interacting with /sys/kernel/mm/page_idle.
>>>> This is quite cumbersome and can be error-prone too: if something changes
>>>> with the page between accessing the per-PID pagemap and the global
>>>> page_idle bitmap, the information is no longer accurate. Moreover, looking
>>>> up the PFN from pagemap on Android devices is not supported for
>>>> unprivileged processes: it requires CAP_SYS_ADMIN and otherwise gives 0
>>>> for the PFN.
>>>>
>>>> This patch adds a simplified interface which works only with mapped pages:
>>>> Run "echo 6 > /proc/pid/clear_refs" to mark all mapped pages as idle.
>>>> Pages that are still idle are marked with bit 57 in /proc/pid/pagemap.
>>>> The total size of idle pages is shown in /proc/pid/smaps (and smaps_rollup).
>>>>
>>>> Part of this changelog is borrowed from Joel Fernandes <joel@joelfernandes.org>.
>>>
>>> This will not work well for the problem at hand: the heap profiler
>>> (heapprofd) only wants to clear the idle flag for the heap memory area which
>>> is what it is profiling. There is no reason to do it for all mapped pages.
>>> Using the /proc/pid/page_idle interface from my patch, it can be done
>>> selectively for particular memory areas.
>>>
>>> I had previously thought of having an interface that accepts an address
>>> range to set the idle flag; however, that also adds more complexity.
>>
>> The profiler could look at the particular area in /proc/pid/smaps
>> or count idle pages via /proc/pid/pagemap.
>>
>> Selective /proc/pid/clear_refs is not so hard to add.
>> Something like echo "6 561214d03000-561214d29000" > /proc/pid/clear_refs
>> might be useful for all the other operations too.
> 
> This seems like a really odd interface. Also, I don't see how you can avoid
> looking up reverse maps to determine whether a page is really idle.

This is a pretty straightforward format if you look at /proc/pid/maps and the others.
Parsing is trivial - just one sscanf().
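
(For illustration only - this extended format exists in neither patch; a
userspace mock-up of the single-sscanf() parse being suggested:)

#include <stdio.h>

int main(void)
{
        /* hypothetical extended clear_refs format: "<type> [<start>-<end>]" */
        const char *buf = "6 561214d03000-561214d29000";
        unsigned long start = 0, end = ~0UL;    /* default: the whole address space */
        int type, n;

        n = sscanf(buf, "%d %lx-%lx", &type, &start, &end);
        if (n < 1)
                return 1;       /* the kernel would likely return -EINVAL here */
        printf("type=%d range=%lx-%lx (%d field(s) matched)\n", type, start, end, n);
        return 0;
}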

If we are looking for abandoned pages in a particular process, it is enough to
mark the pages idle and look at the accessed bit in that process.

If a page is shared and got a foreign access, it is not abandoned.
Some information can also be retrieved right from pagemap: the file/anon and
exclusive-map bits.
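
(Again an illustration only: one way a tool could combine the existing
pagemap bits with the proposed idle bit. The PM_* names mirror the kernel's
defines but are redeclared for userspace, and the "abandoned" test is just
the heuristic sketched above.)

/* Relevant bits, as documented in Documentation/admin-guide/mm/pagemap.rst */
#define PM_MMAP_EXCLUSIVE       (1ULL << 56)    /* mapped only by this process */
#define PM_IDLE                 (1ULL << 57)    /* still idle since "echo 6" (this patch) */
#define PM_FILE                 (1ULL << 61)    /* file-page or shared-anon */
#define PM_PRESENT              (1ULL << 63)

/* Exclusively mapped anon page that nobody touched: a candidate abandoned page. */
static inline int page_looks_abandoned(unsigned long long entry)
{
        if (!(entry & PM_PRESENT) || (entry & PM_FILE))
                return 0;
        return (entry & PM_MMAP_EXCLUSIVE) && (entry & PM_IDLE);
}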

> 
> What is also odd is that traditionally clear_refs does interfere with
> reclaim due to clearing of the accessed bit. Now you have one clear_refs
> interface that does not interfere with reclaim, which makes it very
> inconsistent. Also, in this patch you have 2 interfaces to solve this,
> whereas my patch added a single clean interface that is easy to use and
> does not need parsing of address ranges.

Your patch adds yet another per-task proc file which requires a special tool.

Mine just extends an existing interface and is useful without any tools: just echo and cat.
Still, a special tool could get precise per-page information in binary form,
along with other useful bits, from /proc/pid/pagemap.


