* [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
@ 2010-10-27 17:21 Ying Han
  2010-10-27 18:05 ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: Ying Han @ 2010-10-27 17:21 UTC (permalink / raw)
  To: linux-mm
  Cc: Rik van Riel, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Andrew Morton

kswapd uses the hardware PTE accessed bit only to approximate page LRU order.
Demotion from the active LRU to the inactive LRU is not based on the accessed
bit; the bit is only used to promote a page that is on the inactive LRU list.
All of these state transitions are triggered by memory pressure and thus have
only a weak relationship to time.  In addition, hardware already flushes the
TLB transparently whenever a CPU context-switches between processes, and given
the limited hardware TLB resources, the window in which a page has been
accessed but the access has not yet been propagated to struct page is very
small in practice.  Since the result is an approximation anyway, the kernel
really doesn't need to flush the TLB when clearing the PTE's accessed bit.
This commit removes that flush from page_referenced_one().

Signed-off-by: Ying Han <yinghan@google.com>
Singed-off-by: Ken Chen <kenchen@google.com>
---
 include/linux/mmu_notifier.h |   12 ++++++++++++
 mm/rmap.c                    |    2 +-
 2 files changed, 13 insertions(+), 1 deletions(-)

diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 4e02ee2..be32c51 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -254,6 +254,17 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 	__young;							\
 })
 
+#define ptep_clear_young_notify(__vma, __address, __ptep)		\
+({									\
+	int __young;							\
+	struct vm_area_struct *___vma = __vma;				\
+	unsigned long ___address = __address;				\
+	__young = ptep_test_and_clear_young(___vma, ___address, __ptep);\
+	__young |= mmu_notifier_clear_flush_young(___vma->vm_mm,	\
+						  ___address);		\
+	__young;							\
+})
+
 #define set_pte_at_notify(__mm, __address, __ptep, __pte)		\
 ({									\
 	struct mm_struct *___mm = __mm;					\
@@ -304,6 +315,7 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 {
 }
 
+#define ptep_clear_young_notify ptep_test_and_clear_young
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define ptep_clear_flush_notify ptep_clear_flush
 #define set_pte_at_notify set_pte_at
diff --git a/mm/rmap.c b/mm/rmap.c
index 92e6757..96f2553 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -506,7 +506,7 @@ int page_referenced_one(struct page *page, struct vm_area_struct *vma,
 		goto out_unmap;
 	}
 
-	if (ptep_clear_flush_young_notify(vma, address, pte)) {
+	if (ptep_clear_young_notify(vma, address, pte)) {
 		/*
 		 * Don't treat a reference through a sequentially read
 		 * mapping as such.  If the page has been used in
-- 
1.7.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 17:21 [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page Ying Han
@ 2010-10-27 18:05 ` Rik van Riel
  2010-10-27 18:22   ` Nick Piggin
  2010-10-27 20:19   ` Ying Han
  0 siblings, 2 replies; 17+ messages in thread
From: Rik van Riel @ 2010-10-27 18:05 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki, Andrew Morton

On 10/27/2010 01:21 PM, Ying Han wrote:
> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.  The
> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it is only
> used to promote when a page is on inactive LRU list.  All of the state transitions
> are triggered by memory pressure and thus has weak relationship with respect to
> time.  In addition, hardware already transparently flush tlb whenever CPU context
> switch processes and given limited hardware TLB resource, the time period in
> which a page is accessed but not yet propagated to struct page is very small
> in practice. With the nature of approximation, kernel really don't need to flush TLB
> for changing PTE's access bit.  This commit removes the flush operation from it.
>
> Signed-off-by: Ying Han<yinghan@google.com>
> Singed-off-by: Ken Chen<kenchen@google.com>

The reasoning behind the patch makes sense.

However, have you measured any improvements in run time with
this patch?  The VM is already tweaked to minimize the number
of pages that get aged, so it would be interesting to know
where you saw issues.

-- 
All rights reversed

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 18:05 ` Rik van Riel
@ 2010-10-27 18:22   ` Nick Piggin
  2010-10-27 18:37     ` Nick Piggin
  2010-10-27 20:19   ` Ying Han
  1 sibling, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2010-10-27 18:22 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ying Han, linux-mm, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Andrew Morton

On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/27/2010 01:21 PM, Ying Han wrote:
>>
>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.
>>  The
>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it
>> is only
>> used to promote when a page is on inactive LRU list.  All of the state
>> transitions
>> are triggered by memory pressure and thus has weak relationship with
>> respect to
>> time.  In addition, hardware already transparently flush tlb whenever CPU
>> context
>> switch processes and given limited hardware TLB resource, the time period
>> in
>> which a page is accessed but not yet propagated to struct page is very
>> small
>> in practice. With the nature of approximation, kernel really don't need to
>> flush TLB
>> for changing PTE's access bit.  This commit removes the flush operation
>> from it.
>>
>> Signed-off-by: Ying Han<yinghan@google.com>
>> Singed-off-by: Ken Chen<kenchen@google.com>
>
> The reasoning behind the patch makes sense.
>
> However, have you measured any improvements in run time with
> this patch?  The VM is already tweaked to minimize the number
> of pages that get aged, so it would be interesting to know
> where you saw issues.

Firstly, not all CPUs do flush the TLB on VM switch, and secondly, it
would be theoretically possible to spin and never be able to flush free
pages even if none are ever being touched.

It doesn't have to be an absurdly tiny machine, either. You could cover
a good few megs with TLBs (and a small embedded system could easily
have less than that of mapped memory on its LRU).

I agree the theory is fine, because if the CPU thinks it is worth keeping a
TLB entry around, then it probably knows better than our stupid LRU :)
And TLB flushing can get nasty when we start swapping a lot with
threaded apps.

However, to handle corner cases it should either:

- flush all TLBs once per *something* [e.g. every scan priority level above N,
  or every N pages scanned, etc.], or

- start doing the flush versions of the ptep manipulation when memory
  pressure is getting high.

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 18:22   ` Nick Piggin
@ 2010-10-27 18:37     ` Nick Piggin
  2010-10-27 19:13       ` Hugh Dickins
  0 siblings, 1 reply; 17+ messages in thread
From: Nick Piggin @ 2010-10-27 18:37 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Ying Han, linux-mm, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki,
	Andrew Morton

On Wed, Oct 27, 2010 at 12:22 PM, Nick Piggin <npiggin@gmail.com> wrote:
> On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@redhat.com> wrote:
>> On 10/27/2010 01:21 PM, Ying Han wrote:
>>>
>>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.
>>>  The
>>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it
>>> is only
>>> used to promote when a page is on inactive LRU list.  All of the state
>>> transitions
>>> are triggered by memory pressure and thus has weak relationship with
>>> respect to
>>> time.  In addition, hardware already transparently flush tlb whenever CPU
>>> context
>>> switch processes and given limited hardware TLB resource, the time period
>>> in
>>> which a page is accessed but not yet propagated to struct page is very
>>> small
>>> in practice. With the nature of approximation, kernel really don't need to
>>> flush TLB
>>> for changing PTE's access bit.  This commit removes the flush operation
>>> from it.
>>>
>>> Signed-off-by: Ying Han<yinghan@google.com>
>>> Singed-off-by: Ken Chen<kenchen@google.com>
>>
>> The reasoning behind the patch makes sense.
>>
>> However, have you measured any improvements in run time with
>> this patch?  The VM is already tweaked to minimize the number
>> of pages that get aged, so it would be interesting to know
>> where you saw issues.
>
> Firstly, not all CPUs do flush the TLB on VM switch, and secondly, it
> would be theoretically possible to spin and never be able to flush free
> pages even if none are ever being touched.
>
> It doesn't have to be an absurdly tiny machine, either. You could cover
> a good few megs with TLBs (and a small embedded system could easily
> have less than that of mapped memory on its LRU).
>
> I agree the theory is fine because if the CPU thinks it is worth to keep a
> TLB entry around, then it probably knows better than our stupid LRU :)
> And TLB flushing can get nasty when we start swapping a lot with
> threaded apps.
>
> However, to handle corner cases it should either:
>
> flush all TLBs once per *something* [eg. every scan priority level above N,
> or every N pages scanned, etc]
>
> start doing the flush versions of the ptep manipulation when memory
> pressure is getting high.
>

I'm sorry, that's absurd, ignore that :)

However, it's a scary change -- higher chance of reclaiming a TLB covered page.

I had a vague memory of this problem biting someone when this flush wasn't
actually done properly... maybe powerpc.

But anyway, same solution could be possible, by flushing every N pages scanned.
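
The "flush every N pages scanned" idea can be sketched in user-space C as
below.  This is only a toy model under stated assumptions: FLUSH_BATCH,
struct scan_state and all function names are hypothetical, the flush is
simulated by a counter, and the batch counts pages whose accessed bit was
actually cleared (one possible reading of "pages scanned").

```c
#include <stdbool.h>

#define FLUSH_BATCH 32	/* hypothetical value of "N"; not taken from the thread */

struct scan_state {
	unsigned int pending;	/* accessed bits cleared since the last flush */
	unsigned int flushes;	/* number of simulated TLB flushes issued */
};

/* Stand-in for a real TLB flush; here it only updates the counters. */
static void simulated_tlb_flush(struct scan_state *s)
{
	s->flushes++;
	s->pending = 0;
}

/*
 * Called once per scanned page.  'young' is the result of clearing the
 * accessed bit without a flush.  Only a cleared bit can leave a stale
 * TLB entry behind, so only those pages count toward the batch.
 */
static void note_scanned_page(struct scan_state *s, bool young)
{
	if (!young)
		return;
	if (++s->pending >= FLUSH_BATCH)
		simulated_tlb_flush(s);
}

/* Scan 'nr' pages that were all young; return how many flushes resulted. */
static unsigned int scan_young_pages(unsigned int nr)
{
	struct scan_state s = { 0, 0 };
	unsigned int i;

	for (i = 0; i < nr; i++)
		note_scanned_page(&s, true);
	return s.flushes;
}
```

With this model the cost is one flush per FLUSH_BATCH cleared bits rather
than one per page, while a stale entry can survive for at most one batch.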

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 18:37     ` Nick Piggin
@ 2010-10-27 19:13       ` Hugh Dickins
  2010-10-27 20:35         ` Ying Han
  0 siblings, 1 reply; 17+ messages in thread
From: Hugh Dickins @ 2010-10-27 19:13 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rik van Riel, Ying Han, Ken Chen, linux-mm, Minchan Kim,
	KAMEZAWA Hiroyuki, Andrew Morton

On Wed, 27 Oct 2010, Nick Piggin wrote:
> On Wed, Oct 27, 2010 at 12:22 PM, Nick Piggin <npiggin@gmail.com> wrote:
> > On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@redhat.com> wrote:
> >> On 10/27/2010 01:21 PM, Ying Han wrote:
> >>>
> >>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.
> >>>  The
> >>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it
> >>> is only
> >>> used to promote when a page is on inactive LRU list.  All of the state
> >>> transitions
> >>> are triggered by memory pressure and thus has weak relationship with
> >>> respect to
> >>> time.  In addition, hardware already transparently flush tlb whenever CPU
> >>> context
> >>> switch processes and given limited hardware TLB resource, the time period
> >>> in
> >>> which a page is accessed but not yet propagated to struct page is very
> >>> small
> >>> in practice. With the nature of approximation, kernel really don't need to
> >>> flush TLB
> >>> for changing PTE's access bit.  This commit removes the flush operation
> >>> from it.

It should at least add a comment there in page_referenced_one(), that
a TLB flush ought to be done, but is now judged not worth the effort.

(I'd expect architectures to differ on whether it's worth the effort.)

> >>>
> >>> Signed-off-by: Ying Han<yinghan@google.com>
> >>> Singed-off-by: Ken Chen<kenchen@google.com>

Hey, Ken, switch off those curling tongs :)

> However, it's a scary change -- higher chance of reclaiming a TLB covered page.

Yes, I was often tempted to make such a change in the past;
but ran away when it appeared to be in danger of losing the pte
referenced bit of precisely the most intensively referenced pages.

Ying's point (about what the pte referenced bit is being used for in our
current implementation) is interesting, and might have tipped the balance;
but that's not clear to me - and the flush is only done when mm is on CPU.

> 
> I had a vague memory of this problem biting someone when this flush wasn't
> actually done properly... maybe powerpc.
> 
> But anyway, same solution could be possible, by flushing every N pages scanned.

Yes, batching seems safer.

Hugh

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 18:05 ` Rik van Riel
  2010-10-27 18:22   ` Nick Piggin
@ 2010-10-27 20:19   ` Ying Han
  2010-10-28 11:53     ` Rik van Riel
  1 sibling, 1 reply; 17+ messages in thread
From: Ying Han @ 2010-10-27 20:19 UTC (permalink / raw)
  To: Rik van Riel
  Cc: linux-mm, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki, Andrew Morton

On Wed, Oct 27, 2010 at 11:05 AM, Rik van Riel <riel@redhat.com> wrote:

> On 10/27/2010 01:21 PM, Ying Han wrote:
>
>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.
>>  The
>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it
>> is only
>> used to promote when a page is on inactive LRU list.  All of the state
>> transitions
>> are triggered by memory pressure and thus has weak relationship with
>> respect to
>> time.  In addition, hardware already transparently flush tlb whenever CPU
>> context
>> switch processes and given limited hardware TLB resource, the time period
>> in
>> which a page is accessed but not yet propagated to struct page is very
>> small
>> in practice. With the nature of approximation, kernel really don't need to
>> flush TLB
>> for changing PTE's access bit.  This commit removes the flush operation
>> from it.
>>
>> Signed-off-by: Ying Han<yinghan@google.com>
>> Singed-off-by: Ken Chen<kenchen@google.com>
>>
>
> The reasoning behind the patch makes sense.
>
> However, have you measured any improvements in run time with
> this patch?  The VM is already tweaked to minimize the number
> of pages that get aged, so it would be interesting to know
> where you saw issues.
>

Rik, the workloads we were running were some MapReduce jobs.

--Ying

>
> --
> All rights reversed
>

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 19:13       ` Hugh Dickins
@ 2010-10-27 20:35         ` Ying Han
  2010-10-28  0:11           ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Ying Han @ 2010-10-27 20:35 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Nick Piggin, Rik van Riel, Ken Chen, linux-mm, Minchan Kim,
	KAMEZAWA Hiroyuki, Andrew Morton

On Wed, Oct 27, 2010 at 12:13 PM, Hugh Dickins <hughd@google.com> wrote:

> On Wed, 27 Oct 2010, Nick Piggin wrote:
> > On Wed, Oct 27, 2010 at 12:22 PM, Nick Piggin <npiggin@gmail.com> wrote:
> > > On Wed, Oct 27, 2010 at 12:05 PM, Rik van Riel <riel@redhat.com>
> wrote:
> > >> On 10/27/2010 01:21 PM, Ying Han wrote:
> > >>>
> > >>> kswapd's use case of hardware PTE accessed bit is to approximate page
> LRU.
> > >>>  The
> > >>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while
> it
> > >>> is only
> > >>> used to promote when a page is on inactive LRU list.  All of the
> state
> > >>> transitions
> > >>> are triggered by memory pressure and thus has weak relationship with
> > >>> respect to
> > >>> time.  In addition, hardware already transparently flush tlb whenever
> CPU
> > >>> context
> > >>> switch processes and given limited hardware TLB resource, the time
> period
> > >>> in
> > >>> which a page is accessed but not yet propagated to struct page is
> very
> > >>> small
> > >>> in practice. With the nature of approximation, kernel really don't
> need to
> > >>> flush TLB
> > >>> for changing PTE's access bit.  This commit removes the flush
> operation
> > >>> from it.
>
> It should at least add a comment there in page_referenced_one(), that
> a TLB flush ought to be done, but is now judged not worth the effort.
>

I will make the change here.

>
> (I'd expect architectures to differ on whether it's worth the effort.)
>

Right :)  I would like to hear from upstream whether the problem is general
enough to be worth solving, so that we can plan to put further effort into it.

> >>>
> > >>> Signed-off-by: Ying Han<yinghan@google.com>
> > >>> Singed-off-by: Ken Chen<kenchen@google.com>
>
> Hey, Ken, switch off those curling tongs :)
>
> > However, it's a scary change -- higher chance of reclaiming a TLB covered
> page.
>
> Yes, I was often tempted to make such a change in the past;
> but ran away when it appeared to be in danger of losing the pte
> referenced bit of precisely the most intensively referenced pages.
>
> Ying's point (about what the pte referenced bit is being used for in our
> current implementation) is interesting, and might have tipped the balance;
> but that's not clear to me - and the flush is only done when mm is on CPU.
>

The initial patch is from Ken, and I am helping out here to get feedback
from upstream and to improve it further. :)

>
> > I had a vague memory of this problem biting someone when this flush
> wasn't
> > actually done properly... maybe powerpc.
> >
> > But anyway, same solution could be possible, by flushing every N pages
> scanned.
>
> Yes, batching seems safer.
>

I might be able to take a look at it.

--Ying

>
> Hugh

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 20:35         ` Ying Han
@ 2010-10-28  0:11           ` KAMEZAWA Hiroyuki
  2010-10-29  1:30             ` Ken Chen
  0 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-28  0:11 UTC (permalink / raw)
  To: Ying Han
  Cc: Hugh Dickins, Nick Piggin, Rik van Riel, Ken Chen, linux-mm,
	Minchan Kim, Andrew Morton

On Wed, 27 Oct 2010 13:35:02 -0700
Ying Han <yinghan@google.com> wrote:
> >
> > > I had a vague memory of this problem biting someone when this flush
> > wasn't
> > > actually done properly... maybe powerpc.
> > >
> > > But anyway, same solution could be possible, by flushing every N pages
> > scanned.
> >
> > Yes, batching seems safer.
> >
> 
> I might be able to take a look at it.
> 

I'd like to vote for batching.

Thanks,
-Kame

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-27 20:19   ` Ying Han
@ 2010-10-28 11:53     ` Rik van Riel
  0 siblings, 0 replies; 17+ messages in thread
From: Rik van Riel @ 2010-10-28 11:53 UTC (permalink / raw)
  To: Ying Han
  Cc: linux-mm, Hugh Dickins, Minchan Kim, KAMEZAWA Hiroyuki, Andrew Morton

On 10/27/2010 04:19 PM, Ying Han wrote:
> On Wed, Oct 27, 2010 at 11:05 AM, Rik van Riel <riel@redhat.com> wrote:
>
>> On 10/27/2010 01:21 PM, Ying Han wrote:
>>
>>> kswapd's use case of hardware PTE accessed bit is to approximate page LRU.  The
>>> ActiveLRU demotion to InactiveLRU are not base on accessed bit, while it is only
>>> used to promote when a page is on inactive LRU list.  All of the state transitions
>>> are triggered by memory pressure and thus has weak relationship with respect to
>>> time.  In addition, hardware already transparently flush tlb whenever CPU context
>>> switch processes and given limited hardware TLB resource, the time period in
>>> which a page is accessed but not yet propagated to struct page is very small
>>> in practice. With the nature of approximation, kernel really don't need to flush TLB
>>> for changing PTE's access bit.  This commit removes the flush operation from it.
>>>
>>> Signed-off-by: Ying Han<yinghan@google.com>
>>> Singed-off-by: Ken Chen<kenchen@google.com>
>>
>> The reasoning behind the patch makes sense.
>>
>> However, have you measured any improvements in run time with
>> this patch?  The VM is already tweaked to minimize the number
>> of pages that get aged, so it would be interesting to know
>> where you saw issues.
>
> Rik, the workload we were running are some MapReduce jobs.

Well, what kind of performance improvement did you measure?

-- 
All rights reversed

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-28  0:11           ` KAMEZAWA Hiroyuki
@ 2010-10-29  1:30             ` Ken Chen
  2010-10-29  2:45               ` KAMEZAWA Hiroyuki
  0 siblings, 1 reply; 17+ messages in thread
From: Ken Chen @ 2010-10-29  1:30 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ying Han, Hugh Dickins, Nick Piggin, Rik van Riel, linux-mm,
	Minchan Kim, Andrew Morton

On Wed, Oct 27, 2010 at 5:11 PM, KAMEZAWA Hiroyuki
<kamezawa.hiroyu@jp.fujitsu.com> wrote:
> I'd like to vote for batching.

Batch mode isn't going to add much value because the effect of the
accessed bit is already deferred.  There are two outcomes: (1) the TLB
mapping has already been flushed due to a capacity conflict, or (2) the
process has been context-switched out.  You would want to transfer the
accessed bit from pte to page table, but flushing the TLB for an already
deferred operation doesn't seem that useful.

- Ken

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29  1:30             ` Ken Chen
@ 2010-10-29  2:45               ` KAMEZAWA Hiroyuki
  2010-10-29  3:43                 ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: KAMEZAWA Hiroyuki @ 2010-10-29  2:45 UTC (permalink / raw)
  To: Ken Chen
  Cc: Ying Han, Hugh Dickins, Nick Piggin, Rik van Riel, linux-mm,
	Minchan Kim, Andrew Morton

On Thu, 28 Oct 2010 18:30:23 -0700
Ken Chen <kenchen@google.com> wrote:

> On Wed, Oct 27, 2010 at 5:11 PM, KAMEZAWA Hiroyuki
> <kamezawa.hiroyu@jp.fujitsu.com> wrote:
> > I'd like to vote for batching.
> 
> Batch mode isn't going to add much value because the effect of
> accessed bit is already deferred.  There are two outcome: (1) the tlb
> mapping is already flushed due to capacity conflict or (2) process
> context'ed out.  You would want to transfer accessed bit from pte to
> page table, but flushing TLB on a already deferred operation seems not
> that useful.
> 
Hmm. Without a flush anywhere in the memory reclaim path, won't a process
which causes a page fault and enters vmscan fail to see its own recent
accessed bits on pages in the LRU?

I think it should be flushed at least once..

Thanks,
-Kame

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29  2:45               ` KAMEZAWA Hiroyuki
@ 2010-10-29  3:43                 ` Rik van Riel
  2010-10-29  4:27                   ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2010-10-29  3:43 UTC (permalink / raw)
  To: KAMEZAWA Hiroyuki
  Cc: Ken Chen, Ying Han, Hugh Dickins, Nick Piggin, linux-mm,
	Minchan Kim, Andrew Morton

On 10/28/2010 10:45 PM, KAMEZAWA Hiroyuki wrote:

> Hmm. Without flushing anywhere in memory reclaim path, a process which
> cause page fault and enter vmscan will not see his own recent access bit on
> pages in LRU ?

Worse still, because kernel threads do a lazy mmu switch, even
page faulting in the process will not cause the TLB entries to
be flushed.

> I think it should be flushed at least once..

A periodic flush may make sense.

Maybe something along the lines of if the TLB has not been
flushed for over a second (we can see that in timer or scheduler
code), flush it?
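
A rough user-space sketch of that time-based rule follows.  Everything here
is an assumption made for illustration: struct cpu_tlb_state, the function
names and the millisecond clock are hypothetical, and the flush itself is
only counted, not performed.

```c
#include <stdbool.h>

#define FLUSH_INTERVAL_MS 1000	/* "over a second", as suggested above */

struct cpu_tlb_state {
	unsigned long last_flush_ms;	/* time of the most recent flush */
	unsigned int flushes;		/* simulated flushes so far */
};

/* Any flush, whatever caused it, restarts the interval. */
static void record_flush(struct cpu_tlb_state *st, unsigned long now_ms)
{
	st->last_flush_ms = now_ms;
	st->flushes++;
}

/*
 * Meant to be called from a (hypothetical) timer or scheduler hook.
 * Forces a flush only if none has happened for more than the interval.
 */
static bool tick_maybe_flush(struct cpu_tlb_state *st, unsigned long now_ms)
{
	if (now_ms - st->last_flush_ms <= FLUSH_INTERVAL_MS)
		return false;
	record_flush(st, now_ms);
	return true;
}

/* Tick every 'tick_ms' up to 'total_ms' with no other flushes; count forced ones. */
static unsigned int forced_flushes(unsigned long total_ms, unsigned long tick_ms)
{
	struct cpu_tlb_state st = { 0, 0 };
	unsigned long now;

	if (tick_ms == 0)
		return 0;	/* avoid looping forever on a zero tick */
	for (now = tick_ms; now <= total_ms; now += tick_ms)
		tick_maybe_flush(&st, now);
	return st.flushes;
}
```

Because any ordinary flush would also call record_flush(), a CPU that
already flushes often would never take the forced path in this model.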

-- 
All rights reversed

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29  3:43                 ` Rik van Riel
@ 2010-10-29  4:27                   ` Minchan Kim
  2010-10-29 12:31                     ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-10-29  4:27 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KAMEZAWA Hiroyuki, Ken Chen, Ying Han, Hugh Dickins, Nick Piggin,
	linux-mm, Andrew Morton

On Fri, Oct 29, 2010 at 12:43 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/28/2010 10:45 PM, KAMEZAWA Hiroyuki wrote:
>
>> Hmm. Without flushing anywhere in memory reclaim path, a process which
>> cause page fault and enter vmscan will not see his own recent access bit
>> on
>> pages in LRU ?
>
> Worse still, because kernel threads do a lazy mmu switch, even
> page faulting in the process will not cause the TLB entries to
> be flushed.
>
>> I think it should be flushed at least once..
>
> A periodic flush may make sense.
>
> Maybe something along the lines of if the TLB has not been
> flushed for over a second (we can see that in timer or scheduler
> code), flush it?

What happens if we don't flush the TLB?
It will make an old page pretend to be a young page.
If so, how does it affect reclaim?

It makes an old page get promoted to the active list by
page_check_references.  Of course, that's not good.  But for that to
happen, the stale TLB entry has to survive until the page moves from
head to tail of the inactive list.  That's very unlikely, because the
TLB is very small and the process will be switched out.

If lumpy reclaim happens (i.e., we don't wait for a victim page to move
from head to tail of the inactive list), that's all right since we
ignore the young bit in the lumpy case.

I think it's no problem unless the inactive list is very short.
What remains is the kernel thread's lazy TLB flush.

So how about flushing the TLB when kswapd is scheduled in?


> --
> All rights reversed
>



-- 
Kind regards,
Minchan Kim

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29  4:27                   ` Minchan Kim
@ 2010-10-29 12:31                     ` Rik van Riel
  2010-10-29 13:03                       ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2010-10-29 12:31 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KAMEZAWA Hiroyuki, Ken Chen, Ying Han, Hugh Dickins, Nick Piggin,
	linux-mm, Andrew Morton

On 10/29/2010 12:27 AM, Minchan Kim wrote:

> What happens if we don't flush TLB?
> It will make for old page to pretend young page.
> If it is, how does it affect reclaim?

Other way around - it will make a young page pretend to be an
old page, because the TLB won't know it needs to flush the
Accessed bit into the page tables (where the bit was recently
cleared).

-- 
All rights reversed

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29 12:31                     ` Rik van Riel
@ 2010-10-29 13:03                       ` Minchan Kim
  2010-10-29 13:15                         ` Rik van Riel
  0 siblings, 1 reply; 17+ messages in thread
From: Minchan Kim @ 2010-10-29 13:03 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KAMEZAWA Hiroyuki, Ken Chen, Ying Han, Hugh Dickins, Nick Piggin,
	linux-mm, Andrew Morton

On Fri, Oct 29, 2010 at 9:31 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/29/2010 12:27 AM, Minchan Kim wrote:
>
>> What happens if we don't flush TLB?
>> It will make for old page to pretend young page.
>> If it is, how does it affect reclaim?
>
> Other way around - it will make a young page pretend to be an
> old page, because the TLB won't know it needs to flush the
> Accessed bit into the page tables (where the bit was recently
> cleared).

Ying's patch just removes the TLB flush when the page access bit is
changed from young to old.
We still flush the TLB on the old-to-young change in
ptep_set_access_flags. Am I missing something?

-- 
Kind regards,
Minchan Kim

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29 13:03                       ` Minchan Kim
@ 2010-10-29 13:15                         ` Rik van Riel
  2010-10-30  0:20                           ` Minchan Kim
  0 siblings, 1 reply; 17+ messages in thread
From: Rik van Riel @ 2010-10-29 13:15 UTC (permalink / raw)
  To: Minchan Kim
  Cc: KAMEZAWA Hiroyuki, Ken Chen, Ying Han, Hugh Dickins, Nick Piggin,
	linux-mm, Andrew Morton

On 10/29/2010 09:03 AM, Minchan Kim wrote:
> On Fri, Oct 29, 2010 at 9:31 PM, Rik van Riel<riel@redhat.com>  wrote:
>> On 10/29/2010 12:27 AM, Minchan Kim wrote:
>>
>>> What happens if we don't flush TLB?
>>> It will make for old page to pretend young page.
>>> If it is, how does it affect reclaim?
>>
>> Other way around - it will make a young page pretend to be an
>> old page, because the TLB won't know it needs to flush the
>> Accessed bit into the page tables (where the bit was recently
>> cleared).
>
> Ying's patch only removes the TLB flush when the page's access bit
> changes from young to old.
> We still flush the TLB on the old-to-young change made by
> ptep_set_access_flags. Am I missing something?

The TLB is write-through for the accessed and dirty
bits.

If the TLB has a page translation without the accessed
bit (and is accessing it), the accessed bit will be set
in the page table entry.

If the TLB has a page translation that already has the
accessed bit set, nothing will be written to the page
table entry.

With Ying's change, we will clear the accessed bit in
the page table, without invalidating the corresponding
TLB entry.

This can cause accesses to pages to not lead to the
accessed bit getting set in the corresponding page table
entry.

Making sure the TLB is flushed periodically could fix
that issue.
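
A minimal sketch of the behaviour described above, written as a toy
user-space C model rather than kernel code. Every name in it (struct pte,
struct tlb_entry, cpu_access, clear_young, retouch_sets_pte_bit) is
hypothetical and exists only for illustration; it assumes a single page
with a single cached translation and ignores context switches and TLB
evictions.

```c
#include <stdbool.h>

/* Toy model, not kernel code: one page table entry, one TLB entry. */
struct pte { bool accessed; };
struct tlb_entry { bool valid; bool accessed; };

/* Simulate the CPU touching the page. */
static void cpu_access(struct pte *pte, struct tlb_entry *tlb)
{
	if (!tlb->valid) {
		/* TLB miss: load the translation, including the accessed bit. */
		tlb->valid = true;
		tlb->accessed = pte->accessed;
	}
	if (!tlb->accessed) {
		/* Write-through: set the bit in the TLB and in the PTE. */
		tlb->accessed = true;
		pte->accessed = true;
	}
	/* Cached bit already set: nothing is written to the PTE. */
}

/* Clear the accessed bit in the PTE, optionally invalidating the TLB entry. */
static void clear_young(struct pte *pte, struct tlb_entry *tlb, bool flush)
{
	pte->accessed = false;
	if (flush)
		tlb->valid = false;
}

/*
 * Touch the page, clear its accessed bit, touch it again, and report
 * whether the second touch is visible in the PTE.
 */
static bool retouch_sets_pte_bit(bool flush)
{
	struct pte pte = { false };
	struct tlb_entry tlb = { false, false };

	cpu_access(&pte, &tlb);         /* first touch sets the bit */
	clear_young(&pte, &tlb, flush); /* reclaim clears it again */
	cpu_access(&pte, &tlb);         /* page is touched once more */

	return pte.accessed;
}
```

In this model retouch_sets_pte_bit(true) returns true, while
retouch_sets_pte_bit(false) returns false: the stale TLB entry still
carries the accessed bit, so the second touch never reaches the page
table entry.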

-- 
All rights reversed


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page.
  2010-10-29 13:15                         ` Rik van Riel
@ 2010-10-30  0:20                           ` Minchan Kim
  0 siblings, 0 replies; 17+ messages in thread
From: Minchan Kim @ 2010-10-30  0:20 UTC (permalink / raw)
  To: Rik van Riel
  Cc: KAMEZAWA Hiroyuki, Ken Chen, Ying Han, Hugh Dickins, Nick Piggin,
	linux-mm, Andrew Morton

On Fri, Oct 29, 2010 at 10:15 PM, Rik van Riel <riel@redhat.com> wrote:
> On 10/29/2010 09:03 AM, Minchan Kim wrote:
>>
>> On Fri, Oct 29, 2010 at 9:31 PM, Rik van Riel<riel@redhat.com>  wrote:
>>>
>>> On 10/29/2010 12:27 AM, Minchan Kim wrote:
>>>
>>>> What happens if we don't flush TLB?
>>>> It will make for old page to pretend young page.
>>>> If it is, how does it affect reclaim?
>>>
>>> Other way around - it will make a young page pretend to be an
>>> old page, because the TLB won't know it needs to flush the
>>> Accessed bit into the page tables (where the bit was recently
>>> cleared).
>>
>> Ying's patch only removes the TLB flush when the page's access bit
>> changes from young to old.
>> We still flush the TLB on the old-to-young change made by
>> ptep_set_access_flags. Am I missing something?
>
> The TLB is write-through for the accessed and dirty
> bits.
>
> If the TLB has a page translation without the accessed
> bit (and is accessing it), the accessed bit will be set
> in the page table entry.
>
> If the TLB has a page translation that already has the
> accessed bit set, nothing will be written to the page
> table entry.
>
> With Ying's change, we will clear the accessed bit in
> the page table, without invalidating the corresponding
> TLB entry.
>
> This can cause accesses to pages to not lead to the
> accessed bit getting set in the corresponding page table
> entry.
>
> Making sure the TLB is flushed periodically could fix
> that issue.

Thanks for the kind explanation, Rik.
I got your point.

Although we lose the access bit for a short time, the inactive list is
long enough to give the page a chance to be promoted.
So for a page that lost its access bit to count as part of the working
set, it has to be accessed again before it reaches the tail of the
inactive list.

I think the inactive list's size could be the criterion for starting a
TLB flush.

>
> --
> All rights reversed
>



-- 
Kind regards,
Minchan Kim


^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-10-30  0:20 UTC | newest]

Thread overview: 17+ messages
2010-10-27 17:21 [PATCH] mm: don't flush TLB when propagate PTE access bit to struct page Ying Han
2010-10-27 18:05 ` Rik van Riel
2010-10-27 18:22   ` Nick Piggin
2010-10-27 18:37     ` Nick Piggin
2010-10-27 19:13       ` Hugh Dickins
2010-10-27 20:35         ` Ying Han
2010-10-28  0:11           ` KAMEZAWA Hiroyuki
2010-10-29  1:30             ` Ken Chen
2010-10-29  2:45               ` KAMEZAWA Hiroyuki
2010-10-29  3:43                 ` Rik van Riel
2010-10-29  4:27                   ` Minchan Kim
2010-10-29 12:31                     ` Rik van Riel
2010-10-29 13:03                       ` Minchan Kim
2010-10-29 13:15                         ` Rik van Riel
2010-10-30  0:20                           ` Minchan Kim
2010-10-27 20:19   ` Ying Han
2010-10-28 11:53     ` Rik van Riel
