All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] x86, tlb: switch cr3 in leave_mm() only when needed
@ 2012-03-22 23:33 Suresh Siddha
  2012-03-22 23:44 ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-22 23:33 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Len Brown; +Cc: LKML

From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/tlb.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..ad695cc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,11 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_and_clear_cpu(cpu, mm_cpumask(active_mm)))
+		load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-22 23:33 [patch] x86, tlb: switch cr3 in leave_mm() only when needed Suresh Siddha
@ 2012-03-22 23:44 ` Linus Torvalds
  2012-03-23  0:01   ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-22 23:44 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML

On Thu, Mar 22, 2012 at 4:33 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
>
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.

Hmm. If this is reasonably common (and intel_idle() certainly is),
maybe we shouldn't even do the "test_and_clear" RMW cycle.

We could do it with a read-only bit test (no races I can see - if it's
clear, it will stay clear), so we could do this with

    if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
        cpumask_clear_cpu(cpu,mm_cpumask(active_mm));
        load_cr3(swapper_pg_dir);
    }

instead? And avoid touching that "mm_cpumask" (and the atomic
serializing instruction) when not necessary?

                       Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-22 23:44 ` Linus Torvalds
@ 2012-03-23  0:01   ` Suresh Siddha
  2012-03-23  0:20     ` H. Peter Anvin
  2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
  0 siblings, 2 replies; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23  0:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML

On Thu, 2012-03-22 at 16:44 -0700, Linus Torvalds wrote:
> Hmm. If this is reasonably common (and intel_idle() certainly is),
> maybe we shouldn't even do the "test_and_clear" RMW cycle.
> 
> We could do it with a read-only bit test (no races I can see - if it's
> clear, it will stay clear), so we could do this with
> 
>     if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
>         cpumask_clear_cpu(cpu,mm_cpumask(active_mm));
>         load_cr3(swapper_pg_dir);
>     }
> 
> instead? And avoid touching that "mm_cpumask" (and the atomic
> serializing instruction) when not necessary?

Agreed. Updated patch appended. Thanks.
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/tlb.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:01   ` Suresh Siddha
@ 2012-03-23  0:20     ` H. Peter Anvin
  2012-03-23  0:37       ` Linus Torvalds
  2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
  1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-23  0:20 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Linus Torvalds, Ingo Molnar, Len Brown, LKML

On 03/22/2012 05:01 PM, Suresh Siddha wrote:
> 
> Agreed. Updated patch appended. Thanks.
> ---
> From: Suresh Siddha <suresh.b.siddha@intel.com>
> Subject: x86, tlb: switch cr3 in leave_mm() only when needed
> 
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
> 
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.
> 
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 

Looks great to me.  Linus, do you want this one later in the window if
it tests out OK between now and then, or is it too late?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86, tlb: Switch cr3 in leave_mm() only when needed
  2012-03-23  0:01   ` Suresh Siddha
  2012-03-23  0:20     ` H. Peter Anvin
@ 2012-03-23  0:31     ` tip-bot for Suresh Siddha
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-03-23  0:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, suresh.b.siddha, tglx

Commit-ID:  a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Gitweb:     http://git.kernel.org/tip/a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Thu, 22 Mar 2012 17:01:25 -0700
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 22 Mar 2012 17:23:48 -0700

x86, tlb: Switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1332460885.16101.147.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/mm/tlb.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:20     ` H. Peter Anvin
@ 2012-03-23  0:37       ` Linus Torvalds
  2012-03-23  0:51         ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-23  0:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Suresh Siddha, Ingo Molnar, Len Brown, LKML

On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Looks great to me.  Linus, do you want this one later in the window if
> it tests out OK between now and then, or is it too late?

It's small and trivial, so it's fine. I *would* like to hear whether
it actually makes any real difference, though. I do see "intel_idle()"
as a big cost in idle profiles, and I assume that translates to power
use too, but if you guys have some actual numbers, that would be even
better.

                   Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:37       ` Linus Torvalds
@ 2012-03-23  0:51         ` Suresh Siddha
  2012-03-23  8:37           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23  0:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, Len Brown, LKML

On Thu, 2012-03-22 at 17:37 -0700, Linus Torvalds wrote:
> On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Looks great to me.  Linus, do you want this one later in the window if
> > it tests out OK between now and then, or is it too late?
> 
> It's small and trivial, so it's fine. I *would* like to hear whether
> it actually makes any real difference, though. I do see "intel_idle()"
> as a big cost in idle profiles, and I assume that translates to power
> use too, but if you guys have some actual numbers, that would be even
> better.

I was reviewing this code in some other context and thought this
optimization makes sense. Unless someone beats me, I can collect some
data on Monday.

thanks,
suresh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:51         ` Suresh Siddha
@ 2012-03-23  8:37           ` Ingo Molnar
  2012-03-26 22:47             ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2012-03-23  8:37 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML


* Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> I was reviewing this code in some other context and thought 
> this optimization makes sense. Unless someone beats me, I can 
> collect some data on Monday.

Would be really nice to get this data.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  8:37           ` Ingo Molnar
@ 2012-03-26 22:47             ` Suresh Siddha
  2012-03-26 22:47               ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-26 22:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML

On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> 
> > I was reviewing this code in some other context and thought 
> > this optimization makes sense. Unless someone beats me, I can 
> > collect some data on Monday.
> 
> Would be really nice to get this data.
> 

On an idle system with 32logical cpu's (2 socket/8-core each) the
original kernel did the unnecessary load_cr3 2988 times during an
interval of 5 seconds.

With the fix, that count drops to 9 times.

thanks,
suresh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-26 22:47             ` Suresh Siddha
@ 2012-03-26 22:47               ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-26 22:47 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Ingo Molnar, Linus Torvalds, Len Brown, LKML

On 03/26/2012 03:47 PM, Suresh Siddha wrote:
> On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
>> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>>
>>> I was reviewing this code in some other context and thought 
>>> this optimization makes sense. Unless someone beats me, I can 
>>> collect some data on Monday.
>>
>> Would be really nice to get this data.
>>
> 
> On an idle system with 32logical cpu's (2 socket/8-core each) the
> original kernel did the unnecessary load_cr3 2988 times during an
> interval of 5 seconds.
> 
> With the fix, that count drops to 9 times.
> 

Nice.

	-hpa


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-03-26 22:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-22 23:33 [patch] x86, tlb: switch cr3 in leave_mm() only when needed Suresh Siddha
2012-03-22 23:44 ` Linus Torvalds
2012-03-23  0:01   ` Suresh Siddha
2012-03-23  0:20     ` H. Peter Anvin
2012-03-23  0:37       ` Linus Torvalds
2012-03-23  0:51         ` Suresh Siddha
2012-03-23  8:37           ` Ingo Molnar
2012-03-26 22:47             ` Suresh Siddha
2012-03-26 22:47               ` H. Peter Anvin
2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.