linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [patch] x86, tlb: switch cr3 in leave_mm() only when needed
@ 2012-03-22 23:33 Suresh Siddha
  2012-03-22 23:44 ` Linus Torvalds
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-22 23:33 UTC (permalink / raw)
  To: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Len Brown; +Cc: LKML

From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/tlb.c |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..ad695cc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,11 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_and_clear_cpu(cpu, mm_cpumask(active_mm)))
+		load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 



^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-22 23:33 [patch] x86, tlb: switch cr3 in leave_mm() only when needed Suresh Siddha
@ 2012-03-22 23:44 ` Linus Torvalds
  2012-03-23  0:01   ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-22 23:44 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML

On Thu, Mar 22, 2012 at 4:33 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
>
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.

Hmm. If this is reasonably common (and intel_idle() certainly is),
maybe we shouldn't even do the "test_and_clear" RMW cycle.

We could do it with a read-only bit test (no races I can see - if it's
clear, it will stay clear), so we could do this with

    if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
        cpumask_clear_cpu(cpu,mm_cpumask(active_mm));
        load_cr3(swapper_pg_dir);
    }

instead? And avoid touching that "mm_cpumask" (and the atomic
serializing instruction) when not necessary?

                       Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-22 23:44 ` Linus Torvalds
@ 2012-03-23  0:01   ` Suresh Siddha
  2012-03-23  0:20     ` H. Peter Anvin
  2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
  0 siblings, 2 replies; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23  0:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML

On Thu, 2012-03-22 at 16:44 -0700, Linus Torvalds wrote:
> Hmm. If this is reasonably common (and intel_idle() certainly is),
> maybe we shouldn't even do the "test_and_clear" RMW cycle.
> 
> We could do it with a read-only bit test (no races I can see - if it's
> clear, it will stay clear), so we could do this with
> 
>     if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
>         cpumask_clear_cpu(cpu,mm_cpumask(active_mm));
>         load_cr3(swapper_pg_dir);
>     }
> 
> instead? And avoid touching that "mm_cpumask" (and the atomic
> serializing instruction) when not necessary?

Agreed. Updated patch appended. Thanks.
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
 arch/x86/mm/tlb.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 




^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:01   ` Suresh Siddha
@ 2012-03-23  0:20     ` H. Peter Anvin
  2012-03-23  0:37       ` Linus Torvalds
  2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
  1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-23  0:20 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Linus Torvalds, Ingo Molnar, Len Brown, LKML

On 03/22/2012 05:01 PM, Suresh Siddha wrote:
> 
> Agreed. Updated patch appended. Thanks.
> ---
> From: Suresh Siddha <suresh.b.siddha@intel.com>
> Subject: x86, tlb: switch cr3 in leave_mm() only when needed
> 
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
> 
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.
> 
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
> 

Looks great to me.  Linus, do you want this one later in the window if
it tests out OK between now and then, or is it too late?

	-hpa

-- 
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel.  I don't speak on their behalf.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:x86/mm] x86, tlb: Switch cr3 in leave_mm() only when needed
  2012-03-23  0:01   ` Suresh Siddha
  2012-03-23  0:20     ` H. Peter Anvin
@ 2012-03-23  0:31     ` tip-bot for Suresh Siddha
  1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-03-23  0:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, suresh.b.siddha, tglx

Commit-ID:  a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Gitweb:     http://git.kernel.org/tip/a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Author:     Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Thu, 22 Mar 2012 17:01:25 -0700
Committer:  H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 22 Mar 2012 17:23:48 -0700

x86, tlb: Switch cr3 in leave_mm() only when needed

Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.

intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.

Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1332460885.16101.147.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
 arch/x86/mm/tlb.c |    8 +++++---
 1 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:20     ` H. Peter Anvin
@ 2012-03-23  0:37       ` Linus Torvalds
  2012-03-23  0:51         ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-23  0:37 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: Suresh Siddha, Ingo Molnar, Len Brown, LKML

On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Looks great to me.  Linus, do you want this one later in the window if
> it tests out OK between now and then, or is it too late?

It's small and trivial, so it's fine. I *would* like to hear whether
it actually makes any real difference, though. I do see "intel_idle()"
as a big cost in idle profiles, and I assume that translates to power
use too, but if you guys have some actual numbers, that would be even
better.

                   Linus

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:37       ` Linus Torvalds
@ 2012-03-23  0:51         ` Suresh Siddha
  2012-03-23  8:37           ` Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23  0:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, Len Brown, LKML

On Thu, 2012-03-22 at 17:37 -0700, Linus Torvalds wrote:
> On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Looks great to me.  Linus, do you want this one later in the window if
> > it tests out OK between now and then, or is it too late?
> 
> It's small and trivial, so it's fine. I *would* like to hear whether
> it actually makes any real difference, though. I do see "intel_idle()"
> as a big cost in idle profiles, and I assume that translates to power
> use too, but if you guys have some actual numbers, that would be even
> better.

I was reviewing this code in some other context and thought this
optimization makes sense. Unless someone beats me, I can collect some
data on Monday.

thanks,
suresh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  0:51         ` Suresh Siddha
@ 2012-03-23  8:37           ` Ingo Molnar
  2012-03-26 22:47             ` Suresh Siddha
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2012-03-23  8:37 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML


* Suresh Siddha <suresh.b.siddha@intel.com> wrote:

> I was reviewing this code in some other context and thought 
> this optimization makes sense. Unless someone beats me, I can 
> collect some data on Monday.

Would be really nice to get this data.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-23  8:37           ` Ingo Molnar
@ 2012-03-26 22:47             ` Suresh Siddha
  2012-03-26 22:47               ` H. Peter Anvin
  0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-26 22:47 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML

On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> 
> > I was reviewing this code in some other context and thought 
> > this optimization makes sense. Unless someone beats me, I can 
> > collect some data on Monday.
> 
> Would be really nice to get this data.
> 

On an idle system with 32logical cpu's (2 socket/8-core each) the
original kernel did the unnecessary load_cr3 2988 times during an
interval of 5 seconds.

With the fix, that count drops to 9 times.

thanks,
suresh


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
  2012-03-26 22:47             ` Suresh Siddha
@ 2012-03-26 22:47               ` H. Peter Anvin
  0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-26 22:47 UTC (permalink / raw)
  To: Suresh Siddha; +Cc: Ingo Molnar, Linus Torvalds, Len Brown, LKML

On 03/26/2012 03:47 PM, Suresh Siddha wrote:
> On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
>> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>>
>>> I was reviewing this code in some other context and thought 
>>> this optimization makes sense. Unless someone beats me, I can 
>>> collect some data on Monday.
>>
>> Would be really nice to get this data.
>>
> 
> On an idle system with 32logical cpu's (2 socket/8-core each) the
> original kernel did the unnecessary load_cr3 2988 times during an
> interval of 5 seconds.
> 
> With the fix, that count drops to 9 times.
> 

Nice.

	-hpa


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2012-03-26 22:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-03-22 23:33 [patch] x86, tlb: switch cr3 in leave_mm() only when needed Suresh Siddha
2012-03-22 23:44 ` Linus Torvalds
2012-03-23  0:01   ` Suresh Siddha
2012-03-23  0:20     ` H. Peter Anvin
2012-03-23  0:37       ` Linus Torvalds
2012-03-23  0:51         ` Suresh Siddha
2012-03-23  8:37           ` Ingo Molnar
2012-03-26 22:47             ` Suresh Siddha
2012-03-26 22:47               ` H. Peter Anvin
2012-03-23  0:31     ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).