* [patch] x86, tlb: switch cr3 in leave_mm() only when needed
@ 2012-03-22 23:33 Suresh Siddha
2012-03-22 23:44 ` Linus Torvalds
0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-22 23:33 UTC (permalink / raw)
To: Ingo Molnar, H. Peter Anvin, Linus Torvalds, Len Brown; +Cc: LKML
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed
Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.
intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
arch/x86/mm/tlb.c | 6 +++---
1 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..ad695cc 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,11 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_and_clear_cpu(cpu, mm_cpumask(active_mm)))
+		load_cr3(swapper_pg_dir);
 }
 EXPORT_SYMBOL_GPL(leave_mm);
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-22 23:33 [patch] x86, tlb: switch cr3 in leave_mm() only when needed Suresh Siddha
@ 2012-03-22 23:44 ` Linus Torvalds
2012-03-23 0:01 ` Suresh Siddha
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-22 23:44 UTC (permalink / raw)
To: Suresh Siddha; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML
On Thu, Mar 22, 2012 at 4:33 PM, Suresh Siddha
<suresh.b.siddha@intel.com> wrote:
>
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
>
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.
Hmm. If this is reasonably common (and intel_idle() certainly is),
maybe we shouldn't even do the "test_and_clear" RMW cycle.
We could do it with a read-only bit test (no races I can see - if it's
clear, it will stay clear), so we could do this with
    if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
        cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
        load_cr3(swapper_pg_dir);
    }
instead? And avoid touching that "mm_cpumask" (and the atomic
serializing instruction) when not necessary?
Linus
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-22 23:44 ` Linus Torvalds
@ 2012-03-23 0:01 ` Suresh Siddha
2012-03-23 0:20 ` H. Peter Anvin
2012-03-23 0:31 ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
0 siblings, 2 replies; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23 0:01 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Ingo Molnar, H. Peter Anvin, Len Brown, LKML
On Thu, 2012-03-22 at 16:44 -0700, Linus Torvalds wrote:
> Hmm. If this is reasonably common (and intel_idle() certainly is),
> maybe we shouldn't even do the "test_and_clear" RMW cycle.
>
> We could do it with a read-only bit test (no races I can see - if it's
> clear, it will stay clear), so we could do this with
>
>     if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
>         cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
>         load_cr3(swapper_pg_dir);
>     }
>
> instead? And avoid touching that "mm_cpumask" (and the atomic
> serializing instruction) when not necessary?
Agreed. Updated patch appended. Thanks.
---
From: Suresh Siddha <suresh.b.siddha@intel.com>
Subject: x86, tlb: switch cr3 in leave_mm() only when needed
Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.
intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
---
arch/x86/mm/tlb.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-23 0:01 ` Suresh Siddha
@ 2012-03-23 0:20 ` H. Peter Anvin
2012-03-23 0:37 ` Linus Torvalds
2012-03-23 0:31 ` [tip:x86/mm] x86, tlb: Switch " tip-bot for Suresh Siddha
1 sibling, 1 reply; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-23 0:20 UTC (permalink / raw)
To: Suresh Siddha; +Cc: Linus Torvalds, Ingo Molnar, Len Brown, LKML
On 03/22/2012 05:01 PM, Suresh Siddha wrote:
>
> Agreed. Updated patch appended. Thanks.
> ---
> From: Suresh Siddha <suresh.b.siddha@intel.com>
> Subject: x86, tlb: switch cr3 in leave_mm() only when needed
>
> Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
> But there is no need to change the cr3, if we already left that mm.
>
> intel_idle() for example calls leave_mm() on every deep c-state entry where
> the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
> leave_mm() whenever the TLB is in LAZY state. Both these paths will be
> improved with this change.
>
> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
>
Looks great to me. Linus, do you want this one later in the window if
it tests out OK between now and then, or is it too late?
-hpa
--
H. Peter Anvin, Intel Open Source Technology Center
I work for Intel. I don't speak on their behalf.
* [tip:x86/mm] x86, tlb: Switch cr3 in leave_mm() only when needed
2012-03-23 0:01 ` Suresh Siddha
2012-03-23 0:20 ` H. Peter Anvin
@ 2012-03-23 0:31 ` tip-bot for Suresh Siddha
1 sibling, 0 replies; 10+ messages in thread
From: tip-bot for Suresh Siddha @ 2012-03-23 0:31 UTC (permalink / raw)
To: linux-tip-commits; +Cc: linux-kernel, hpa, mingo, suresh.b.siddha, tglx
Commit-ID: a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Gitweb: http://git.kernel.org/tip/a6fca40f1d7f3e232c9de27c1cebbb9f787fbc4f
Author: Suresh Siddha <suresh.b.siddha@intel.com>
AuthorDate: Thu, 22 Mar 2012 17:01:25 -0700
Committer: H. Peter Anvin <hpa@zytor.com>
CommitDate: Thu, 22 Mar 2012 17:23:48 -0700
x86, tlb: Switch cr3 in leave_mm() only when needed
Currently leave_mm() unconditionally switches the cr3 to swapper_pg_dir.
But there is no need to change the cr3, if we already left that mm.
intel_idle() for example calls leave_mm() on every deep c-state entry where
the CPU flushes the TLB for us. Similarly flush_tlb_all() was also calling
leave_mm() whenever the TLB is in LAZY state. Both these paths will be
improved with this change.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1332460885.16101.147.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
---
arch/x86/mm/tlb.c | 8 +++++---
1 files changed, 5 insertions(+), 3 deletions(-)
diff --git a/arch/x86/mm/tlb.c b/arch/x86/mm/tlb.c
index d6c0418..125bcad 100644
--- a/arch/x86/mm/tlb.c
+++ b/arch/x86/mm/tlb.c
@@ -61,11 +61,13 @@ static DEFINE_PER_CPU_READ_MOSTLY(int, tlb_vector_offset);
  */
 void leave_mm(int cpu)
 {
+	struct mm_struct *active_mm = percpu_read(cpu_tlbstate.active_mm);
 	if (percpu_read(cpu_tlbstate.state) == TLBSTATE_OK)
 		BUG();
-	cpumask_clear_cpu(cpu,
-			  mm_cpumask(percpu_read(cpu_tlbstate.active_mm)));
-	load_cr3(swapper_pg_dir);
+	if (cpumask_test_cpu(cpu, mm_cpumask(active_mm))) {
+		cpumask_clear_cpu(cpu, mm_cpumask(active_mm));
+		load_cr3(swapper_pg_dir);
+	}
 }
 EXPORT_SYMBOL_GPL(leave_mm);
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-23 0:20 ` H. Peter Anvin
@ 2012-03-23 0:37 ` Linus Torvalds
2012-03-23 0:51 ` Suresh Siddha
0 siblings, 1 reply; 10+ messages in thread
From: Linus Torvalds @ 2012-03-23 0:37 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: Suresh Siddha, Ingo Molnar, Len Brown, LKML
On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
>
> Looks great to me. Linus, do you want this one later in the window if
> it tests out OK between now and then, or is it too late?
It's small and trivial, so it's fine. I *would* like to hear whether
it actually makes any real difference, though. I do see "intel_idle()"
as a big cost in idle profiles, and I assume that translates to power
use too, but if you guys have some actual numbers, that would be even
better.
Linus
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-23 0:37 ` Linus Torvalds
@ 2012-03-23 0:51 ` Suresh Siddha
2012-03-23 8:37 ` Ingo Molnar
0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-23 0:51 UTC (permalink / raw)
To: Linus Torvalds; +Cc: H. Peter Anvin, Ingo Molnar, Len Brown, LKML
On Thu, 2012-03-22 at 17:37 -0700, Linus Torvalds wrote:
> On Thu, Mar 22, 2012 at 5:20 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> >
> > Looks great to me. Linus, do you want this one later in the window if
> > it tests out OK between now and then, or is it too late?
>
> It's small and trivial, so it's fine. I *would* like to hear whether
> it actually makes any real difference, though. I do see "intel_idle()"
> as a big cost in idle profiles, and I assume that translates to power
> use too, but if you guys have some actual numbers, that would be even
> better.
I was reviewing this code in some other context and thought this
optimization makes sense. Unless someone beats me to it, I can collect some
data on Monday.
thanks,
suresh
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-23 0:51 ` Suresh Siddha
@ 2012-03-23 8:37 ` Ingo Molnar
2012-03-26 22:47 ` Suresh Siddha
0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2012-03-23 8:37 UTC (permalink / raw)
To: Suresh Siddha; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML
* Suresh Siddha <suresh.b.siddha@intel.com> wrote:
> I was reviewing this code in some other context and thought
> this optimization makes sense. Unless someone beats me to it, I can
> collect some data on Monday.
Would be really nice to get this data.
Thanks,
Ingo
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-23 8:37 ` Ingo Molnar
@ 2012-03-26 22:47 ` Suresh Siddha
2012-03-26 22:47 ` H. Peter Anvin
0 siblings, 1 reply; 10+ messages in thread
From: Suresh Siddha @ 2012-03-26 22:47 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Linus Torvalds, H. Peter Anvin, Len Brown, LKML
On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>
> > I was reviewing this code in some other context and thought
> > this optimization makes sense. Unless someone beats me to it, I can
> > collect some data on Monday.
>
> Would be really nice to get this data.
>
On an idle system with 32 logical CPUs (2 sockets, 8 cores each), the
original kernel did the unnecessary load_cr3 2988 times during an
interval of 5 seconds.
With the fix, that count drops to 9 times.
thanks,
suresh
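
To put the counts above in perspective, a small sketch of the arithmetic follows. The helper names per_second() and percent_reduction() are hypothetical; the inputs (2988 and 9 loads over a 5 second interval) are the figures reported in this message.

```c
/* Events per second over a measurement interval. */
double per_second(long count, double interval_seconds)
{
	return (double)count / interval_seconds;
}

/* Relative drop from `before` to `after`, as a percentage. */
double percent_reduction(long before, long after)
{
	return 100.0 * (double)(before - after) / (double)before;
}
```

With the reported figures, per_second(2988, 5.0) is roughly 598 unnecessary cr3 loads per second system-wide before the fix and per_second(9, 5.0) is about 1.8 after it, so percent_reduction(2988, 9) comes to roughly 99.7%.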
* Re: [patch] x86, tlb: switch cr3 in leave_mm() only when needed
2012-03-26 22:47 ` Suresh Siddha
@ 2012-03-26 22:47 ` H. Peter Anvin
0 siblings, 0 replies; 10+ messages in thread
From: H. Peter Anvin @ 2012-03-26 22:47 UTC (permalink / raw)
To: Suresh Siddha; +Cc: Ingo Molnar, Linus Torvalds, Len Brown, LKML
On 03/26/2012 03:47 PM, Suresh Siddha wrote:
> On Fri, 2012-03-23 at 09:37 +0100, Ingo Molnar wrote:
>> * Suresh Siddha <suresh.b.siddha@intel.com> wrote:
>>
>>> I was reviewing this code in some other context and thought
>>> this optimization makes sense. Unless someone beats me to it, I can
>>> collect some data on Monday.
>>
>> Would be really nice to get this data.
>>
>
> On an idle system with 32 logical CPUs (2 sockets, 8 cores each), the
> original kernel did the unnecessary load_cr3 2988 times during an
> interval of 5 seconds.
>
> With the fix, that count drops to 9 times.
>
Nice.
-hpa