x86: Use clflush() instead of wbinvd() whenever possible when changing mapping

Message ID 1248421981-31865-1-git-send-email-thellstrom@vmware.com
State New, archived

Commit Message

Thomas Hellstrom July 24, 2009, 7:53 a.m. UTC
The current code uses wbinvd() when the area to flush is > 4MB. Although this
may be faster than using clflush(), the effect of wbinvd() on irq latencies
may be catastrophic on systems with large caches. Therefore, use clflush()
whenever possible and accept the slight performance hit.

Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
---
 arch/x86/mm/pageattr.c |    5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)
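
[Editor's note] For reference, the clflush path walks the range one cache
line at a time. A minimal sketch of that pattern, modelled on the kernel's
clflush_cache_range() helper in arch/x86/mm/pageattr.c (illustrative, not
the verbatim kernel code):

static void flush_range_clflush(void *vaddr, unsigned int size)
{
	void *vend = vaddr + size - 1;

	mb();			/* order prior stores before the flushes */
	for (; vaddr < vend; vaddr += boot_cpu_data.x86_clflush_size)
		clflush(vaddr);
	clflush(vend);		/* make sure the last cache line is covered */
	mb();			/* flushes complete before later accesses */
}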

Comments

Andi Kleen July 24, 2009, 10:05 a.m. UTC | #1
Thomas Hellstrom <thellstrom@vmware.com> writes:

> The current code uses wbinvd() when the area to flush is > 4MB. Although this
> may be faster than using clflush(), the effect of wbinvd() on irq latencies
> may be catastrophic on systems with large caches. Therefore, use clflush()

may be? You seem to be missing some hard data here.

I wouldn't expect wbinvd to take longer than a ms or two, which 
is probably not catastrophic yet.

-Andi
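
[Editor's note] For scale, an illustrative back-of-envelope calculation, not
data from the thread: at the patch's 4MB threshold, a line-by-line flush
issues one clflush per 64-byte cache line, and unlike wbinvd the sequence
can take interrupts between instructions.

#include <stdio.h>

int main(void)
{
	unsigned int pages = 1024;	/* numpages at the old wbinvd cutoff (4MB) */
	unsigned int page_size = 4096;
	unsigned int line_size = 64;	/* typical x86 cache line size */

	/* 1024 * 4096 / 64 = 65536 clflush instructions for a 4MB range */
	printf("clflush count: %u\n", pages * page_size / line_size);
	return 0;
}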
Thomas Hellstrom July 24, 2009, 10:21 a.m. UTC | #2
Andi Kleen wrote:
> Thomas Hellstrom <thellstrom@vmware.com> writes:
>
>   
>> The current code uses wbinvd() when the area to flush is > 4MB. Although this
>> may be faster than using clflush(), the effect of wbinvd() on irq latencies
>> may be catastrophic on systems with large caches. Therefore, use clflush()
>>     
>
> may be? You seem to be missing some hard data here.
>
>   
Admittedly.
However, the concept of flushing and invalidating the caches completely
on systems with many processors and huge caches, when we intend to flush
only a small piece of the cache, also sounds like overkill.

Furthermore, since wbinvd() was introduced as an optimization of the
general clflush() case, did anybody ever check the effects on systems
with many processors and huge caches?

/Thomas

> -Andi
>
>   

Andi Kleen July 24, 2009, 10:58 a.m. UTC | #3
On Fri, Jul 24, 2009 at 12:21:50PM +0200, Thomas Hellstrom wrote:
> Andi Kleen wrote:
>> Thomas Hellstrom <thellstrom@vmware.com> writes:
>>
>>   
>>> The current code uses wbinvd() when the area to flush is > 4MB. Although this
>>> may be faster than using clflush(), the effect of wbinvd() on irq latencies
>>> may be catastrophic on systems with large caches. Therefore, use clflush()
>>>     
>>
>> may be? You seem to be missing some hard data here.
>>
>>   
> Admittedly.

So was it motivated by a real problem?

> However, the concept of flushing and invalidating the caches completely
> on systems with many processors and huge caches, when we intend to flush
> only a small piece of the cache, also sounds like overkill.

The other CPUs will not block (they just flush their caches in the background
or in parallel), so the latency shouldn't scale with the number of sockets.
The number of cores shouldn't impact it either, because cores tend to share
cache hierarchies.

That's just a theory, but not necessarily a worse one than yours :-)

>
> Furthermore, since wbinvd() was introduced as an optimization of the
> general clflush() case, did anybody ever check the effects on systems
> with many processors and huge caches?

Typically systems with large caches flush faster too.

-Andi
Thomas Hellstrom July 24, 2009, 11:14 a.m. UTC | #4
Andi Kleen wrote:
> On Fri, Jul 24, 2009 at 12:21:50PM +0200, Thomas Hellstrom wrote:
>   
>> Andi Kleen wrote:
>>     
>>> Thomas Hellstrom <thellstrom@vmware.com> writes:
>>>
>>>   
>>>       
>>>> The current code uses wbinvd() when the area to flush is > 4MB. Although this
>>>> may be faster than using clflush(), the effect of wbinvd() on irq latencies
>>>> may be catastrophic on systems with large caches. Therefore, use clflush()
>>>>     
>>>>         
>>> may be? You seem to be missing some hard data here.
>>>
>>>   
>>>       
>> Admittedly.
>>     
>
> So was it motivated by a real problem?
>   

No. It was motivated by the assumption that wbinvd() is just bad:
Quote:

WBINVD is a very nasty operation. I was talking to some CPU people
and they really recommended getting rid of it as far as possible.
Stopping the CPU for msecs is just wrong and there are apparently even
some theoretical livelock situations. It is not interruptible in
earlier VT versions and messes up real time in the hypervisor. Some
people were doing KVM on rt kernels and had latency spikes from that.

/Quote
(I believe you wrote that?)


>> However, the concept of flushing and invalidating the caches completely
>> on systems with many processors and huge caches, when we intend to flush
>> only a small piece of the cache, also sounds like overkill.
>>     
>
> The other CPUs will not block (they just flush their caches in the background
> or in parallel), so the latency shouldn't scale with the number of sockets.
> The number of cores shouldn't impact it either, because cores tend to share
> cache hierarchies.
>
> That's just a theory, but not necessarily a worse one than yours :-)
>
>   
>> Furthermore, since wbinvd() was introduced as an optimization of the
>> general clflush() case, did anybody ever check the effects on systems
>> with many processors and huge caches?
>>     
>
> Typically systems with large caches flush faster too.
>
>   
OK. We should really test this at some point. I currently don't have the 
hardware to do so.

> -Andi
>
>   

/Thomas


Andi Kleen July 24, 2009, 1:16 p.m. UTC | #5
> No. It was motivated by the assumption that wbinvd() is just bad:

Ok, got it now.

> Quote:
>
> WBINVD is a very nasty operation. I was talking to some CPU people
> and they really recommended getting rid of it as far as possible.
> Stopping the CPU for msecs is just wrong and there are apparently even
> some theoretical livelock situations. It is not interruptible in
> earlier VT versions and messes up real time in the hypervisor. Some
> people were doing KVM on rt kernels and had latency spikes from that.
>
> /Quote
> (I believe you wrote that?)

Yes. That's still true, and that's one reason not to use it.

-Andi
Pavel Machek July 30, 2009, 9:07 a.m. UTC | #6
On Fri 2009-07-24 09:53:01, Thomas Hellstrom wrote:
> The current code uses wbinvd() when the area to flush is > 4MB. Although this
> may be faster than using clflush(), the effect of wbinvd() on irq
> latencies

You are slowing the kernel down, and you did not write down the
reasons. Bad.
Thomas Hellstrom Aug. 2, 2009, 4:22 p.m. UTC | #7
Pavel Machek wrote:
> On Fri 2009-07-24 09:53:01, Thomas Hellstrom wrote:
>   
>> The current code uses wbinvd() when the area to flush is > 4MB. Although this
>> may be faster than using clflush(), the effect of wbinvd() on irq
>> latencies
>>     
>
> You are slowing the kernel down, and you did not write down the
> reasons. Bad.
>
>   
The patch may slow the kernel down or speed it up, depending on your hardware
configuration. The reasons should be quite obvious if you read the discussion
following the patch.

/Thomas


Patch

diff --git a/arch/x86/mm/pageattr.c b/arch/x86/mm/pageattr.c
index 1b734d7..d4327db 100644
--- a/arch/x86/mm/pageattr.c
+++ b/arch/x86/mm/pageattr.c
@@ -209,13 +209,12 @@  static void cpa_flush_array(unsigned long *start, int numpages, int cache,
 			    int in_flags, struct page **pages)
 {
 	unsigned int i, level;
-	unsigned long do_wbinvd = cache && numpages >= 1024; /* 4M threshold */
 
 	BUG_ON(irqs_disabled());
 
-	on_each_cpu(__cpa_flush_all, (void *) do_wbinvd, 1);
+	on_each_cpu(__cpa_flush_all, (void *) 0UL, 1);
 
-	if (!cache || do_wbinvd)
+	if (!cache)
 		return;
 
 	/*
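
[Editor's note] After the patch, cpa_flush_array() always takes the per-page
clflush path, which the excerpt truncates. A sketch of that pattern
(illustrative, based on the function signature and context shown above, not
the verbatim remainder of the function):

	for (i = 0; i < numpages; i++) {
		unsigned long addr;
		pte_t *pte;

		if (in_flags & CPA_PAGES_ARRAY)
			addr = (unsigned long)page_address(pages[i]);
		else
			addr = start[i];

		pte = lookup_address(addr, &level);

		/* Only flush pages that are actually mapped. */
		if (pte && (pte_val(*pte) & _PAGE_PRESENT))
			clflush_cache_range((void *)addr, PAGE_SIZE);
	}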