* [PATCH v3 0/4] x86: faster mb()+documentation tweaks
From: Michael S. Tsirkin @ 2016-01-13 20:12 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Borislav Petkov

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl that we use on older CPUs.

So let's use the locked variant everywhere.

While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h

The documentation fixes are included first - I verified that
they do not change the generated code at all. They should be
safe to apply directly.

The last patch changes mb() to lock addl. I was unable to
measure a speed difference on a macro benchmark,
but I noted that even doing
	#define mb() barrier()
seems to make no difference for most benchmarks
(it causes hangs sometimes, of course).
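
For reference, here is a minimal user-space sketch of the kind of
micro-benchmark meant above. It is illustrative only (made-up iteration
count, x86-64, gcc extended asm), not the actual harness behind the
2-3x number:

#include <stdint.h>
#include <stdio.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	const unsigned long iters = 100000000UL;
	uint64_t t0, t1, t2;
	unsigned long i;

	t0 = rdtsc();
	for (i = 0; i < iters; i++)	/* what mb() expands to on modern x86 */
		asm volatile("mfence" ::: "memory");
	t1 = rdtsc();
	for (i = 0; i < iters; i++)	/* the proposed lock; addl variant */
		asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
	t2 = rdtsc();

	printf("mfence:     %.1f cycles\n", (double)(t1 - t0) / iters);
	printf("lock; addl: %.1f cycles\n", (double)(t2 - t1) / iters);
	return 0;
}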

HPA asked that the last patch be deferred until we hear back from
Intel, which of course makes sense. So it needs HPA's ack.

I hope I'm not splitting this up too much - the reason is I wanted to isolate
the code changes (that people might want to test for performance)
from comment changes approved by Linus, from (so far unreviewed) changes
I came up with myself.

Changes from v2:
	add patch adding cc clobber for addl
	tweak commit log for patch 2
	use addl at SP-4 (as opposed to SP) to reduce data dependencies

Michael S. Tsirkin (4):
  x86: add cc clobber for addl
  x86: drop a comment left over from X86_OOSTORE
  x86: tweak the comment about use of wmb for IO
  x86: drop mfence in favor of lock+addl

 arch/x86/include/asm/barrier.h | 20 +++++++++-----------
 1 file changed, 9 insertions(+), 11 deletions(-)

-- 
MST

* [PATCH v3 1/4] x86: add cc clobber for addl
From: Michael S. Tsirkin @ 2016-01-13 20:12 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Borislav Petkov, Ingo Molnar,
	Arnd Bergmann, Borislav Petkov, Andy Lutomirski,
	Andrey Konovalov

addl clobbers flags (such as CF), but barrier.h didn't tell gcc about this.
Historically, gcc hasn't needed the clobber on x86, as it always considers
flags clobbered by inline asm anyway. We are probably missing the cc
clobber in a *lot* of places for this reason.

But even if it is not strictly necessary, it's probably a good thing to
add for documentation, and in case gcc semantics ever change.

Reported-by: Borislav Petkov <bp@alien8.de>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a584e1c..a65bdb1 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -15,9 +15,12 @@
  * Some non-Intel clones support out of order store. wmb() ceases to be a
  * nop for these.
  */
-#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
-#define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
-#define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
+#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
+				      X86_FEATURE_XMM2) ::: "memory", "cc")
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
 #define mb() 	asm volatile("mfence":::"memory")
 #define rmb()	asm volatile("lfence":::"memory")
-- 
MST
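
As an aside for readers following along outside the kernel tree, a
stand-alone sketch (hypothetical macro names, 32-bit flavour, not the
kernel macros themselves) of what the clobber list communicates to gcc:

/*
 * The list after the final ':' is the clobber list of gcc extended asm.
 * "memory" stops gcc from caching memory contents in registers across
 * the barrier; "cc" declares that the instruction modifies EFLAGS
 * (CF, ZF, ...).  On x86 gcc currently assumes flags are clobbered by
 * any asm anyway, so "cc" is documentation plus insurance against a
 * future change in semantics.
 */
#define my_mb_old()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
#define my_mb_new()	asm volatile("lock; addl $0,0(%%esp)" ::: "memory", "cc")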

* [PATCH v3 2/4] x86: drop a comment left over from X86_OOSTORE
From: Michael S. Tsirkin @ 2016-01-13 20:12 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Borislav Petkov, Ingo Molnar,
	Arnd Bergmann, Andy Lutomirski, Andrey Konovalov,
	Borislav Petkov

The comment about wmb being non-nop to deal with non-Intel CPUs is a
leftover from before commit 09df7c4c8097 ("x86: Remove
CONFIG_X86_OOSTORE").

It makes no sense now: in particular, wmb is not a nop even for regular
Intel CPUs, because of weird use cases such as dealing with WC memory.

Drop this comment.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a65bdb1..a291745 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,10 +11,6 @@
  */
 
 #ifdef CONFIG_X86_32
-/*
- * Some non-Intel clones support out of order store. wmb() ceases to be a
- * nop for these.
- */
 #define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
 				      X86_FEATURE_XMM2) ::: "memory", "cc")
 #define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
-- 
MST
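
As a concrete (and entirely hypothetical) illustration of the WC case
mentioned above, a driver-style sketch - the device, register offset and
buffer layout are made up, only the ordering pattern matters:

#include <linux/io.h>
#include <linux/types.h>

/*
 * A descriptor is streamed through a write-combining mapping and then
 * an MMIO doorbell is rung.  Even on Intel CPUs the WC writes can still
 * be sitting in write-combining buffers, so wmb() is what keeps them
 * ordered before the doorbell write reaches the device.
 */
static void demo_post_descriptor(void __iomem *wc_ring, void __iomem *regs,
				 const void *desc, size_t len)
{
	memcpy_toio(wc_ring, desc, len);	/* writes go via the WC mapping */
	wmb();					/* order WC data before doorbell */
	writel(1, regs + 0x10);			/* made-up doorbell register */
}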

* [PATCH v3 3/4] x86: tweak the comment about use of wmb for IO
From: Michael S. Tsirkin @ 2016-01-13 20:12 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Borislav Petkov, Ingo Molnar,
	Arnd Bergmann, Andy Lutomirski, Andrey Konovalov,
	Borislav Petkov

On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even
that is generally questionable.

Leave them around as historical artifacts, unless somebody can point to a
case where they care about the performance; but tweak the comment so
people don't think they are strictly required in all cases.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a291745..bfb28ca 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -6,7 +6,7 @@
 
 /*
  * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
+ * And yes, this might be required on UP too when we're talking
  * to devices.
  */
 
-- 
MST

* [PATCH v3 4/4] x86: drop mfence in favor of lock+addl
From: Michael S. Tsirkin @ 2016-01-13 20:12 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Borislav Petkov, Andy Lutomirski,
	Ingo Molnar, Borislav Petkov, Arnd Bergmann, Andrey Konovalov,
	Andy Lutomirski

mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, as we always did on old 32-bit.

Just poking at SP would be the most natural, but if we
then read the value from SP, we get a false dependency
which will slow us down.

This was noted in this article:
http://shipilev.net/blog/2014/on-the-fence-with-dependencies/

It is easy to reproduce by sticking a barrier in a small non-inline
function.

So let's use a negative offset - which avoids this problem since we
build with the red zone disabled.

Update rmb/wmb on 32 bit to use the negative offset, too, for
consistency.

Suggested-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index bfb28ca..9a2d257 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,16 +11,15 @@
  */
 
 #ifdef CONFIG_X86_32
-#define mb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "mfence", \
-				      X86_FEATURE_XMM2) ::: "memory", "cc")
-#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "lfence", \
+#define mb() asm volatile("lock; addl $0,-4(%%esp)" ::: "memory", "cc")
+#define rmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "lfence", \
 				       X86_FEATURE_XMM2) ::: "memory", "cc")
-#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,0(%%esp)", "sfence", \
+#define wmb() asm volatile(ALTERNATIVE("lock; addl $0,-4(%%esp)", "sfence", \
 				       X86_FEATURE_XMM2) ::: "memory", "cc")
 #else
-#define mb() 	asm volatile("mfence":::"memory")
-#define rmb()	asm volatile("lfence":::"memory")
-#define wmb()	asm volatile("sfence" ::: "memory")
+#define mb() asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc")
+#define rmb() asm volatile("lfence" ::: "memory")
+#define wmb() asm volatile("sfence" ::: "memory")
 #endif
 
 #ifdef CONFIG_X86_PPRO_FENCE
-- 
MST
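
In the spirit of the "small non-inline function" reproducer mentioned
above, a user-space sketch (made-up names and counts, x86-64 assumed;
adding 0 leaves the touched stack slot unchanged either way):

#include <stdint.h>
#include <stdio.h>

/* Barrier on the slot the ret will read: the load of the return address
 * has to wait for the locked RMW on the same address. */
__attribute__((noinline)) static void mb_at_sp(void)
{
	asm volatile("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
}

/* Barrier below the stack pointer: nothing later reads that slot. */
__attribute__((noinline)) static void mb_below_sp(void)
{
	asm volatile("lock; addl $0,-4(%%rsp)" ::: "memory", "cc");
}

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	const unsigned long iters = 50000000UL;
	uint64_t t0, t1, t2;
	unsigned long i;

	t0 = rdtsc();
	for (i = 0; i < iters; i++)
		mb_at_sp();
	t1 = rdtsc();
	for (i = 0; i < iters; i++)
		mb_below_sp();
	t2 = rdtsc();

	printf("addl at 0(%%rsp):  %.1f cycles/call\n", (double)(t1 - t0) / iters);
	printf("addl at -4(%%rsp): %.1f cycles/call\n", (double)(t2 - t1) / iters);
	return 0;
}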

* Re: [PATCH v3 0/4] x86: faster mb()+documentation tweaks
From: Borislav Petkov @ 2016-01-14 11:39 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, H. Peter Anvin,
	virtualization

On Wed, Jan 13, 2016 at 10:12:22PM +0200, Michael S. Tsirkin wrote:
> mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> 2 to 3 times slower than lock; addl that we use on older CPUs.
> 
> So let's use the locked variant everywhere.
> 
> While I was at it, I found some inconsistencies in comments in
> arch/x86/include/asm/barrier.h
> 
> The documentation fixes are included first - I verified that
> they do not change the generated code at all. They should be
> safe to apply directly.
> 
> The last patch changes mb() to lock addl. I was unable to
> measure a speed difference on a macro benchmark,
> but I noted that even doing
> 	#define mb() barrier()
> seems to make no difference for most benchmarks
> (it causes hangs sometimes, of course).
> 
> HPA asked that the last patch be deferred until we hear back from
> Intel, which of course makes sense. So it needs HPA's ack.
> 
> I hope I'm not splitting this up too much - the reason is I wanted to isolate
> the code changes (that people might want to test for performance)
> from comment changes approved by Linus, from (so far unreviewed) changes
> I came up with myself.
> 
> Changes from v2:
> 	add patch adding cc clobber for addl
> 	tweak commit log for patch 2
> 	use addl at SP-4 (as opposed to SP) to reduce data dependencies
> 
> Michael S. Tsirkin (4):
>   x86: add cc clobber for addl
>   x86: drop a comment left over from X86_OOSTORE
>   x86: tweak the comment about use of wmb for IO

First three look ok to me regardless of what happens with 4. So applied.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.

* Re: [PATCH v3 0/4] x86: faster mb()+documentation tweaks
From: Michael S. Tsirkin @ 2016-01-26  8:23 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, H. Peter Anvin,
	virtualization

On Thu, Jan 14, 2016 at 12:39:34PM +0100, Borislav Petkov wrote:
> > Michael S. Tsirkin (4):
> >   x86: add cc clobber for addl
> >   x86: drop a comment left over from X86_OOSTORE
> >   x86: tweak the comment about use of wmb for IO
> 
> First three look ok to me regardless of what happens with 4. So applied.

Sorry - in which tree are these applied?
Thanks,

-- 
MST

* Re: [PATCH v3 0/4] x86: faster mb()+documentation tweaks
From: Boris Petkov @ 2016-01-26  8:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, H. Peter Anvin,
	virtualization

"Michael S. Tsirkin" <mst@redhat.com> wrote:
>Sorry - in which tree are these applied?

They'll appear in tip at some point.


-- 
Sent from a small device: formatting sux and brevity is inevitable. 
