* [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
@ 2016-01-12 22:10 ` Michael S. Tsirkin
  0 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 22:10 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization

mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.

So let's use the locked variant everywhere - helps keep the code simple as
well.
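
For reference, below is a minimal userspace sketch of the kind of
micro-benchmark that shows the gap (the rdtsc-based timing and the iteration
count are assumptions here, not the exact harness behind the numbers above):

#include <stdio.h>
#include <stdint.h>

static inline uint64_t rdtsc(void)
{
	uint32_t lo, hi;

	asm volatile("rdtsc" : "=a"(lo), "=d"(hi));
	return ((uint64_t)hi << 32) | lo;
}

int main(void)
{
	const long n = 10000000;
	uint64_t t0, t1, t2;
	long i;

	t0 = rdtsc();
	for (i = 0; i < n; i++)
		asm volatile("mfence" ::: "memory");
	t1 = rdtsc();
	for (i = 0; i < n; i++)
		asm volatile("lock; addl $0,0(%%rsp)" ::: "memory");
	t2 = rdtsc();

	printf("mfence:     %.1f cycles/op\n", (double)(t1 - t0) / n);
	printf("lock; addl: %.1f cycles/op\n", (double)(t2 - t1) / n);
	return 0;
}

Build with something like "gcc -O2" and pin it to a single CPU; the absolute
numbers vary by microarchitecture, but the ratio is what matters here.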

While I was at it, I found some inconsistencies in comments in
arch/x86/include/asm/barrier.h

I hope I'm not splitting this up too much - the reason is I wanted to isolate
the code changes (that people might want to test for performance) from comment
changes approved by Linus, from (so far unreviewed) comment change I came up
with myself.

Lightly tested on my system.

Michael S. Tsirkin (3):
  x86: drop mfence in favor of lock+addl
  x86: drop a comment left over from X86_OOSTORE
  x86: tweak the comment about use of wmb for IO

 arch/x86/include/asm/barrier.h | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v2 1/3] x86: drop mfence in favor of lock+addl
  2016-01-12 22:10 ` Michael S. Tsirkin
@ 2016-01-12 22:10 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 22:10 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Ingo Molnar, Arnd Bergmann,
	Andy Lutomirski, Borislav Petkov, Andrey Konovalov

mfence appears to be way slower than a locked instruction - let's use
lock+add unconditionally, same as we always did on old 32-bit.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index a584e1c..7f99726 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -15,11 +15,12 @@
  * Some non-Intel clones support out of order store. wmb() ceases to be a
  * nop for these.
  */
-#define mb() alternative("lock; addl $0,0(%%esp)", "mfence", X86_FEATURE_XMM2)
+
+#define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
 #define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
 #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
 #else
-#define mb() 	asm volatile("mfence":::"memory")
+#define mb() asm volatile("lock; addl $0,0(%%rsp)" ::: "memory")
 #define rmb()	asm volatile("lfence":::"memory")
 #define wmb()	asm volatile("sfence" ::: "memory")
 #endif
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 2/3] x86: drop a comment left over from X86_OOSTORE
  2016-01-12 22:10 ` Michael S. Tsirkin
@ 2016-01-12 22:10   ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 22:10 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Ingo Molnar, Andy Lutomirski,
	Andrey Konovalov, Borislav Petkov

The comment about wmb being non-nop is a left over from before commit
09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE").

It makes no sense now: if you have an SMP system with out of order
stores, making wmb not a nop will not help.

Additionally, wmb is not a nop even for regular Intel CPUs because of
weird use cases, e.g. dealing with WC memory.
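
To illustrate the WC case (a sketch only; the device, its fields and the
doorbell offset below are hypothetical, not taken from any real driver):

#include <linux/io.h>

struct my_wc_dev {
	void __iomem *wc_ring;		/* BAR mapped with ioremap_wc() */
	void __iomem *regs;		/* normal uncached MMIO registers */
	u32 tail;
};

#define MY_WC_DEV_DOORBELL	0x10	/* hypothetical register offset */

static void my_wc_dev_push_desc(struct my_wc_dev *dev, u64 desc)
{
	/* Store to write-combining memory; may linger in the WC buffers. */
	writeq(desc, dev->wc_ring + dev->tail * sizeof(desc));
	/* The sfence behind wmb() flushes the WC buffers... */
	wmb();
	/* ...before the doorbell write tells the device to look at the ring. */
	writel(++dev->tail, dev->regs + MY_WC_DEV_DOORBELL);
}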

Drop this comment.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index 7f99726..eb220b8 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -11,11 +11,6 @@
  */
 
 #ifdef CONFIG_X86_32
-/*
- * Some non-Intel clones support out of order store. wmb() ceases to be a
- * nop for these.
- */
-
 #define mb() asm volatile("lock; addl $0,0(%%esp)" ::: "memory")
 #define rmb() alternative("lock; addl $0,0(%%esp)", "lfence", X86_FEATURE_XMM2)
 #define wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v2 3/3] x86: tweak the comment about use of wmb for IO
  2016-01-12 22:10 ` Michael S. Tsirkin
@ 2016-01-12 22:10 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-12 22:10 UTC (permalink / raw)
  To: linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	H. Peter Anvin, virtualization, Ingo Molnar, Andy Lutomirski,
	Borislav Petkov, Andrey Konovalov

On x86, we *do* still use the non-nop rmb/wmb for IO barriers, but even
that is generally questionable.

Leave them around as historical unless somebody can point to a case where
they care about the performance, but tweak the comment so people
don't think they are strictly required in all cases.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
---
 arch/x86/include/asm/barrier.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/barrier.h b/arch/x86/include/asm/barrier.h
index eb220b8..924cd44 100644
--- a/arch/x86/include/asm/barrier.h
+++ b/arch/x86/include/asm/barrier.h
@@ -6,7 +6,7 @@
 
 /*
  * Force strict CPU ordering.
- * And yes, this is required on UP too when we're talking
+ * And yes, this might be required on UP too when we're talking
  * to devices.
  */
 
-- 
MST

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-12 22:10 ` Michael S. Tsirkin
@ 2016-01-12 22:25   ` H. Peter Anvin
  -1 siblings, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2016-01-12 22:25 UTC (permalink / raw)
  To: Michael S. Tsirkin, linux-kernel, Linus Torvalds
  Cc: Davidlohr Bueso, Peter Zijlstra, Ingo Molnar, Thomas Gleixner,
	Paul E. McKenney, the arch/x86 maintainers, Davidlohr Bueso,
	virtualization

On 01/12/16 14:10, Michael S. Tsirkin wrote:
> mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> 
> So let's use the locked variant everywhere - helps keep the code simple as
> well.
> 
> While I was at it, I found some inconsistencies in comments in
> arch/x86/include/asm/barrier.h
> 
> I hope I'm not splitting this up too much - the reason is I wanted to isolate
> the code changes (that people might want to test for performance) from comment
> changes approved by Linus, from (so far unreviewed) comment change I came up
> with myself.
> 
> Lightly tested on my system.
> 
> Michael S. Tsirkin (3):
>   x86: drop mfence in favor of lock+addl
>   x86: drop a comment left over from X86_OOSTORE
>   x86: tweak the comment about use of wmb for IO
> 

I would like to get feedback from the hardware team about the
implications of this change, first.

	-hpa

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 2/3] x86: drop a comment left over from X86_OOSTORE
  2016-01-12 22:10   ` Michael S. Tsirkin
@ 2016-01-12 22:25   ` One Thousand Gnomes
  -1 siblings, 0 replies; 26+ messages in thread
From: One Thousand Gnomes @ 2016-01-12 22:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, H. Peter Anvin,
	virtualization, Ingo Molnar, Andy Lutomirski, Andrey Konovalov,
	Borislav Petkov

On Wed, 13 Jan 2016 00:10:19 +0200
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> The comment about wmb being non-nop is a left over from before commit
> 09df7c4c8097 ("x86: Remove CONFIG_X86_OOSTORE").
> 
> It makes no sense now: if you have an SMP system with out of order
> stores, making wmb not a nop will not help.

There were never any IDT Winchip systems with SMP support, and they were
the one system that could enable OOSTORE (and it was worth up to 30% on
some workloads). The fencing it had was just for DMA devices.

Alan

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-12 22:25   ` H. Peter Anvin
@ 2016-01-26  8:20     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-26  8:20 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization

On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > 
> > So let's use the locked variant everywhere - helps keep the code simple as
> > well.
> > 
> > While I was at it, I found some inconsistencies in comments in
> > arch/x86/include/asm/barrier.h
> > 
> > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > the code changes (that people might want to test for performance) from comment
> > changes approved by Linus, from (so far unreviewed) comment change I came up
> > with myself.
> > 
> > Lightly tested on my system.
> > 
> > Michael S. Tsirkin (3):
> >   x86: drop mfence in favor of lock+addl
> >   x86: drop a comment left over from X86_OOSTORE
> >   x86: tweak the comment about use of wmb for IO
> > 
> 
> I would like to get feedback from the hardware team about the
> implications of this change, first.
> 
> 	-hpa
> 

Hi hpa,
Any luck getting some feedback on this one?

Thanks,

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-26  8:20     ` Michael S. Tsirkin
@ 2016-01-26 21:37       ` H. Peter Anvin
  -1 siblings, 0 replies; 26+ messages in thread
From: H. Peter Anvin @ 2016-01-26 21:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization,
	Borislav Petkov

On 01/26/16 00:20, Michael S. Tsirkin wrote:
> On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> 
> Hi hpa,
> Any luck getting some feedback on this one?
> 

Yes.  What we know so far is that in *most* cases it will work, but there
are apparently a few corner cases where MFENCE or a full-blown
serializing instruction is necessary.  We are trying to characterize
those corner cases and see if any of them affect the kernel.

Even if they are, we can probably make those barriers explicitly
different, but we don't want to go ahead with the change until we know
where we need to care.

	-hpa

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-26 21:37       ` H. Peter Anvin
@ 2016-01-27 14:07         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-27 14:07 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: linux-kernel, Linus Torvalds, Davidlohr Bueso, Peter Zijlstra,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization,
	Borislav Petkov

On Tue, Jan 26, 2016 at 01:37:38PM -0800, H. Peter Anvin wrote:
> On 01/26/16 00:20, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > 
> > Hi hpa,
> > Any luck getting some feedback on this one?
> > 
> 
> Yes.  What we know so far is that in *most* cases it will work, but there
> are apparently a few corner cases where MFENCE or a full-blown
> serializing instruction is necessary.  We are trying to characterize
> those corner cases and see if any of them affect the kernel.

It would be very interesting to know your findings.

Going over the manual I found one such case, and then going over the
kernel code I found some questionable uses of barriers - it would be
interesting to find out what some other cases are.

So I think it's probably useful to find out the full answer, anyway.

Awaiting the answers with interest.

> Even if they are, we can probably make those barriers explicitly
> different, but we don't want to go ahead with the change until we know
> where we need to care.
> 
> 	-hpa

Thanks!

Now that you definitely said there are corner cases, I poked some more
at the manual and found one:
	CLFLUSH is only ordered by the MFENCE instruction. It is not guaranteed
	to be ordered by any other fencing or serializing instructions or by
	another CLFLUSH instruction. For example, software can use an MFENCE
	instruction to ensure that previous stores are included in the
	write-back.

There are instances of this in mwait_play_dead,
clflush_cache_range, mwait_idle_with_hints, mwait_idle ..
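
For context, the pattern in those call sites looks roughly like this (a
paraphrase of clflush_cache_range(), not the exact kernel source - the real
function also handles alignment and may use clflushopt via alternatives):

void clflush_cache_range(void *vaddr, unsigned int size)
{
	void *vend = vaddr + size;
	void *p;

	mb();	/* needs mfence semantics: order earlier stores vs. CLFLUSH */
	for (p = vaddr; p < vend; p += boot_cpu_data.x86_clflush_size)
		clflush(p);
	mb();	/* order CLFLUSH vs. later stores */
}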

A comment near pcommit_sfence includes an example
flush_and_commit_buffer code which is interesting -
it assumes sfence flushes clflush.

So it appears that pcommit_sfence in that file is wrong then?
At least on processors where it falls back on clflush.

mwait_idle is the only one that calls smp_mb and not mb()
I couldn't figure out why - original patches did mb()
there.


Outside core kernel - drm_cache_flush_clflush, drm_clflush_sg,
drm_clflush_virt_range.

Then there's gru_start_instruction in drivers/misc/sgi-gru/.

But otherwise drivers/misc/sgi-gru/ calls clflush in gru_flush_cache
without calling mb() - this could be a bug.


Looking at all users, it seems that only mwait_idle calls smp_mb() around
clflush; the others call mb().

So at least as a first step, maybe it makes sense to scope this down
somewhat by changing mwait_idle to call mb() and then optimizing
__smp_mb instead of mb?

I'll post v3 that does this.


-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-27 14:07         ` Michael S. Tsirkin
@ 2016-01-27 14:14           ` Peter Zijlstra
  -1 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2016-01-27 14:14 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization,
	Borislav Petkov

On Wed, Jan 27, 2016 at 04:07:56PM +0200, Michael S. Tsirkin wrote:
> mwait_idle is the only one that calls smp_mb and not mb()
> I couldn't figure out why - original patches did mb()
> there.

That probably wants changing. That said, running UP kernels on affected
hardware is 'unlikely' :-)

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-27 14:14           ` Peter Zijlstra
@ 2016-01-27 14:18             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 26+ messages in thread
From: Michael S. Tsirkin @ 2016-01-27 14:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization,
	Borislav Petkov

On Wed, Jan 27, 2016 at 03:14:09PM +0100, Peter Zijlstra wrote:
> On Wed, Jan 27, 2016 at 04:07:56PM +0200, Michael S. Tsirkin wrote:
> > mwait_idle is the only one that calls smp_mb and not mb()
> > I couldn't figure out why - original patches did mb()
> > there.
> 
> That probably wants changing. That said, running UP kernels on affected
> hardware is 'unlikely' :-)

OK that's nice. After changing that one place, everyone calls
mb() around clflush so it should be safe to change smp_mb away
from mfence without breaking things.
I'm testing v4 that does this.

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2016-01-26  8:20     ` Michael S. Tsirkin
@ 2018-10-11 17:37     ` Andres Freund
  2018-10-11 18:11       ` Michael S. Tsirkin
  -1 siblings, 2 replies; 26+ messages in thread
From: Andres Freund @ 2018-10-11 17:37 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization

Hi,

On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > 
> > > So let's use the locked variant everywhere - helps keep the code simple as
> > > well.
> > > 
> > > While I was at it, I found some inconsistencies in comments in
> > > arch/x86/include/asm/barrier.h
> > > 
> > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > the code changes (that people might want to test for performance) from comment
> > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > with myself.
> > > 
> > > Lightly tested on my system.
> > > 
> > > Michael S. Tsirkin (3):
> > >   x86: drop mfence in favor of lock+addl
> > >   x86: drop a comment left over from X86_OOSTORE
> > >   x86: tweak the comment about use of wmb for IO
> > > 
> > 
> > I would like to get feedback from the hardware team about the
> > implications of this change, first.

> Any luck getting some feedback on this one?

Ping?  I just saw a bunch of kernel fences in a benchmark, making me
wonder why linux uses mfence rather than lock addl. Leading me to this
thread.

Greetings,

Andres Freund

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2018-10-11 17:37     ` Andres Freund
@ 2018-10-11 18:11       ` Michael S. Tsirkin
  2018-10-11 18:21         ` Andres Freund
  1 sibling, 1 reply; 26+ messages in thread
From: Michael S. Tsirkin @ 2018-10-11 18:11 UTC (permalink / raw)
  To: Andres Freund
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization

On Thu, Oct 11, 2018 at 10:37:07AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > > 
> > > > So let's use the locked variant everywhere - helps keep the code simple as
> > > > well.
> > > > 
> > > > While I was at it, I found some inconsistencies in comments in
> > > > arch/x86/include/asm/barrier.h
> > > > 
> > > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > > the code changes (that people might want to test for performance) from comment
> > > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > > with myself.
> > > > 
> > > > Lightly tested on my system.
> > > > 
> > > > Michael S. Tsirkin (3):
> > > >   x86: drop mfence in favor of lock+addl
> > > >   x86: drop a comment left over from X86_OOSTORE
> > > >   x86: tweak the comment about use of wmb for IO
> > > > 
> > > 
> > > I would like to get feedback from the hardware team about the
> > > implications of this change, first.
> 
> > Any luck getting some feedback on this one?
> 
> Ping?  I just saw a bunch of kernel fences in a benchmark, making me
> wonder why linux uses mfence rather than lock addl. Leading me to this
> thread.
> 
> Greetings,
> 
> Andres Freund

It doesn't do it for smp_mb any longer:

commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
Author: Michael S. Tsirkin <mst@redhat.com>
Date:   Fri Oct 27 19:14:31 2017 +0300

    locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE
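
That commit makes the barriers look roughly like this (paraphrased from
arch/x86/include/asm/barrier.h; check the tree for the exact form):

/* smp_mb(): LOCK ADD to a dead stack slot instead of mfence */
#define __smp_mb()	asm volatile("lock; addl $0,-4(%%" _ASM_SP ")" ::: "memory", "cc")

/* mb() stays mfence, so code mixing in non-temporal stores or CLFLUSH
 * keeps the stronger ordering. */
#define mb()		asm volatile("mfence" ::: "memory")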


I didn't bother with mb() since I didn't think it's performance
critical, and one needs to worry about drivers possibly doing
non-temporals etc which do need mfence.

Do you see mb() in a benchmark then?

-- 
MST

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks
  2018-10-11 18:11       ` Michael S. Tsirkin
@ 2018-10-11 18:21         ` Andres Freund
  0 siblings, 0 replies; 26+ messages in thread
From: Andres Freund @ 2018-10-11 18:21 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: H. Peter Anvin, linux-kernel, Linus Torvalds, Davidlohr Bueso,
	Peter Zijlstra, Ingo Molnar, Thomas Gleixner, Paul E. McKenney,
	the arch/x86 maintainers, Davidlohr Bueso, virtualization

Hi,

On 2018-10-11 14:11:42 -0400, Michael S. Tsirkin wrote:
> On Thu, Oct 11, 2018 at 10:37:07AM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2016-01-26 10:20:14 +0200, Michael S. Tsirkin wrote:
> > > On Tue, Jan 12, 2016 at 02:25:24PM -0800, H. Peter Anvin wrote:
> > > > On 01/12/16 14:10, Michael S. Tsirkin wrote:
> > > > > mb() typically uses mfence on modern x86, but a micro-benchmark shows that it's
> > > > > 2 to 3 times slower than lock; addl $0,(%%e/rsp) that we use on older CPUs.
> > > > > 
> > > > > So let's use the locked variant everywhere - helps keep the code simple as
> > > > > well.
> > > > > 
> > > > > While I was at it, I found some inconsistencies in comments in
> > > > > arch/x86/include/asm/barrier.h
> > > > > 
> > > > > I hope I'm not splitting this up too much - the reason is I wanted to isolate
> > > > > the code changes (that people might want to test for performance) from comment
> > > > > changes approved by Linus, from (so far unreviewed) comment change I came up
> > > > > with myself.
> > > > > 
> > > > > Lightly tested on my system.
> > > > > 
> > > > > Michael S. Tsirkin (3):
> > > > >   x86: drop mfence in favor of lock+addl
> > > > >   x86: drop a comment left over from X86_OOSTORE
> > > > >   x86: tweak the comment about use of wmb for IO
> > > > > 
> > > > 
> > > > I would like to get feedback from the hardware team about the
> > > > implications of this change, first.
> > 
> > > Any luck getting some feedback on this one?
> > 
> > Ping?  I just saw a bunch of kernel fences in a benchmark, making me
> > wonder why linux uses mfence rather than lock addl. Leading me to this
> > thread.
> > 
> > Greetings,
> > 
> > Andres Freund
> 
> It doesn't do it for smp_mb any longer:
> 
> commit 450cbdd0125cfa5d7bbf9e2a6b6961cc48d29730
> Author: Michael S. Tsirkin <mst@redhat.com>
> Date:   Fri Oct 27 19:14:31 2017 +0300
> 
>     locking/x86: Use LOCK ADD for smp_mb() instead of MFENCE

Ooh, missed that one.


> I didn't bother with mb() since I didn't think it's performance
> critical, and one needs to worry about drivers possibly doing
> non-temporals etc which do need mfence.
> 
> Do you see mb() in a benchmark then?

No, it was an smp_mb(). It was on an older kernel (I was profiling postgres
on hardware I have limited control over, not the kernel; I just noticed the
barrier while looking at perf output). I quickly looked into a current
arch/x86/include/asm/barrier.h and still saw mfences, and then found
this thread. Should have looked more carefully.

Sorry for the noise, and thanks for the quick answer.

- Andres

^ permalink raw reply	[flat|nested] 26+ messages in thread

Thread overview:
2016-01-12 22:10 [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks Michael S. Tsirkin
2016-01-12 22:10 ` [PATCH v2 1/3] x86: drop mfence in favor of lock+addl Michael S. Tsirkin
2016-01-12 22:10 ` [PATCH v2 2/3] x86: drop a comment left over from X86_OOSTORE Michael S. Tsirkin
2016-01-12 22:25   ` One Thousand Gnomes
2016-01-12 22:10 ` [PATCH v2 3/3] x86: tweak the comment about use of wmb for IO Michael S. Tsirkin
2016-01-12 22:25 ` [PATCH v2 0/3] x86: faster mb()+other barrier.h tweaks H. Peter Anvin
2016-01-26  8:20   ` Michael S. Tsirkin
2016-01-26 21:37     ` H. Peter Anvin
2016-01-27 14:07       ` Michael S. Tsirkin
2016-01-27 14:14         ` Peter Zijlstra
2016-01-27 14:18           ` Michael S. Tsirkin
2018-10-11 17:37     ` Andres Freund
2018-10-11 18:11       ` Michael S. Tsirkin
2018-10-11 18:21         ` Andres Freund
