linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] Define wc_wmb, a write barrier for PCI write combining
@ 2006-02-25  4:20 Bryan O'Sullivan
  2006-02-25  4:43 ` Andi Kleen
                   ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-25  4:20 UTC (permalink / raw)
  To: Andrew Morton, Andi Kleen; +Cc: linux-kernel

On some platforms, a regular wmb() is not sufficient to guarantee that
PCI writes have been flushed to the bus if write combining is in effect.

This change introduces a new macro, wc_wmb(), that makes this guarantee.

It does so by way of a new header file, <linux/system.h>.  This header
can be a site for oft-replicated code from <asm-*/system.h>, but isn't
just yet.

We also define a version of wc_wmb() with the required semantics
on x86_64.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>

diff -r c89918da5f7b -r 94d372e00ccd include/asm-x86_64/system.h
--- a/include/asm-x86_64/system.h	Sat Feb 25 08:01:07 2006 +0800
+++ b/include/asm-x86_64/system.h	Fri Feb 24 20:15:07 2006 -0800
@@ -321,6 +321,8 @@ static inline unsigned long __cmpxchg(vo
 #define mb() 	asm volatile("mfence":::"memory")
 #define rmb()	asm volatile("lfence":::"memory")
 
+#define wc_wmb()	asm volatile("sfence" ::: "memory")
+
 #ifdef CONFIG_UNORDERED_IO
 #define wmb()	asm volatile("sfence" ::: "memory")
 #else
diff -r c89918da5f7b -r 94d372e00ccd include/linux/system.h
--- /dev/null	Thu Jan  1 00:00:00 1970 +0000
+++ b/include/linux/system.h	Fri Feb 24 20:15:07 2006 -0800
@@ -0,0 +1,27 @@
+/*
+ * Copyright 2006 PathScale, Inc.  All Rights Reserved.
+ *
+ * This file is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License
+ * as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software Foundation,
+ * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA.
+ */
+
+#ifndef _LINUX_SYSTEM_H
+#define _LINUX_SYSTEM_H
+
+#include <asm/system.h>
+
+#ifndef wc_wmb
+#define wc_wmb() wmb()
+#endif
+
+#endif /* _LINUX_SYSTEM_H */



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25  4:20 [PATCH] Define wc_wmb, a write barrier for PCI write combining Bryan O'Sullivan
@ 2006-02-25  4:43 ` Andi Kleen
  2006-02-25  7:34   ` Bryan O'Sullivan
  2006-02-25 14:28 ` Benjamin LaHaise
  2006-02-28 10:01 ` Jes Sorensen
  2 siblings, 1 reply; 44+ messages in thread
From: Andi Kleen @ 2006-02-25  4:43 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, linux-kernel

On Saturday 25 February 2006 05:20, Bryan O'Sullivan wrote:
> On some platforms, a regular wmb() is not sufficient to guarantee that
> PCI writes have been flushed to the bus if write combining is in effect.

On what platforms?
 
> It does so by way of a new header file, <linux/system.h>.  This header
> can be a site for oft-replicated code from <asm-*/system.h>, but isn't
> just yet.

linux/system.h looks unnatural to me.
 
> We also define a version of wc_wmb() with the required semantics
> on x86_64.

Leaving i386 out in the cold?

-Andi


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25  4:43 ` Andi Kleen
@ 2006-02-25  7:34   ` Bryan O'Sullivan
  2006-02-25 13:28     ` Andi Kleen
  0 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-25  7:34 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

On Sat, 2006-02-25 at 05:43 +0100, Andi Kleen wrote:
> On Saturday 25 February 2006 05:20, Bryan O'Sullivan wrote:
> > On some platforms, a regular wmb() is not sufficient to guarantee that
> > PCI writes have been flushed to the bus if write combining is in effect.
> 
> On what platforms?

On x86_64 in particular, if CONFIG_UNORDERED_IO is defined.  Regular
wmb() is implemented as a compiler memory barrier then, which isn't
sufficient for PCI write combining.

> linux/system.h looks unnatural to me.

I used this for symmetry with <asm/system.h> where other barriers are
defined.  It could obviously go into io.h instead, since it's an
MMIO-related barrier, but I didn't want to separate it from other
barriers.

If you have a location you'd prefer, please let me know.

> > We also define a version of wc_wmb() with the required semantics
> > on x86_64.
> 
> Leaving i386 out in the cold?

Looks like it should be defined there, too, something like this perhaps:

#define wc_wmb() alternative("lock; addl $0,0(%%esp)", "sfence", X86_FEATURE_XMM)

Does that look right?

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25  7:34   ` Bryan O'Sullivan
@ 2006-02-25 13:28     ` Andi Kleen
  2006-02-25 17:20       ` Bryan O'Sullivan
  2006-02-25 19:01       ` Bryan O'Sullivan
  0 siblings, 2 replies; 44+ messages in thread
From: Andi Kleen @ 2006-02-25 13:28 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, linux-kernel

On Saturday 25 February 2006 08:34, Bryan O'Sullivan wrote:
> On Sat, 2006-02-25 at 05:43 +0100, Andi Kleen wrote:
> > On Saturday 25 February 2006 05:20, Bryan O'Sullivan wrote:
> > > On some platforms, a regular wmb() is not sufficient to guarantee that
> > > PCI writes have been flushed to the bus if write combining is in
> > > effect.
> >
> > On what platforms?
>
> On x86_64 in particular, if CONFIG_UNORDERED_IO is defined. 

I guess you mean undefined.

> Regular 
> wmb() is implemented as a compiler memory barrier then, which isn't
> sufficient for PCI write combining.

It's still totally unclear how you can write portable write combining code.

I think the basic idea is weird. You're basically writing something
that doesn't follow the normal MMIO rules of Linux drivers and you're 
trying to do this portable, which won't work anyways because 
there is no portable way to do this.

Before we can add such a macro I suspect you would first 
need to provide some spec how that "portable write combining"
is supposed to work and get feedback from the other architectures.

Or keep it in architecture specific code, then the generic macro
isn't needed.

-Andi

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25  4:20 [PATCH] Define wc_wmb, a write barrier for PCI write combining Bryan O'Sullivan
  2006-02-25  4:43 ` Andi Kleen
@ 2006-02-25 14:28 ` Benjamin LaHaise
  2006-02-25 17:11   ` Bryan O'Sullivan
  2006-02-28 10:01 ` Jes Sorensen
  2 siblings, 1 reply; 44+ messages in thread
From: Benjamin LaHaise @ 2006-02-25 14:28 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Fri, Feb 24, 2006 at 08:20:50PM -0800, Bryan O'Sullivan wrote:
> On some platforms, a regular wmb() is not sufficient to guarantee that
> PCI writes have been flushed to the bus if write combining is in effect.
> 
> This change introduces a new macro, wc_wmb(), that makes this guarantee.
> 
> It does so by way of a new header file, <linux/system.h>.  This header
> can be a site for oft-replicated code from <asm-*/system.h>, but isn't
> just yet.
> 
> We also define a version of wc_wmb() with the required semantics
> on x86_64.

This does not do what you're trying to accomplish.  sfence has no impact 
on the flushing of the write combining buffers -- they're normally flushed 
after a few cycles of no subsequent merges.  If it is really necessary to 
flush such a write, you will have to do a read from the target PCI device 
(well, bus would do).

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 14:28 ` Benjamin LaHaise
@ 2006-02-25 17:11   ` Bryan O'Sullivan
  2006-02-25 17:41     ` Benjamin LaHaise
  0 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-25 17:11 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Sat, 2006-02-25 at 09:28 -0500, Benjamin LaHaise wrote:

> sfence has no impact 
> on the flushing of the write combining buffers

The sfence instruction guarantees that every store that precedes it in
program order is globally visible, including over the likes of the PCI
bus, before any store instruction that follows the fence.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 13:28     ` Andi Kleen
@ 2006-02-25 17:20       ` Bryan O'Sullivan
  2006-02-25 19:01       ` Bryan O'Sullivan
  1 sibling, 0 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-25 17:20 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

On Sat, 2006-02-25 at 14:28 +0100, Andi Kleen wrote:

> It's still totally unclear how you can write portable write combining code.
> 
> I think the basic idea is weird. You're basically writing something
> that doesn't follow the normal MMIO rules of Linux drivers and you're 
> trying to do this portable, which won't work anyways because 
> there is no portable way to do this.

This is definitely a problem.  We have an x86-specific hack in our
driver that diddles the MTRRs to make sure that our MMIO space has write
combining enabled.

On other platforms, it looks like write combining, if supported at all,
is implemented in the northbridge.  For those, I think we'd need to mark
the device's mapping as cacheable.

> Before we can add such a macro I suspect you would first 
> need to provide some spec how that "portable write combining"
> is supposed to work and get feedback from the other architectures.
> 
> Or keep it in architecture specific code, then the generic macro
> isn't needed.

OK, thanks for the feedback.

	<b

-- 
Bryan O'Sullivan <bos@pathscale.com>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 17:11   ` Bryan O'Sullivan
@ 2006-02-25 17:41     ` Benjamin LaHaise
  2006-02-28 17:50       ` Bryan O'Sullivan
  0 siblings, 1 reply; 44+ messages in thread
From: Benjamin LaHaise @ 2006-02-25 17:41 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Sat, Feb 25, 2006 at 09:11:57AM -0800, Bryan O'Sullivan wrote:
> On Sat, 2006-02-25 at 09:28 -0500, Benjamin LaHaise wrote:
> 
> > sfence has no impact 
> > on the flushing of the write combining buffers
> 
> The sfence instruction guarantees that every store that precedes it in
> program order is globally visible, including over the likes of the PCI
> bus, before any store instruction that follows the fence.

That is not the same as saying the write buffers are flushed and wholly 
visible to their destination, it just means that subsequent reads or 
writes will not be reordered prior to the sfence instruction.  It is 
entirely possible that the writes will remain in buffers on the CPU until 
well after the sfence instruction has executed, sfence only affects the 
order in which they become visible on the bus.  If you want to force a 
flush, a read from the PCI device is the only way to accomplish that.

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 13:28     ` Andi Kleen
  2006-02-25 17:20       ` Bryan O'Sullivan
@ 2006-02-25 19:01       ` Bryan O'Sullivan
  2006-02-28 17:44         ` Jesse Barnes
  1 sibling, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-25 19:01 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

On Sat, 2006-02-25 at 14:28 +0100, Andi Kleen wrote:

> Before we can add such a macro I suspect you would first 
> need to provide some spec how that "portable write combining"
> is supposed to work and get feedback from the other architectures.

It seems like we'd need a function that tries to enable or disable write
combining on an MMIO memory range.  This would be implemented by arches
that support it, and would fail on others.  Drivers could then try to
enable write combining, and if it failed, either bail, print a warning
message, or do something else appropriate.

So on i386 and x86_64, this function would fiddle with the MTRRs.  On
powerpc, it would either configure the northbridge appropriately or
fail.  On other arches, I don't know enough to say, so the default would
be to fail.

Is this reasonable?  I can code a strawman implementation up, if the
basic idea looks sane.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25  4:20 [PATCH] Define wc_wmb, a write barrier for PCI write combining Bryan O'Sullivan
  2006-02-25  4:43 ` Andi Kleen
  2006-02-25 14:28 ` Benjamin LaHaise
@ 2006-02-28 10:01 ` Jes Sorensen
  2006-02-28 15:42   ` Roland Dreier
  2006-02-28 17:57   ` Bryan O'Sullivan
  2 siblings, 2 replies; 44+ messages in thread
From: Jes Sorensen @ 2006-02-28 10:01 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

>>>>> "Bryan" == Bryan O'Sullivan <bos@pathscale.com> writes:

Bryan> On some platforms, a regular wmb() is not sufficient to
Bryan> guarantee that PCI writes have been flushed to the bus if write
Bryan> combining is in effect.

Bryan> This change introduces a new macro, wc_wmb(), that makes this
Bryan> guarantee.

Bryan,

Could you explain why the current mmiowb() API won't suffice for this?
It seems that this is basically trying to achieve the same thing.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 10:01 ` Jes Sorensen
@ 2006-02-28 15:42   ` Roland Dreier
  2006-02-28 16:08     ` Jes Sorensen
  2006-02-28 17:57   ` Bryan O'Sullivan
  1 sibling, 1 reply; 44+ messages in thread
From: Roland Dreier @ 2006-02-28 15:42 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel

    Jes> Could you explain why the current mmiowb() API won't suffice
    Jes> for this?  It seems that this is basically trying to achieve
    Jes> the same thing.

I don't believe mmiowb() is at all the same thing.  mmiowb() is all
about ordering writes between _different_ CPUs without incurring the
cost of flushing posted writes by issuing a read on the bus.  wc_wmb()
would just act like a true wmb(), even when using write-combining
regions on x86 -- in other words, there would be no cross-CPU synchronization.

 - R.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 15:42   ` Roland Dreier
@ 2006-02-28 16:08     ` Jes Sorensen
  2006-02-28 17:02       ` Roland Dreier
  2006-02-28 17:11       ` Jesse Barnes
  0 siblings, 2 replies; 44+ messages in thread
From: Jes Sorensen @ 2006-02-28 16:08 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel,
	Jesse Barnes

Roland Dreier wrote:
>     Jes> Could you explain why the current mmiowb() API won't suffice
>     Jes> for this?  It seems that this is basically trying to achieve
>     Jes> the same thing.
> 
> I don't believe mmiowb() is at all the same thing.  mmiowb() is all
> about ordering writes between _different_ CPUs without incurring the
> cost of flushing posted writes by issuing a read on the bus.

Not quite correct as far as I understand it. mmiowb() is supposed to
guarantee that writes to MMIO space have completed before continuing.
That of course covers the multi-CPU case, but it should also cover the
write-combining case.

 > wc_wmb()
> would just act like a true wmb(), even when using write-combining
> regions on x86 -- in other words, there would be no cross-CPU synchronization.

I wary of adding yet another variation unless there is a clear
distinction between them that is easy to understandn for driver authors.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 16:08     ` Jes Sorensen
@ 2006-02-28 17:02       ` Roland Dreier
  2006-02-28 17:13         ` Jesse Barnes
  2006-02-28 17:20         ` Jes Sorensen
  2006-02-28 17:11       ` Jesse Barnes
  1 sibling, 2 replies; 44+ messages in thread
From: Roland Dreier @ 2006-02-28 17:02 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel,
	Jesse Barnes

    Jes> Not quite correct as far as I understand it. mmiowb() is
    Jes> supposed to guarantee that writes to MMIO space have
    Jes> completed before continuing.  That of course covers the
    Jes> multi-CPU case, but it should also cover the write-combining
    Jes> case.

I don't believe this is correct.  mmiowb() does not guarantee that
writes have completed -- they may still be pending in a buffer in a
bridge somewhere.  The _only_ effect of mmiowb() is to make sure that
writes which have been ordered between CPUs using some other mechanism
(i.e. a lock) are properly ordered by the rest of the system.  This
only has an effect systems like very large ia64 systems, where (as I
understand it), writes can pass each other on the way to the PCI bus.
In fact, mmiowb() is a NOP on essentially every architecture.

 - R.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 16:08     ` Jes Sorensen
  2006-02-28 17:02       ` Roland Dreier
@ 2006-02-28 17:11       ` Jesse Barnes
  1 sibling, 0 replies; 44+ messages in thread
From: Jesse Barnes @ 2006-02-28 17:11 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Roland Dreier, Bryan O'Sullivan, Andrew Morton, Andi Kleen,
	linux-kernel

On Tuesday, February 28, 2006 8:08 am, Jes Sorensen wrote:
> Roland Dreier wrote:
> >     Jes> Could you explain why the current mmiowb() API won't
> > suffice Jes> for this?  It seems that this is basically trying to
> > achieve Jes> the same thing.
> >
> > I don't believe mmiowb() is at all the same thing.  mmiowb() is all
> > about ordering writes between _different_ CPUs without incurring the
> > cost of flushing posted writes by issuing a read on the bus.
>
> Not quite correct as far as I understand it. mmiowb() is supposed to
> guarantee that writes to MMIO space have completed before continuing.
> That of course covers the multi-CPU case, but it should also cover the
> write-combining case.

It only guarantees that any outstanding writes will hit the device before 
any subsequent writes.  mmiowb() doesn't make any guarantees about when 
the data will actually arrive at the device though.

> I wary of adding yet another variation unless there is a clear
> distinction between them that is easy to understandn for driver
> authors.

I think that's a valid concern, there are so many ill-understood barriers 
floating around; adding another one will create even more confusion.  
Are they all documented somewhere?  Are we sure that we don't have 
duplicates?

At any rate, any new ones we add should be very well documented (I think 
Andi suggested this implicitly when he asked for the semantics to be 
well-defined).

Jesse

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:02       ` Roland Dreier
@ 2006-02-28 17:13         ` Jesse Barnes
  2006-02-28 17:20         ` Jes Sorensen
  1 sibling, 0 replies; 44+ messages in thread
From: Jesse Barnes @ 2006-02-28 17:13 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Jes Sorensen, Bryan O'Sullivan, Andrew Morton, Andi Kleen,
	linux-kernel

On Tuesday, February 28, 2006 9:02 am, Roland Dreier wrote:
>     Jes> Not quite correct as far as I understand it. mmiowb() is
>     Jes> supposed to guarantee that writes to MMIO space have
>     Jes> completed before continuing.  That of course covers the
>     Jes> multi-CPU case, but it should also cover the write-combining
>     Jes> case.
>
> I don't believe this is correct.  mmiowb() does not guarantee that
> writes have completed -- they may still be pending in a buffer in a
> bridge somewhere.  The _only_ effect of mmiowb() is to make sure that
> writes which have been ordered between CPUs using some other mechanism
> (i.e. a lock) are properly ordered by the rest of the system.  This
> only has an effect systems like very large ia64 systems, where (as I
> understand it), writes can pass each other on the way to the PCI bus.
> In fact, mmiowb() is a NOP on essentially every architecture.

I think it could be implemented meaningfully on ppc64, mips64, and 
perhaps some parisc systems, but I don't think their respective 
maintainers have gotten around to that yet.

Anyway, it looks like the write combine ordering Bryan is talking about 
really is a distinct semantic.  Not sure if it's possible (or desirable) 
to overload an existing barrier op to include the semantics he wants.

Jesse

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:02       ` Roland Dreier
  2006-02-28 17:13         ` Jesse Barnes
@ 2006-02-28 17:20         ` Jes Sorensen
  2006-03-01  8:16           ` Jeremy Higdon
  2006-03-01  8:24           ` Jeremy Higdon
  1 sibling, 2 replies; 44+ messages in thread
From: Jes Sorensen @ 2006-02-28 17:20 UTC (permalink / raw)
  To: Roland Dreier
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel,
	Jesse Barnes

Roland Dreier wrote:
>     Jes> Not quite correct as far as I understand it. mmiowb() is
>     Jes> supposed to guarantee that writes to MMIO space have
>     Jes> completed before continuing.  That of course covers the
>     Jes> multi-CPU case, but it should also cover the write-combining
>     Jes> case.
> 
> I don't believe this is correct.  mmiowb() does not guarantee that
> writes have completed -- they may still be pending in a buffer in a
> bridge somewhere.  The _only_ effect of mmiowb() is to make sure that
> writes which have been ordered between CPUs using some other mechanism
> (i.e. a lock) are properly ordered by the rest of the system.  This
> only has an effect systems like very large ia64 systems, where (as I
> understand it), writes can pass each other on the way to the PCI bus.
> In fact, mmiowb() is a NOP on essentially every architecture.

Hmmmm

That could be, seems like Jesse agrees that it could all be in the
pipeline somewhere. Considering Jesse was responsible for mmiowb() I'll
take his word for it ;-)

In any case, I'd strongly recommend that any new barrier version is
clearly documented. The jungle is very dense already ;(

Cheers,
Jes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 19:01       ` Bryan O'Sullivan
@ 2006-02-28 17:44         ` Jesse Barnes
  2006-02-28 17:50           ` Roland Dreier
                             ` (2 more replies)
  0 siblings, 3 replies; 44+ messages in thread
From: Jesse Barnes @ 2006-02-28 17:44 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andi Kleen, Andrew Morton, linux-kernel

On Saturday, February 25, 2006 11:01 am, Bryan O'Sullivan wrote:
> On Sat, 2006-02-25 at 14:28 +0100, Andi Kleen wrote:
> > Before we can add such a macro I suspect you would first
> > need to provide some spec how that "portable write combining"
> > is supposed to work and get feedback from the other architectures.
>
> It seems like we'd need a function that tries to enable or disable
> write combining on an MMIO memory range.  This would be implemented by
> arches that support it, and would fail on others.  Drivers could then
> try to enable write combining, and if it failed, either bail, print a
> warning message, or do something else appropriate.
>
> So on i386 and x86_64, this function would fiddle with the MTRRs.  On
> powerpc, it would either configure the northbridge appropriately or
> fail.  On other arches, I don't know enough to say, so the default
> would be to fail.
>
> Is this reasonable?  I can code a strawman implementation up, if the
> basic idea looks sane.

Something like this would be really handy.  Check out fbmem.c:fb_mmap for 
a bad example of what can happen w/o it.

In fact, I think it might make sense to export WC functionality via an 
mmap flag (on an advisory basis since the platform may not support it or 
there may be aliasing issues that prevent it); having an arch 
independent routine to request it would make that addition easy to do in 
generic code.  (In particular I wanted this for the sysfs PCI interface.  
Userspace apps can map PCI resources there and it would be nice if they 
could map them with WC semantics if requested.)

I don't think it addresses the flushing issue you seem to be concerned 
about though.  I don't know the exact semantics of sfence, but I think 
bcrl is likely right that it won't absolutely guarantee that your writes 
have hit the device before proceeding (though it may do that on some CPU 
implementations).

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-25 17:41     ` Benjamin LaHaise
@ 2006-02-28 17:50       ` Bryan O'Sullivan
  2006-02-28 17:58         ` Benjamin LaHaise
  0 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 17:50 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Sat, 2006-02-25 at 12:41 -0500, Benjamin LaHaise wrote:

> That is not the same as saying the write buffers are flushed and wholly 
> visible to their destination, it just means that subsequent reads or 
> writes will not be reordered prior to the sfence instruction.

Right.  I don't need the writes to be visible at the destination device,
in this particular case; a write followed by a barrier just has to show
up at the device before the next write.

Here's what a write to our chip looks like, for sending a packet:

      * write a 64-bit control register that includes the length of the
        segment being written
      * do a barrier to make sure it gets to the device before any other
        writes
      * perform a pile of writes using __iowrite_copy32, not caring
        about the order in which they reach the chip
      * do another barrier
      * perform one final 32-bit write using __raw_writel
      * do one final barrier

The last 32-bit write triggers the chip to put the packet on the wire.
We make sure it happens after the earlier bulk write using a barrier.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:44         ` Jesse Barnes
@ 2006-02-28 17:50           ` Roland Dreier
  2006-02-28 17:50           ` Jesse Barnes
  2006-02-28 17:52           ` Bryan O'Sullivan
  2 siblings, 0 replies; 44+ messages in thread
From: Roland Dreier @ 2006-02-28 17:50 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Bryan O'Sullivan, Andi Kleen, Andrew Morton, linux-kernel

    Jesse> I don't think it addresses the flushing issue you seem to
    Jesse> be concerned about though.  I don't know the exact
    Jesse> semantics of sfence, but I think bcrl is likely right that
    Jesse> it won't absolutely guarantee that your writes have hit the
    Jesse> device before proceeding (though it may do that on some CPU
    Jesse> implementations).

Yeah, I think that Bryan just wrote something different than what he
meant: there is no desire for wc_wmb() to make sure that writes via a
write-combining mapping have gone all the way to the device, any more
than a normal wmb() makes sure normal writes have gone all the way to
the device.  All that wc_wmb() needs to do is make sure that writes
via a write-combining mapping don't get passed by later writes.

This does speak to the need for precise documentation though :)

 - R.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:44         ` Jesse Barnes
  2006-02-28 17:50           ` Roland Dreier
@ 2006-02-28 17:50           ` Jesse Barnes
  2006-02-28 17:52           ` Bryan O'Sullivan
  2 siblings, 0 replies; 44+ messages in thread
From: Jesse Barnes @ 2006-02-28 17:50 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andi Kleen, Andrew Morton, linux-kernel

On Tuesday, February 28, 2006 9:44 am, Jesse Barnes wrote:
> Something like this would be really handy.  Check out fbmem.c:fb_mmap
> for a bad example of what can happen w/o it.
>
> In fact, I think it might make sense to export WC functionality via an
> mmap flag (on an advisory basis since the platform may not support it
> or there may be aliasing issues that prevent it); having an arch
> independent routine to request it would make that addition easy to do
> in generic code.  (In particular I wanted this for the sysfs PCI
> interface. Userspace apps can map PCI resources there and it would be
> nice if they could map them with WC semantics if requested.)

Oh, forgot to mention fallback semantics.  Instead of almost every driver 
doing:
	if (!(iocookie = ioremap_writecombine(addr, size)))
		iocookie = ioremap(addr, size); /* fallback to uncached */

maybe it would be best to have something like

	iocookie = ioremap_wc_or_uc(addr, size);

that tries write combine first but silently falls back to UC if the 
former is impossible (or a new ioremap with flags or priority args or 
whatever).  OTOH some drivers may want to be notified if the WC mapping 
fails?  Just a thought...

Jesse

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:44         ` Jesse Barnes
  2006-02-28 17:50           ` Roland Dreier
  2006-02-28 17:50           ` Jesse Barnes
@ 2006-02-28 17:52           ` Bryan O'Sullivan
  2006-02-28 17:59             ` Jesse Barnes
  2 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 17:52 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Andi Kleen, Andrew Morton, linux-kernel

On Tue, 2006-02-28 at 09:44 -0800, Jesse Barnes wrote:

> Something like this would be really handy.  Check out fbmem.c:fb_mmap for 
> a bad example of what can happen w/o it.

:-)

> In fact, I think it might make sense to export WC functionality via an 
> mmap flag (on an advisory basis since the platform may not support it or 
> there may be aliasing issues that prevent it); having an arch 
> independent routine to request it would make that addition easy to do in 
> generic code.

Yes.

>   (In particular I wanted this for the sysfs PCI interface.  
> Userspace apps can map PCI resources there and it would be nice if they 
> could map them with WC semantics if requested.)

They already sort of can.  It just happens that most arches ignore the
WC parameters.

> I don't think it addresses the flushing issue you seem to be concerned 
> about though.

Yes, I think I could have made my original wording a bit clearer.  I
don't care if writes have hit the device, merely that they do so in an
order that I control.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 10:01 ` Jes Sorensen
  2006-02-28 15:42   ` Roland Dreier
@ 2006-02-28 17:57   ` Bryan O'Sullivan
  2006-02-28 18:07     ` linux-os (Dick Johnson)
  2006-03-01 10:45     ` Jes Sorensen
  1 sibling, 2 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 17:57 UTC (permalink / raw)
  To: Jes Sorensen; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, 2006-02-28 at 05:01 -0500, Jes Sorensen wrote:

> Could you explain why the current mmiowb() API won't suffice for this?
> It seems that this is basically trying to achieve the same thing.

It's a no-op on every arch I care about:

#define mmiowb()

Which makes it useless.  Also, based on the comments in the qla driver,
mmiowb() seems to have inter-CPU ordering semantics that I don't want.
I'm thus hesitant to appropriate it for my needs.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:50       ` Bryan O'Sullivan
@ 2006-02-28 17:58         ` Benjamin LaHaise
  2006-02-28 18:20           ` Bryan O'Sullivan
  2006-02-28 18:22           ` Christopher Friesen
  0 siblings, 2 replies; 44+ messages in thread
From: Benjamin LaHaise @ 2006-02-28 17:58 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, Feb 28, 2006 at 09:50:08AM -0800, Bryan O'Sullivan wrote:
> The last 32-bit write triggers the chip to put the packet on the wire.
> We make sure it happens after the earlier bulk write using a barrier.

The barrier you're looking for is wmb() in asm/system.h, which is defined 
on both SMP and UP.  On x86 you do not need the sfence as writes will show 
up on the bus in program order, but you do need wmb() to prevent gcc from 
reordering your code.  Adding a new primative only makes things slower as 
the sfence isn't necessary most of the time (it rightly has a dependancy on 
CONFIG_UNORDERED_IO, which the jury is still out on).

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:52           ` Bryan O'Sullivan
@ 2006-02-28 17:59             ` Jesse Barnes
  0 siblings, 0 replies; 44+ messages in thread
From: Jesse Barnes @ 2006-02-28 17:59 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andi Kleen, Andrew Morton, linux-kernel

On Tuesday, February 28, 2006 9:52 am, Bryan O'Sullivan wrote:
> >   (In particular I wanted this for the sysfs PCI interface.
> > Userspace apps can map PCI resources there and it would be nice if
> > they could map them with WC semantics if requested.)
>
> They already sort of can.  It just happens that most arches ignore the
> WC parameters.

Only on a per-driver basis though at this point, right?  The sysfs code 
is generic across architectures and exports PCI resources independently 
from the driver, so it needs some way of letting the user communicate 
that they want WC behavior.

> > I don't think it addresses the flushing issue you seem to be
> > concerned about though.
>
> Yes, I think I could have made my original wording a bit clearer.  I
> don't care if writes have hit the device, merely that they do so in an
> order that I control.

Ah, ok.  In that case maybe mmiowb *is* what you want.  Or at the very 
least mmiowcb :).

Jesse

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:57   ` Bryan O'Sullivan
@ 2006-02-28 18:07     ` linux-os (Dick Johnson)
  2006-02-28 18:24       ` Christopher Friesen
  2006-03-01 10:45     ` Jes Sorensen
  1 sibling, 1 reply; 44+ messages in thread
From: linux-os (Dick Johnson) @ 2006-02-28 18:07 UTC (permalink / raw)
  To: Bryan O'Sullivan
  Cc: Jes Sorensen, Andrew Morton, Andi Kleen, linux-kernel


On Tue, 28 Feb 2006, Bryan O'Sullivan wrote:

> On Tue, 2006-02-28 at 05:01 -0500, Jes Sorensen wrote:
>
>> Could you explain why the current mmiowb() API won't suffice for this?
>> It seems that this is basically trying to achieve the same thing.
>
> It's a no-op on every arch I care about:
>
> #define mmiowb()
>
> Which makes it useless.  Also, based on the comments in the qla driver,
> mmiowb() seems to have inter-CPU ordering semantics that I don't want.
> I'm thus hesitant to appropriate it for my needs.
>
> 	<b
>

When accessing PCI, you can cause any/all write combinations to
occur and any/all pending writes to get written to the devices
simply by executing a read. If the code requires that all previous
writes be written NOW, then the code should not hide that in
some macro. It should issue a read in its PCI space. Also, the
PCI bus is a FIFO. Nothing gets reordered. Everything will
get to the devices in the order written, but a byte or word on
a longword boundary may be subject to write-combining if all
the components are present in the FIFO.

Cheers,
Dick Johnson
Penguin : Linux version 2.6.15.4 on an i686 machine (5589.54 BogoMips).
Warning : 98.36% of all statistics are fiction, book release in April.
_
\x1a\x04

****************************************************************
The information transmitted in this message is confidential and may be privileged.  Any review, retransmission, dissemination, or other use of this information by persons or entities other than the intended recipient is prohibited.  If you are not the intended recipient, please notify Analogic Corporation immediately - by replying to this message or by sending an email to DeliveryErrors@analogic.com - and destroy all copies of this information, including any attachments, without reading or disclosing them.

Thank you.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:58         ` Benjamin LaHaise
@ 2006-02-28 18:20           ` Bryan O'Sullivan
  2006-02-28 19:03             ` Benjamin LaHaise
  2006-02-28 18:22           ` Christopher Friesen
  1 sibling, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 18:20 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, 2006-02-28 at 12:58 -0500, Benjamin LaHaise wrote:
> On Tue, Feb 28, 2006 at 09:50:08AM -0800, Bryan O'Sullivan wrote:
> > The last 32-bit write triggers the chip to put the packet on the wire.
> > We make sure it happens after the earlier bulk write using a barrier.
> 
> The barrier you're looking for is wmb() in asm/system.h, which is defined 
> on both SMP and UP.

No.  We're writing to a region that we've marked as write combining, so
the processor or north bridge will not write in program order.  It's
free to write out the write combining store buffers in whatever order it
feels like, unless forced to do otherwise.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:58         ` Benjamin LaHaise
  2006-02-28 18:20           ` Bryan O'Sullivan
@ 2006-02-28 18:22           ` Christopher Friesen
  1 sibling, 0 replies; 44+ messages in thread
From: Christopher Friesen @ 2006-02-28 18:22 UTC (permalink / raw)
  To: Benjamin LaHaise
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel

Benjamin LaHaise wrote:
> On Tue, Feb 28, 2006 at 09:50:08AM -0800, Bryan O'Sullivan wrote:
> 
>>The last 32-bit write triggers the chip to put the packet on the wire.
>>We make sure it happens after the earlier bulk write using a barrier.
> 
> 
> The barrier you're looking for is wmb() in asm/system.h, which is defined 
> on both SMP and UP.

That will synchronize with other CPUs as well, which may not necessarily 
be needed.

On PPC for instance, you could implement the desired semantics using 
"eieio" (enforce in-order execution of IO).  This is lighter weight than 
a full "sync", which is what wmb() maps to.

Chris

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 18:07     ` linux-os (Dick Johnson)
@ 2006-02-28 18:24       ` Christopher Friesen
  0 siblings, 0 replies; 44+ messages in thread
From: Christopher Friesen @ 2006-02-28 18:24 UTC (permalink / raw)
  To: linux-os (Dick Johnson)
  Cc: Bryan O'Sullivan, Jes Sorensen, Andrew Morton, Andi Kleen,
	linux-kernel

linux-os (Dick Johnson) wrote:

> Also, the
> PCI bus is a FIFO. Nothing gets reordered.

The CPU itself can reorder the write before it hits the PCI bus.  I 
don't think x86 will do this, but others will.

Chris

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 18:20           ` Bryan O'Sullivan
@ 2006-02-28 19:03             ` Benjamin LaHaise
  2006-02-28 19:20               ` Bryan O'Sullivan
  0 siblings, 1 reply; 44+ messages in thread
From: Benjamin LaHaise @ 2006-02-28 19:03 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, Feb 28, 2006 at 10:20:14AM -0800, Bryan O'Sullivan wrote:
> No.  We're writing to a region that we've marked as write combining, so
> the processor or north bridge will not write in program order.  It's
> free to write out the write combining store buffers in whatever order it
> feels like, unless forced to do otherwise.

The rules are a bit more complex than that -- write are weakly ordered, 
but still ordered.  They maybe be delayed with respect to other stores, 
but will never occur before other stores are visible to the cache 
heirarchy.  In terms of how writes to the write combining region are 
delayed, you'll have to look at the addresses of the registers you are 
writing to.  If they map to the same write combining buffer (that is each 
one can combine with the previous write) and are increasing in address, 
then you don't need the explicite barrier.  Most hardware is laid out so 
that this is possible.

The case the write combining buffers affect memory ordering in an 'unexpected' 
way is if your writes combine and you write to registers in an order that 
is opposite from that in which they combine.  Ie, a write to address 8 
followed by a write to address 0 that combines will show up on the bus 
as 0, 8 (assuming an 8 byte writes at 0).  Besides that, the write combining 
buffers can introduce a delay of a few clocks while the cpu defers the write 
in the hope that it will combine with another write, but that delay applies 
to all writes that go through the write combining buffers and thus do not 
change the memory ordering (except as previously noted).

Memory barriers are not cheap.  At least for the example you provided, 
it looks like things are overdone and performance is going to suck, so 
it needs to be avoided if at all possible.  I really think that you 
should be using wmb().

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 19:03             ` Benjamin LaHaise
@ 2006-02-28 19:20               ` Bryan O'Sullivan
  2006-02-28 19:33                 ` Andi Kleen
  2006-02-28 19:34                 ` Benjamin LaHaise
  0 siblings, 2 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 19:20 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, 2006-02-28 at 14:03 -0500, Benjamin LaHaise wrote:

> Memory barriers are not cheap.  At least for the example you provided, 
> it looks like things are overdone and performance is going to suck, so 
> it needs to be avoided if at all possible.

There's more to it than that :-)

We added the memory barrier to *improve* performance, in addition to
helping correctness and portability.  Without it, the CPU or north
bridge is free to hold onto the pending writes for a while; the exact
details of how long it will wait depend on the CPU and NB
implementation, but on AMD64 CPUs the delay is up to 16 cycles.

So if we just use wmb(), we incur a 16-cycle penalty on every packet we
send.  This has a deleterious and measurable effect on performance.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 19:20               ` Bryan O'Sullivan
@ 2006-02-28 19:33                 ` Andi Kleen
  2006-02-28 19:44                   ` Bryan O'Sullivan
  2006-03-01 19:20                   ` Bryan O'Sullivan
  2006-02-28 19:34                 ` Benjamin LaHaise
  1 sibling, 2 replies; 44+ messages in thread
From: Andi Kleen @ 2006-02-28 19:33 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Benjamin LaHaise, Andrew Morton, linux-kernel

On Tuesday 28 February 2006 20:20, Bryan O'Sullivan wrote:

> We added the memory barrier to *improve* performance, in addition to
> helping correctness and portability.  Without it, the CPU or north
> bridge is free to hold onto the pending writes for a while; the exact
> details of how long it will wait depend on the CPU and NB
> implementation, but on AMD64 CPUs the delay is up to 16 cycles.

Are you sure you used the right instruction? Normally CLFLUSH is used
for such things, not a write barrier which really only changes ordering.
The documentation is not fully clear, but it sounds like it could
apply to the store buffers too.

Anyways if MFENCE improved performance you're probably relying
on some very specific artifact of the microarchitecture of your 
CPU or Northbridge. I don't think it's a architecurally guaranteed
feature.
 
-Andi

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 19:20               ` Bryan O'Sullivan
  2006-02-28 19:33                 ` Andi Kleen
@ 2006-02-28 19:34                 ` Benjamin LaHaise
  1 sibling, 0 replies; 44+ messages in thread
From: Benjamin LaHaise @ 2006-02-28 19:34 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

On Tue, Feb 28, 2006 at 11:20:24AM -0800, Bryan O'Sullivan wrote:
> So if we just use wmb(), we incur a 16-cycle penalty on every packet we
> send.  This has a deleterious and measurable effect on performance.

sfence doesn't guarantee that the write will be flushed, especially if 
the chipset gets involved.  The only way to do that is the same way any 
pci write can be flushed, which is to read from a register on the device 
in question.

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 19:33                 ` Andi Kleen
@ 2006-02-28 19:44                   ` Bryan O'Sullivan
  2006-03-01 19:20                   ` Bryan O'Sullivan
  1 sibling, 0 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-02-28 19:44 UTC (permalink / raw)
  To: Benjamin LaHaise, Andi Kleen
  Cc: Benjamin LaHaise, Andrew Morton, linux-kernel

On Tue, 2006-02-28 at 20:33 +0100, Andi Kleen wrote:

> Are you sure you used the right instruction? Normally CLFLUSH is used
> for such things, not a write barrier which really only changes ordering.

Hmm.  It's possible we're just getting lucky because another write
happens somewhere else soon after the last write we perform as part of a
packet send.  Perhaps, for complete correctness, we should be using
CLFLUSH instead of the last barrier.  I'll have to look into this.

Thanks,

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:20         ` Jes Sorensen
@ 2006-03-01  8:16           ` Jeremy Higdon
  2006-03-01  8:24           ` Jeremy Higdon
  1 sibling, 0 replies; 44+ messages in thread
From: Jeremy Higdon @ 2006-03-01  8:16 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Roland Dreier, Bryan O'Sullivan, Andrew Morton, Andi Kleen,
	linux-kernel, Jesse Barnes

On Tue, Feb 28, 2006 at 06:20:32PM +0100, Jes Sorensen wrote:
> Roland Dreier wrote:
> >    Jes> Not quite correct as far as I understand it. mmiowb() is
> >    Jes> supposed to guarantee that writes to MMIO space have
> >    Jes> completed before continuing.  That of course covers the
> >    Jes> multi-CPU case, but it should also cover the write-combining
> >    Jes> case.
> >
> >I don't believe this is correct.  mmiowb() does not guarantee that
> >writes have completed -- they may still be pending in a buffer in a
> >bridge somewhere.  The _only_ effect of mmiowb() is to make sure that
> >writes which have been ordered between CPUs using some other mechanism
> >(i.e. a lock) are properly ordered by the rest of the system.  This
> >only has an effect systems like very large ia64 systems, where (as I
> >understand it), writes can pass each other on the way to the PCI bus.
> >In fact, mmiowb() is a NOP on essentially every architecture.
> 
> Hmmmm
> 
> That could be, seems like Jesse agrees that it could all be in the
> pipeline somewhere. Considering Jesse was responsible for mmiowb() I'll
> take his word for it ;-)

Right.  On the Altix, the mmiowb ensures that the write is into a
FIFO on the destination node (node I/O device is attached to), so
that later writes from other nodes will be ordered after it.  But
it doesn't actually force it to the bus.  That's one reason why it's
so much quicker than a using a read for ordering.

jeremy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:20         ` Jes Sorensen
  2006-03-01  8:16           ` Jeremy Higdon
@ 2006-03-01  8:24           ` Jeremy Higdon
  1 sibling, 0 replies; 44+ messages in thread
From: Jeremy Higdon @ 2006-03-01  8:24 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Roland Dreier, Bryan O'Sullivan, Andrew Morton, Andi Kleen,
	linux-kernel, Jesse Barnes

On Tue, Feb 28, 2006 at 06:20:32PM +0100, Jes Sorensen wrote:
> Roland Dreier wrote:
> >    Jes> Not quite correct as far as I understand it. mmiowb() is
> >    Jes> supposed to guarantee that writes to MMIO space have
> >    Jes> completed before continuing.  That of course covers the
> >    Jes> multi-CPU case, but it should also cover the write-combining
> >    Jes> case.
> >
> >I don't believe this is correct.  mmiowb() does not guarantee that
> >writes have completed -- they may still be pending in a buffer in a
> >bridge somewhere.  The _only_ effect of mmiowb() is to make sure that
> >writes which have been ordered between CPUs using some other mechanism
> >(i.e. a lock) are properly ordered by the rest of the system.  This
> >only has an effect systems like very large ia64 systems, where (as I
> >understand it), writes can pass each other on the way to the PCI bus.
> >In fact, mmiowb() is a NOP on essentially every architecture.
> 
> Hmmmm
> 
> That could be, seems like Jesse agrees that it could all be in the
> pipeline somewhere. Considering Jesse was responsible for mmiowb() I'll
> take his word for it ;-)
> 
> In any case, I'd strongly recommend that any new barrier version is
> clearly documented. The jungle is very dense already ;(


I wonder if wc_wmb() is the best name.

mmiowb expands to memory-mapped I/O write barrier which more or less
describes what it does, whereas wc_wmb expands (I'm guessing) to
write-combine write memory barrier.  But it's for mmio writes.

Also, the wmb() does not actually "guarantee that PCI writes have been
flushed to the bus", at least on IA64.  Even for memory transactions,
it only guarantees ordering on IA64, much like mmiowb does for mmio
transactions.

jeremy

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 17:57   ` Bryan O'Sullivan
  2006-02-28 18:07     ` linux-os (Dick Johnson)
@ 2006-03-01 10:45     ` Jes Sorensen
  2006-03-01 17:04       ` Roland Dreier
  1 sibling, 1 reply; 44+ messages in thread
From: Jes Sorensen @ 2006-03-01 10:45 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andrew Morton, Andi Kleen, linux-kernel

Bryan O'Sullivan wrote:
> On Tue, 2006-02-28 at 05:01 -0500, Jes Sorensen wrote:
> 
> 
>>Could you explain why the current mmiowb() API won't suffice for this?
>>It seems that this is basically trying to achieve the same thing.
> 
> 
> It's a no-op on every arch I care about:
> 
> #define mmiowb()
> 
> Which makes it useless.  Also, based on the comments in the qla driver,
> mmiowb() seems to have inter-CPU ordering semantics that I don't want.
> I'm thus hesitant to appropriate it for my needs.

The fact that it's a no-op may simply be because nobody on a specific
arch got to the point where it made sense to define it yet.

Anyway, based on Jesse and Jeremy's comments, then maybe the semantics
here are different. However I do think the name wc_wmb() isn't quite
defining it. If it's only to be used on mmio space, something like
mmio_wc_wmb() would probably be more descriptive.

Cheers,
Jes

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 10:45     ` Jes Sorensen
@ 2006-03-01 17:04       ` Roland Dreier
  0 siblings, 0 replies; 44+ messages in thread
From: Roland Dreier @ 2006-03-01 17:04 UTC (permalink / raw)
  To: Jes Sorensen
  Cc: Bryan O'Sullivan, Andrew Morton, Andi Kleen, linux-kernel

    Jes> Anyway, based on Jesse and Jeremy's comments, then maybe the
    Jes> semantics here are different. However I do think the name
    Jes> wc_wmb() isn't quite defining it. If it's only to be used on
    Jes> mmio space, something like mmio_wc_wmb() would probably be
    Jes> more descriptive.

I don't see any restriction to MMIO space -- it would be a strange
thing to do, but conceivable one might want to order writes that are
made to ordinary RAM via a write-combining mapping.

 - R.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-02-28 19:33                 ` Andi Kleen
  2006-02-28 19:44                   ` Bryan O'Sullivan
@ 2006-03-01 19:20                   ` Bryan O'Sullivan
  2006-03-01 19:27                     ` Andi Kleen
  1 sibling, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-03-01 19:20 UTC (permalink / raw)
  To: Andi Kleen, Benjamin LaHaise; +Cc: linux-kernel

On Tue, 2006-02-28 at 20:33 +0100, Andi Kleen wrote:

> Anyways if MFENCE improved performance you're probably relying
> on some very specific artifact of the microarchitecture of your 
> CPU or Northbridge. I don't think it's a architecurally guaranteed
> feature.

I looked this up, and you appear to be wrong here.

Here's the appropriate quote from page 246 of the PDF of "AMD64
Architecture Programmer's Manual Volume 2: System Programming":

        http://www.amd.com/us-en/assets/content_type/DownloadableAssets/dwamd_24593.pdf

Section 7.4.1 specifically describes what happens to write buffers:

        [...] the processor completely empties the write buffer by
        writing the contents to memory as a result of performing any of
        the following operations:
        
        SFENCE Instruction
        Executing a store-fence (SFENCE) instruction forces all memory
        writes before the SFENCE (in program order) to be written into
        memory before memory writes that follow the SFENCE instruction.
        The memory-fence (MFENCE) instruction has a similar effect, but
        it forces the ordering of loads in addition to stores.
        [...]

So in fact SFENCE is the appropriate, architecturally guaranteed, thing
for us to be doing on x86_64.

With respect to Ben's contention that wmb() will suffice instead, that
isn't true, either, even on x86-class hardware.  The writes absolutely
travel over the HT bus in non-ascending order on AMD64 systems unless we
fence them, and we've verified this using a HT bus analyser.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 19:20                   ` Bryan O'Sullivan
@ 2006-03-01 19:27                     ` Andi Kleen
  2006-03-01 19:43                       ` Bryan O'Sullivan
  0 siblings, 1 reply; 44+ messages in thread
From: Andi Kleen @ 2006-03-01 19:27 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Benjamin LaHaise, linux-kernel

On Wednesday 01 March 2006 20:20, Bryan O'Sullivan wrote:

> 
>         [...] the processor completely empties the write buffer by
>         writing the contents to memory as a result of performing any of
>         the following operations:
>         
>         SFENCE Instruction
>         Executing a store-fence (SFENCE) instruction forces all memory
>         writes before the SFENCE (in program order) to be written into
>         memory before memory writes that follow the SFENCE instruction.
>         The memory-fence (MFENCE) instruction has a similar effect, but
>         it forces the ordering of loads in addition to stores.
>         [...]
> 
> So in fact SFENCE is the appropriate, architecturally guaranteed, thing
> for us to be doing on x86_64.

I don't interpret it as being a full synchronous write. It's just a barrier
preventing reordering.  So the writes before could be in theory stuck
forever in some buffer - it just means they won't be later than the writes
after the fence.

Implementing the fences in the way your're suggesting would be very costly
because it could make them potentially stall for thousands of cycles.

I don't have a quote handy right now but volume 3 of the Intel/AMD manuals
have own chapters on the memory ordering rules elaborating on this in much more
detail.

-Andi

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 19:27                     ` Andi Kleen
@ 2006-03-01 19:43                       ` Bryan O'Sullivan
  2006-03-01 19:49                         ` Andi Kleen
  0 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-03-01 19:43 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Benjamin LaHaise, linux-kernel

On Wed, 2006-03-01 at 20:27 +0100, Andi Kleen wrote:

> I don't interpret it as being a full synchronous write.

Nor do I.  We don't expect or need the write to make it all the way to
the device.  We already do config space reads for the tiny handful of
non-performance-critical cases where we need to know that a write has in
fact made it to the device.

>  It's just a barrier
> preventing reordering.  So the writes before could be in theory stuck
> forever in some buffer - it just means they won't be later than the writes
> after the fence.

At worst, they'll be off the CPU (guaranteed per spec, as I already
quoted for you), and on the northbridge.  It might hold on to them for a
bit in turn, but there's nothing we can do about that except a config
space read, and we don't need to guarantee that the write has completely
gone to the device.  Real northbridges just don't buffer writes for very
long.

> Implementing the fences in the way your're suggesting would be very costly
> because it could make them potentially stall for thousands of cycles.

But it *doesn't*.  On existing CPUs and systems, today, the phantom
worse-case semantics you are conjuring up simply do not exist.  If
someone builds such an asinine system, the right approach is to handle
it once it exists.

	<b

-- 
Bryan O'Sullivan <bos@pathscale.com>


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 19:43                       ` Bryan O'Sullivan
@ 2006-03-01 19:49                         ` Andi Kleen
  2006-03-01 20:05                           ` Bryan O'Sullivan
  0 siblings, 1 reply; 44+ messages in thread
From: Andi Kleen @ 2006-03-01 19:49 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Benjamin LaHaise, linux-kernel

On Wednesday 01 March 2006 20:43, Bryan O'Sullivan wrote:

> > Implementing the fences in the way your're suggesting would be very costly
> > because it could make them potentially stall for thousands of cycles.
> 
> But it *doesn't*.  On existing CPUs and systems, today, the phantom
> worse-case semantics you are conjuring up simply do not exist.  If
> someone builds such an asinine system, the right approach is to handle
> it once it exists.

Normally we write code to the defined architecture.

Relying on undocumented side effects of instructions as you're trying
to do here is not very reliable and would likely cause breakage later.

Especially not for encoding it in the general Linux interface.

-Andi

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 19:49                         ` Andi Kleen
@ 2006-03-01 20:05                           ` Bryan O'Sullivan
  2006-03-01 20:26                             ` Benjamin LaHaise
  0 siblings, 1 reply; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-03-01 20:05 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Benjamin LaHaise, linux-kernel

On Wed, 2006-03-01 at 20:49 +0100, Andi Kleen wrote:

> Relying on undocumented side effects of instructions as you're trying
> to do here is not very reliable and would likely cause breakage later.

I just quoted you the precise relevant semantics of sfence two messages
earlier, from the AMD64 optimisation guide.  Here's the same thing from
the EM64T optimisation guide.  See page 444 of the following PDF:

        ftp://download.intel.com/design/Pentium4/manuals/25366818.pdf

Here's the relevant quote:

        Writes may be delayed and combined in the write combining buffer
        (WC buffer) to reduce memory accesses. If the WC buffer is
        partially filled, the writes may be delayed until the next
        occurrence of a serializing event; such as, an SFENCE or MFENCE
        instruction, [...]
        
If you read section 10.3.1 of the same document (page 446), you'll see
that a CLFLUSH (as you suggested earlier) won't even work to get the WC
buffers to flush, because they're not part of the regular cache
hierarchy.

This section also makes it clear yet again that wmb() is absolutely not
sufficient to get program store order semantics in the presence of WC;
you *have* to use an explicit synchronising instruction of some kind.

This is pretty frustrating.  I'm quoting the manufacturer documentation
at you, and you're handwaving back at me with complete nonsense.

	<b


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 20:05                           ` Bryan O'Sullivan
@ 2006-03-01 20:26                             ` Benjamin LaHaise
  2006-03-01 20:35                               ` Bryan O'Sullivan
  0 siblings, 1 reply; 44+ messages in thread
From: Benjamin LaHaise @ 2006-03-01 20:26 UTC (permalink / raw)
  To: Bryan O'Sullivan; +Cc: Andi Kleen, linux-kernel

On Wed, Mar 01, 2006 at 12:05:07PM -0800, Bryan O'Sullivan wrote:
> This section also makes it clear yet again that wmb() is absolutely not
> sufficient to get program store order semantics in the presence of WC;
> you *have* to use an explicit synchronising instruction of some kind.

The semantics your code seems to care about are not those of a memory 
barrier (which deals with ordering), but of a flush of the write combining 
buffers.  That's an important high level distinction as they are implemented 
differently across architectures.  Please rename the macro something like 
flush_wc() and document it as such, at which point I remove my objection.

		-ben
-- 
"Ladies and gentlemen, I'm sorry to interrupt, but the police are here 
and they've asked us to stop the party."  Don't Email: <dont@kvack.org>.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [PATCH] Define wc_wmb, a write barrier for PCI write combining
  2006-03-01 20:26                             ` Benjamin LaHaise
@ 2006-03-01 20:35                               ` Bryan O'Sullivan
  0 siblings, 0 replies; 44+ messages in thread
From: Bryan O'Sullivan @ 2006-03-01 20:35 UTC (permalink / raw)
  To: Benjamin LaHaise; +Cc: Andi Kleen, linux-kernel

On Wed, 2006-03-01 at 15:26 -0500, Benjamin LaHaise wrote:

> Please rename the macro something like 
> flush_wc() and document it as such, at which point I remove my objection.

Thanks.  That's a useful suggestion; I don't want to give the thing a
misleading name.

	<b

-- 
Bryan O'Sullivan <bos@pathscale.com>


^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2006-03-01 20:35 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-02-25  4:20 [PATCH] Define wc_wmb, a write barrier for PCI write combining Bryan O'Sullivan
2006-02-25  4:43 ` Andi Kleen
2006-02-25  7:34   ` Bryan O'Sullivan
2006-02-25 13:28     ` Andi Kleen
2006-02-25 17:20       ` Bryan O'Sullivan
2006-02-25 19:01       ` Bryan O'Sullivan
2006-02-28 17:44         ` Jesse Barnes
2006-02-28 17:50           ` Roland Dreier
2006-02-28 17:50           ` Jesse Barnes
2006-02-28 17:52           ` Bryan O'Sullivan
2006-02-28 17:59             ` Jesse Barnes
2006-02-25 14:28 ` Benjamin LaHaise
2006-02-25 17:11   ` Bryan O'Sullivan
2006-02-25 17:41     ` Benjamin LaHaise
2006-02-28 17:50       ` Bryan O'Sullivan
2006-02-28 17:58         ` Benjamin LaHaise
2006-02-28 18:20           ` Bryan O'Sullivan
2006-02-28 19:03             ` Benjamin LaHaise
2006-02-28 19:20               ` Bryan O'Sullivan
2006-02-28 19:33                 ` Andi Kleen
2006-02-28 19:44                   ` Bryan O'Sullivan
2006-03-01 19:20                   ` Bryan O'Sullivan
2006-03-01 19:27                     ` Andi Kleen
2006-03-01 19:43                       ` Bryan O'Sullivan
2006-03-01 19:49                         ` Andi Kleen
2006-03-01 20:05                           ` Bryan O'Sullivan
2006-03-01 20:26                             ` Benjamin LaHaise
2006-03-01 20:35                               ` Bryan O'Sullivan
2006-02-28 19:34                 ` Benjamin LaHaise
2006-02-28 18:22           ` Christopher Friesen
2006-02-28 10:01 ` Jes Sorensen
2006-02-28 15:42   ` Roland Dreier
2006-02-28 16:08     ` Jes Sorensen
2006-02-28 17:02       ` Roland Dreier
2006-02-28 17:13         ` Jesse Barnes
2006-02-28 17:20         ` Jes Sorensen
2006-03-01  8:16           ` Jeremy Higdon
2006-03-01  8:24           ` Jeremy Higdon
2006-02-28 17:11       ` Jesse Barnes
2006-02-28 17:57   ` Bryan O'Sullivan
2006-02-28 18:07     ` linux-os (Dick Johnson)
2006-02-28 18:24       ` Christopher Friesen
2006-03-01 10:45     ` Jes Sorensen
2006-03-01 17:04       ` Roland Dreier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).