* [PATCH] I/O space write barrier
@ 2004-10-21 23:13 Jesse Barnes
  2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
                   ` (2 more replies)
  0 siblings, 3 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-21 23:13 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: tony.luck, linux-ia64

[-- Attachment #1: Type: text/plain, Size: 2430 bytes --]

Here it is again, updated to apply against the BK tree as of a few minutes 
ago.  Patches to use the new routine in tg3 and qla1280 to follow.  Boot 
tested on Altix w/the tg3 and qla1280 bits applied.

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
I/O space aren't ordered coming from different CPUs.  For the most part, this 
isn't a problem since drivers generally spinlock around code that does writeX 
calls, but if the last operation a driver does before it releases a lock is a 
write and some other CPU takes the lock and immediately does a write, it's 
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and 
adds some documentation describing I/O ordering issues to deviceiobook.tmpl.  
The idea is to mirror the regular, cacheable memory barrier operation, wmb.  
Example of the problem this new macro solves:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
is easy though:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  mmiowb();
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to 
actually make a given write get to the device before proceeding.  This should 
be dealt with by immediately reading a register from the card that has no 
side effects.  According to the PCI spec, the read won't complete until all 
previously posted writes have been delivered to the target bus.  If no such register is 
available (in the case of card resets perhaps), reading from config space is 
sufficient (though it may return all ones if the card isn't responding to 
read cycles).  I've tried to describe how mmiowb() differs from PCI posted 
write flushing in the patch to deviceiobook.tmpl.
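
As a rough illustration of the two flushing techniques above (the struct, 
register offsets and field names below are made up for the example; 
readl()/writel() and pci_read_config_dword() are the real interfaces):

	static void foo_kick_ring(struct foo_dev *fd)
	{
		writel(fd->req_ring_index, fd->mmio + FOO_RING_PTR);
		/*
		 * Read back a register with no read side effects; per the
		 * PCI ordering rules the read can't complete until the
		 * posted write above has reached the device.
		 */
		(void) readl(fd->mmio + FOO_RING_PTR);
	}

	static void foo_reset_chip(struct foo_dev *fd)
	{
		u32 dummy;

		writel(FOO_CTRL_RESET, fd->mmio + FOO_CTRL);
		/*
		 * The chip may not answer MMIO reads while it resets, so
		 * flush the posted write with a config space read instead;
		 * that soft-fails (returns all ones) rather than hanging
		 * if the card isn't responding to read cycles.
		 */
		pci_read_config_dword(fd->pdev, PCI_VENDOR_ID, &dummy);
	}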

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Thanks,
Jesse

[-- Attachment #2: mmiowb-7.patch --]
[-- Type: text/plain, Size: 18553 bytes --]

===== Documentation/DocBook/deviceiobook.tmpl 1.4 vs edited =====
--- 1.4/Documentation/DocBook/deviceiobook.tmpl	2004-02-03 21:31:10 -08:00
+++ edited/Documentation/DocBook/deviceiobook.tmpl	2004-10-21 15:51:32 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -159,10 +158,72 @@
 	asynchronously. A driver author must issue a read from the same
 	device to ensure that writes have occurred in the specific cases the
 	author cares. This kind of property cannot be hidden from driver
-	writers in the API.
+	writers in the API.  In some cases, the read used to flush the device
+	may be expected to fail (if the card is resetting, for example).  In
+	that case, the read should be done from config space, which is
+	guaranteed to soft-fail if the card doesn't respond.
       </para>
 
       <para>
+	The following is an example of flushing a write to a device when
+	the driver would like to ensure the write's effects are visible prior
+	to continuing execution.
+      </para>
+
+<programlisting>
+static inline void
+qla1280_disable_intrs(struct scsi_qla_host *ha)
+{
+	struct device_reg *reg;
+
+	reg = ha->iobase;
+	/* disable risc and host interrupts */
+	WRT_REG_WORD(&amp;reg->ictrl, 0);
+	/*
+	 * The following read will ensure that the above write
+	 * has been received by the device before we return from this
+	 * function.
+	 */
+	RD_REG_WORD(&amp;reg->ictrl);
+	ha->flags.ints_enabled = 0;
+}
+</programlisting>
+
+      <para>
+	In addition to write posting, on some large multiprocessing systems
+	(e.g. SGI Challenge, Origin and Altix machines) posted writes won't
+	be strongly ordered coming from different CPUs.  Thus it's important
+	to properly protect parts of your driver that do memory-mapped writes
+	with locks and use the <function>mmiowb</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
+	Generally, one should use <function>mmiowb</function> prior to
+	releasing a spinlock that protects regions using <function>writeb
+	</function> or similar functions that aren't surrounded by <function>
+	readb</function> calls, which will ensure ordering and flushing.  The
+	following example (again from qla1280.c) illustrates its use.
+      </para>
+
+<programlisting>
+       sp->flags |= SRB_SENT;
+       ha->actthreads++;
+       WRT_REG_WORD(&amp;reg->mailbox4, ha->req_ring_index);
+
+       /*
+        * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+        * of the request queue in register is ordered ahead of writes issued
+        * after this one by other CPUs.  Access to the register is protected
+        * by the host_lock.  Without the mmiowb, however, it is possible for
+        * this CPU to release the host lock, another CPU acquire the host lock,
+        * and write to the request queue in, and have the second write make it
+        * to the chip first.
+        */
+       mmiowb(); /* posted write ordering */
+</programlisting>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +232,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
===== arch/ia64/sn/kernel/iomv.c 1.2 vs edited =====
--- 1.2/arch/ia64/sn/kernel/iomv.c	2004-10-20 13:38:35 -07:00
+++ edited/arch/ia64/sn/kernel/iomv.c	2004-10-21 15:58:10 -07:00
@@ -54,19 +54,16 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space memory barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void sn_mmiob(void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long)(*pda->pio_write_status_addr)) &
 		SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) !=
@@ -74,4 +71,4 @@
 		cpu_relax();
 }
 
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
===== include/asm-alpha/io.h 1.20 vs edited =====
--- 1.20/include/asm-alpha/io.h	2004-09-25 18:54:11 -07:00
+++ edited/include/asm-alpha/io.h	2004-10-21 15:51:42 -07:00
@@ -489,6 +489,8 @@
 #define readl_relaxed(addr) __raw_readl(addr)
 #define readq_relaxed(addr) __raw_readq(addr)
 
+#define mmiowb()
+
 /*
  * String version of IO memory access ops:
  */
===== include/asm-arm/io.h 1.19 vs edited =====
--- 1.19/include/asm-arm/io.h	2004-10-05 16:30:32 -07:00
+++ edited/include/asm-arm/io.h	2004-10-21 15:51:43 -07:00
@@ -136,6 +136,8 @@
 extern void _memcpy_toio(void __iomem *, const void *, size_t);
 extern void _memset_io(void __iomem *, int, size_t);
 
+#define mmiowb()
+
 /*
  *  Memory access primitives
  *  ------------------------
===== include/asm-arm26/io.h 1.3 vs edited =====
--- 1.3/include/asm-arm26/io.h	2004-08-02 01:00:45 -07:00
+++ edited/include/asm-arm26/io.h	2004-10-21 15:51:43 -07:00
@@ -320,6 +320,8 @@
 #define writesw(p,d,l)                        __readwrite_bug("writesw")
 #define writesl(p,d,l)                        __readwrite_bug("writesl")
 
+#define mmiowb()
+
 /* the following macro is depreciated */
 #define ioaddr(port)                    __ioaddr((port))
 
===== include/asm-cris/io.h 1.10 vs edited =====
--- 1.10/include/asm-cris/io.h	2004-06-01 02:27:58 -07:00
+++ edited/include/asm-cris/io.h	2004-10-21 15:51:43 -07:00
@@ -56,6 +56,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 #define memset_io(a,b,c)	memset((void *)(a),(b),(c))
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
===== include/asm-h8300/io.h 1.9 vs edited =====
--- 1.9/include/asm-h8300/io.h	2004-06-18 00:06:08 -07:00
+++ edited/include/asm-h8300/io.h	2004-10-21 15:51:43 -07:00
@@ -200,6 +200,8 @@
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 #define inb(addr)    ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr))
 #define inw(addr)    _swapw(readw(addr))
 #define inl(addr)    _swapl(readl(addr))
===== include/asm-i386/io.h 1.29 vs edited =====
--- 1.29/include/asm-i386/io.h	2004-09-13 11:31:52 -07:00
+++ edited/include/asm-i386/io.h	2004-10-21 15:51:43 -07:00
@@ -178,6 +178,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
 	memset((void __force *) addr, val, count);
===== include/asm-ia64/io.h 1.21 vs edited =====
--- 1.21/include/asm-ia64/io.h	2004-10-05 11:30:39 -07:00
+++ edited/include/asm-ia64/io.h	2004-10-21 15:51:44 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.28 vs edited =====
--- 1.28/include/asm-ia64/machvec.h	2004-10-20 15:52:20 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-10-21 15:53:38 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (const volatile void __iomem *);
 typedef unsigned short ia64_mv_readw_t (const volatile void __iomem *);
 typedef unsigned int ia64_mv_readl_t (const volatile void __iomem *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-10-21 15:51:44 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.15 vs edited =====
--- 1.15/include/asm-ia64/machvec_sn2.h	2004-10-11 13:04:12 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-10-21 15:51:44 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.3 vs edited =====
--- 1.3/include/asm-ia64/sn/io.h	2004-10-20 15:52:23 -07:00
+++ edited/include/asm-ia64/sn/io.h	2004-10-21 15:55:48 -07:00
@@ -12,7 +12,7 @@
 #include <asm/intrinsics.h>
 
 extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+extern void __sn_mmiowb(void); /* Forward definition */
 
 extern int numionodes;
 
@@ -93,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -104,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -115,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
===== include/asm-m32r/io.h 1.2 vs edited =====
--- 1.2/include/asm-m32r/io.h	2004-10-05 23:05:45 -07:00
+++ edited/include/asm-m32r/io.h	2004-10-21 15:51:48 -07:00
@@ -151,6 +151,9 @@
 #define __raw_readb readb
 #define __raw_readw readw
 #define __raw_readl readl
+#define readb_relaxed readb
+#define readw_relaxed readw
+#define readl_relaxed readl
 
 #define writeb(val, addr)  _writeb((val), (unsigned long)(addr))
 #define writew(val, addr)  _writew((val), (unsigned long)(addr))
@@ -158,6 +161,8 @@
 #define __raw_writeb writeb
 #define __raw_writew writew
 #define __raw_writel writel
+
+#define mmiowb()
 
 #define flush_write_buffers() do { } while (0)  /* M32R_FIXME */
 
===== include/asm-m68k/io.h 1.13 vs edited =====
--- 1.13/include/asm-m68k/io.h	2004-04-12 04:56:00 -07:00
+++ edited/include/asm-m68k/io.h	2004-10-21 15:51:48 -07:00
@@ -306,6 +306,7 @@
 #endif
 #endif /* CONFIG_PCI */
 
+#define mmiowb()
 
 static inline void *ioremap(unsigned long physaddr, unsigned long size)
 {
===== include/asm-m68knommu/io.h 1.3 vs edited =====
--- 1.3/include/asm-m68knommu/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-m68knommu/io.h	2004-10-21 15:51:48 -07:00
@@ -102,6 +102,8 @@
 		*bp++ = _swapl(*ap);
 }
 
+#define mmiowb()
+
 /*
  *	make the short names macros so specific devices
  *	can override them as required
===== include/asm-mips/io.h 1.7 vs edited =====
--- 1.7/include/asm-mips/io.h	2004-02-19 12:53:02 -08:00
+++ edited/include/asm-mips/io.h	2004-10-21 15:51:48 -07:00
@@ -290,6 +290,10 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+/* Depends on MIPS III instruction set */
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
===== include/asm-parisc/io.h 1.8 vs edited =====
--- 1.8/include/asm-parisc/io.h	2004-08-13 01:23:36 -07:00
+++ edited/include/asm-parisc/io.h	2004-10-21 15:51:49 -07:00
@@ -177,6 +177,8 @@
 #define readl_relaxed(addr) readl(addr)
 #define readq_relaxed(addr) readq(addr)
 
+#define mmiowb()
+
 extern void __memcpy_fromio(unsigned long dest, unsigned long src, int count);
 extern void __memcpy_toio(unsigned long dest, unsigned long src, int count);
 extern void __memset_io(unsigned long dest, char fill, int count);
===== include/asm-ppc/io.h 1.25 vs edited =====
--- 1.25/include/asm-ppc/io.h	2004-10-16 01:16:24 -07:00
+++ edited/include/asm-ppc/io.h	2004-10-21 15:51:49 -07:00
@@ -197,6 +197,8 @@
 #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 /*
  * Map in an area of physical address space, for accessing
  * I/O devices etc.
===== include/asm-ppc64/io.h 1.23 vs edited =====
--- 1.23/include/asm-ppc64/io.h	2004-10-18 16:32:20 -07:00
+++ edited/include/asm-ppc64/io.h	2004-10-21 15:51:49 -07:00
@@ -158,6 +158,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb()
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.
===== include/asm-s390/io.h 1.7 vs edited =====
--- 1.7/include/asm-s390/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-s390/io.h	2004-10-21 15:51:49 -07:00
@@ -105,6 +105,8 @@
 #define outb(x,addr) ((void) writeb(x,addr))
 #define outb_p(x,addr) outb(x,addr)
 
+#define mmiowb()
+
 #endif /* __KERNEL__ */
 
 #endif
===== include/asm-sh/io.h 1.7 vs edited =====
--- 1.7/include/asm-sh/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-sh/io.h	2004-10-21 15:51:49 -07:00
@@ -134,6 +134,8 @@
 #define readw_relaxed(a) readw(a)
 #define readl_relaxed(a) readl(a)
 
+#define mmiowb()
+
 /*
  * If the platform has PC-like I/O, this function converts the offset into
  * an address.
===== include/asm-sh64/io.h 1.1 vs edited =====
--- 1.1/include/asm-sh64/io.h	2004-06-29 07:44:46 -07:00
+++ edited/include/asm-sh64/io.h	2004-10-21 15:51:49 -07:00
@@ -86,6 +86,9 @@
 #define readb(addr)		sh64_in8(addr)
 #define readw(addr)		sh64_in16(addr)
 #define readl(addr)		sh64_in32(addr)
+#define readb_relaxed(addr)		sh64_in8(addr)
+#define readw_relaxed(addr)		sh64_in16(addr)
+#define readl_relaxed(addr)		sh64_in32(addr)
 
 #define writeb(b, addr)		sh64_out8(b, addr)
 #define writew(b, addr)		sh64_out16(b, addr)
@@ -105,6 +108,8 @@
 void outb(unsigned long value, unsigned long port);
 void outw(unsigned long value, unsigned long port);
 void outl(unsigned long value, unsigned long port);
+
+#define mmiowb()
 
 #ifdef __KERNEL__
 
===== include/asm-sparc/io.h 1.9 vs edited =====
--- 1.9/include/asm-sparc/io.h	2004-02-19 23:15:13 -08:00
+++ edited/include/asm-sparc/io.h	2004-10-21 15:51:49 -07:00
@@ -23,6 +23,8 @@
 	return ((w&0xff) << 8) | ((w>>8)&0xff);
 }
 
+#define mmiowb()
+
 /*
  * Memory mapped I/O to PCI
  *
===== include/asm-sparc64/io.h 1.13 vs edited =====
--- 1.13/include/asm-sparc64/io.h	2004-09-16 14:43:04 -07:00
+++ edited/include/asm-sparc64/io.h	2004-10-21 15:51:50 -07:00
@@ -439,6 +439,8 @@
 	return retval;
 }
 
+#define mmiowb()
+
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
===== include/asm-v850/io.h 1.3 vs edited =====
--- 1.3/include/asm-v850/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-v850/io.h	2004-10-21 15:51:50 -07:00
@@ -102,6 +102,8 @@
 #define ioremap_writethrough(physaddr, size)	(physaddr)
 #define ioremap_fullcache(physaddr, size)	(physaddr)
 
+#define mmiowb()
+
 #define page_to_phys(page)      ((page - mem_map) << PAGE_SHIFT)
 #if 0
 /* This is really stupid; don't define it.  */
===== include/asm-x86_64/io.h 1.15 vs edited =====
--- 1.15/include/asm-x86_64/io.h	2004-10-06 01:08:02 -07:00
+++ edited/include/asm-x86_64/io.h	2004-10-21 15:51:50 -07:00
@@ -186,6 +186,8 @@
 #define __raw_readl readl
 #define __raw_readq readq
 
+#define mmiowb()
+
 #ifdef CONFIG_UNORDERED_IO
 static inline void __writel(__u32 val, void __iomem *addr)
 {


* [PATCH] use mmiowb in qla1280.c
  2004-10-21 23:13 [PATCH] I/O space write barrier Jesse Barnes
@ 2004-10-21 23:17 ` Jesse Barnes
  2004-10-22  9:53   ` Jes Sorensen
  2004-10-24 16:20   ` James Bottomley
  2004-10-21 23:28 ` [PATCH] use mmiowb in tg3.c Jesse Barnes
  2004-10-22  1:01 ` [PATCH] I/O space write barrier Grant Grundler
  2 siblings, 2 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-21 23:17 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: linux-scsi, jeremy, jes

[-- Attachment #1: Type: text/plain, Size: 467 bytes --]

There are a few spots in qla1280.c that don't need a full PCI write flush to 
the device, but rather a simple write ordering guarantee.  This patch changes 
some of the PIO reads that cause write flushes into mmiowb calls instead, 
which is a lighter weight way of ensuring ordering.

Jes and James, can you ack this and/or push it in via the SCSI BK tree?

Thanks,
Jesse

Signed-off-by: Jeremy Higdon <jeremy@sgi.com>
Signed-off-by: Jesse Barnes <jbarnes@sgi.com>



[-- Attachment #2: qla1280-mmiowb-4.patch --]
[-- Type: text/plain, Size: 1654 bytes --]

===== drivers/scsi/qla1280.c 1.69 vs edited =====
--- 1.69/drivers/scsi/qla1280.c	2004-10-20 06:46:21 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-10-21 16:06:11 -07:00
@@ -3400,7 +3400,8 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+	mmiowb();
 
  out:
 	if (status)
@@ -3668,7 +3669,8 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+	mmiowb();
 
 out:
 	if (status)
@@ -3778,9 +3780,21 @@
 	} else
 		ha->request_ring_ptr++;
 
-	/* Set chip new ring index. */
+	/*
+	 * Update request index to mailbox4 (Request Queue In).
+	 * The mmiowb() ensures that this write is ordered with writes by other
+	 * CPUs.  Without the mmiowb(), it is possible for the following:
+	 *    CPUA posts write of index 5 to mailbox4
+	 *    CPUA releases host lock
+	 *    CPUB acquires host lock
+	 *    CPUB posts write of index 6 to mailbox4
+	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
+	 *       causing chip to issue full queue of stale commands
+	 * The mmiowb() prevents future writes from crossing the barrier.
+	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
+	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb();
 
 	LEAVE("qla1280_isp_cmd");
 }


* [PATCH] use mmiowb in tg3.c
  2004-10-21 23:13 [PATCH] I/O space write barrier Jesse Barnes
  2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
@ 2004-10-21 23:28 ` Jesse Barnes
  2004-10-21 23:40   ` David S. Miller
  2004-10-22 20:51   ` [PATCH] use mmiowb in tg3_poll akepner
  2004-10-22  1:01 ` [PATCH] I/O space write barrier Grant Grundler
  2 siblings, 2 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-21 23:28 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: netdev, jgarzik, davem, gnb, akepner

[-- Attachment #1: Type: text/plain, Size: 557 bytes --]

This patch originally from Greg Banks.  Some parts of the tg3 driver depend on 
PIO writes arriving in order.  This patch ensures that in two key places 
using the new mmiowb macro.  This not only prevents bugs (the queues can be 
corrupted), but is much faster than ensuring ordering using PIO reads (which 
involve a few round trips to the target bus on some platforms).

Arthur has another patch that uses mmiowb in tg3 that he posted earlier as 
well.

Signed-off-by: Greg Banks <gnb@sgi.com>
Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Thanks,
Jesse

[-- Attachment #2: tg3-mmiowb-2.patch --]
[-- Type: text/plain, Size: 459 bytes --]

===== drivers/net/tg3.c 1.210 vs edited =====
--- 1.210/drivers/net/tg3.c	2004-10-06 09:42:56 -07:00
+++ edited/drivers/net/tg3.c	2004-10-21 16:06:37 -07:00
@@ -2729,6 +2729,7 @@
 		tw32_rx_mbox(MAILBOX_RCV_JUMBO_PROD_IDX + TG3_64BIT_REG_LOW,
 			     sw_idx);
 	}
+	mmiowb();
 
 	return received;
 }
@@ -3176,6 +3177,7 @@
 		netif_stop_queue(dev);
 
 out_unlock:
+    	mmiowb();
 	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
 	dev->trans_start = jiffies;


* Re: [PATCH] use mmiowb in tg3.c
  2004-10-21 23:28 ` [PATCH] use mmiowb in tg3.c Jesse Barnes
@ 2004-10-21 23:40   ` David S. Miller
  2004-10-22  1:16     ` Benjamin Herrenschmidt
  2004-10-22  3:01     ` Jesse Barnes
  2004-10-22 20:51   ` [PATCH] use mmiowb in tg3_poll akepner
  1 sibling, 2 replies; 45+ messages in thread
From: David S. Miller @ 2004-10-21 23:40 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: akpm, linux-kernel, netdev, jgarzik, gnb, akepner

On Thu, 21 Oct 2004 16:28:06 -0700
Jesse Barnes <jbarnes@engr.sgi.com> wrote:

> This patch originally from Greg Banks.  Some parts of the tg3 driver depend on 
> PIO writes arriving in order.  This patch ensures that in two key places 
> using the new mmiowb macro.  This not only prevents bugs (the queues can be 
> corrupted), but is much faster than ensuring ordering using PIO reads (which 
> involve a few round trips to the target bus on some platforms).

Do other PCI systems which post PIO writes also potentially reorder
them just like this SGI system does?  Just trying to get this situation
straight in my head.



* Re: [PATCH] I/O space write barrier
  2004-10-21 23:13 [PATCH] I/O space write barrier Jesse Barnes
  2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
  2004-10-21 23:28 ` [PATCH] use mmiowb in tg3.c Jesse Barnes
@ 2004-10-22  1:01 ` Grant Grundler
  2004-10-22  3:05   ` Jesse Barnes
  2 siblings, 1 reply; 45+ messages in thread
From: Grant Grundler @ 2004-10-22  1:01 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: akpm, linux-kernel, tony.luck, linux-ia64

On Thu, Oct 21, 2004 at 04:13:19PM -0700, Jesse Barnes wrote:
> This patch adds a mmiowb() call to deal with this sort of situation, and 
> adds some documentation describing I/O ordering issues to deviceiobook.tmpl.  

Jesse,
This looks overall pretty good. Just a few nits.

> The idea is to mirror the regular, cacheable memory barrier operation, wmb.  
> Example of the problem this new macro solves:
> 
> CPU A:  spin_lock_irqsave(&dev_lock, flags)
> CPU A:  ...
> CPU A:  writel(newval, ring_ptr);
> CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
> CPU B:  spin_lock_irqsave(&dev_lock, flags)
> CPU B:  writel(newval2, ring_ptr);
> CPU B:  ...
> CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
> 
> In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
> is easy though:
> 
> CPU A:  spin_lock_irqsave(&dev_lock, flags)
> CPU A:  ...
> CPU A:  writel(newval, ring_ptr);
> CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
> CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
>         ...
> CPU B:  spin_lock_irqsave(&dev_lock, flags)
> CPU B:  writel(newval2, ring_ptr);
> CPU B:  ...
> CPU B:  mmiowb();
> CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

This is a great example and should be used instead of the qla1280 code
snippet that you used. Please just point at the source file that contains
the code and name the register usage this example represents.
I.e. make it easy to find the example in real code without using
line numbers.

> I've tried to describe how mmiowb() differs from PCI posted 
> write flushing in the patch to deviceiobook.tmpl.

Yes - I think this is a lot clearer than the previous documents - thanks!

...
> +<programlisting>
> +       sp->flags |= SRB_SENT;
> +       ha->actthreads++;
> +       WRT_REG_WORD(&amp;reg->mailbox4, ha->req_ring_index);
> +
> +       /*
> +        * A Memory Mapped I/O Write Barrier is needed to ensure that this write
> +        * of the request queue in register is ordered ahead of writes issued
> +        * after this one by other CPUs.  Access to the register is protected
> +        * by the host_lock.  Without the mmiowb, however, it is possible for
> +        * this CPU to release the host lock, another CPU acquire the host lock,
> +        * and write to the request queue in, and have the second write make it
> +        * to the chip first.
> +        */
> +       mmiowb(); /* posted write ordering */
> +</programlisting>

This is the example code I'd like to see replaced with your
synthetic example above.


thanks,
grant


* Re: [PATCH] use mmiowb in tg3.c
  2004-10-21 23:40   ` David S. Miller
@ 2004-10-22  1:16     ` Benjamin Herrenschmidt
  2004-10-22  1:33       ` akepner
  2004-10-22  3:01     ` Jesse Barnes
  1 sibling, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-22  1:16 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jesse Barnes, Andrew Morton, Linux Kernel list, netdev,
	Jeff Garzik, gnb, akepner

On Fri, 2004-10-22 at 09:40, David S. Miller wrote:
> On Thu, 21 Oct 2004 16:28:06 -0700
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> 
> > This patch originally from Greg Banks.  Some parts of the tg3 driver depend on 
> > PIO writes arriving in order.  This patch ensures that in two key places 
> > using the new mmiowb macro.  This not only prevents bugs (the queues can be 
> > corrupted), but is much faster than ensuring ordering using PIO reads (which 
> > involve a few round trips to the target bus on some platforms).
> 
> Do other PCI systems which post PIO writes also potentially reorder
> them just like this SGI system does?  Just trying to get this situation
> straight in my head.

I think the problem they have is really related to their spinlocks, that
is, the IO leaking out of the spinlock-protected section as seen by other CPUs.

On the other hand, as I discussed with Jesse in the past, I'd like to
extend the semantics of mmiowb() to also define a full barrier between
cacheable and non-cacheable accesses. That would help fix a lot of issues
on ppc and ppc64.

Typically, our normal "light" write barrier doesn't reorder between cacheable
and non-cacheable (MMIO) stores, which is why we had to put some heavy sync
barrier in our MMIO writes macros.

By having an mmiowb() that allows us to do this ordering explicitly, we could
relax the barriers in the MMIO macros, and so get back the few percent of
performance that we lost with the "fix".
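
As a rough sketch of the idea (the macro names here are hypothetical; only
the eieio and sync instructions themselves are real):

	/*
	 * Hypothetical only, not actual ppc64 kernel code: if writel() were
	 * relaxed to use eieio (which orders MMIO stores against each other
	 * but not against cacheable stores), then an mmiowb() defined as a
	 * full sync could supply the cacheable vs. MMIO ordering at the few
	 * places that really need it, e.g. right before a spin_unlock().
	 */
	#define relaxed_io_wmb()	__asm__ __volatile__ ("eieio" : : : "memory")
	#define heavy_mmiowb()		__asm__ __volatile__ ("sync" : : : "memory")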

I haven't had time to work on that yet though; I need to review all the
possible race cases with paulus, but hopefully I'll have a patch on top of
Jesse's next week or so and will start converting more drivers.

Ben.



* Re: [PATCH] use mmiowb in tg3.c
  2004-10-22  1:16     ` Benjamin Herrenschmidt
@ 2004-10-22  1:33       ` akepner
  2004-10-22  2:07         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: akepner @ 2004-10-22  1:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David S. Miller, Jesse Barnes, Andrew Morton, Linux Kernel list,
	netdev, Jeff Garzik, gnb

On Fri, 22 Oct 2004, Benjamin Herrenschmidt wrote:

> ... 
> Typically, our normal "light" write barrier doesn't reorder between cacheable
> and non-cacheable (MMIO) stores, which is why we had to put some heavy sync
> barrier in our MMIO writes macros.
> ...

Do you mean "impose order" rather than "reorder" here? 

-- 
Arthur





* Re: [PATCH] use mmiowb in tg3.c
  2004-10-22  1:33       ` akepner
@ 2004-10-22  2:07         ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-22  2:07 UTC (permalink / raw)
  To: akepner
  Cc: David S. Miller, Jesse Barnes, Andrew Morton, Linux Kernel list,
	netdev, Jeff Garzik, gnb

On Fri, 2004-10-22 at 11:33, akepner@sgi.com wrote:
> On Fri, 22 Oct 2004, Benjamin Herrenschmidt wrote:
> 
> > ... 
> > Typically, our normal "light" write barrier doesn't reorder between cacheable
> > and non-cacheable (MMIO) stores, which is why we had to put some heavy sync
> > barrier in our MMIO writes macros.
> > ...
> 
> Do you mean "impose order" rather than "reorder" here? 

Right.

Ben.




* Re: [PATCH] use mmiowb in tg3.c
  2004-10-21 23:40   ` David S. Miller
  2004-10-22  1:16     ` Benjamin Herrenschmidt
@ 2004-10-22  3:01     ` Jesse Barnes
  2004-10-22  4:00       ` Paul Mackerras
  1 sibling, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-10-22  3:01 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jesse Barnes, akpm, linux-kernel, netdev, jgarzik, gnb, akepner

On Thursday, October 21, 2004 6:40 pm, David S. Miller wrote:
> On Thu, 21 Oct 2004 16:28:06 -0700
>
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> > This patch originally from Greg Banks.  Some parts of the tg3 driver
> > depend on PIO writes arriving in order.  This patch ensures that in two
> > key places using the new mmiowb macro.  This not only prevents bugs (the
> > queues can be corrupted), but is much faster than ensuring ordering using
> > PIO reads (which involve a few round trips to the target bus on some
> > platforms).
>
> Do other PCI systems which post PIO writes also potentially reorder
> them just like this SGI system does?  Just trying to get this situation
> straight in my head.

The HP guys claim that theirs don't, but PPC does, afaik.  And clearly any 
large system that posts PCI writes has the *potential* of reordering them.

Thanks,
Jesse


* Re: [PATCH] I/O space write barrier
  2004-10-22  1:01 ` [PATCH] I/O space write barrier Grant Grundler
@ 2004-10-22  3:05   ` Jesse Barnes
  2004-10-22  4:26     ` Greg Banks
  2004-10-22 15:26     ` Grant Grundler
  0 siblings, 2 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-22  3:05 UTC (permalink / raw)
  To: Grant Grundler; +Cc: Jesse Barnes, akpm, linux-kernel, tony.luck, linux-ia64

On Thursday, October 21, 2004 8:01 pm, Grant Grundler wrote:
> > +<programlisting>
> > +       sp->flags |= SRB_SENT;
> > +       ha->actthreads++;
> > +       WRT_REG_WORD(&amp;reg->mailbox4, ha->req_ring_index);
> > +
> > +       /*
> > +        * A Memory Mapped I/O Write Barrier is needed to ensure that
> > this write +        * of the request queue in register is ordered ahead
> > of writes issued +        * after this one by other CPUs.  Access to the
> > register is protected +        * by the host_lock.  Without the mmiowb,
> > however, it is possible for +        * this CPU to release the host lock,
> > another CPU acquire the host lock, +        * and write to the request
> > queue in, and have the second write make it +        * to the chip first.
> > +        */
> > +       mmiowb(); /* posted write ordering */
> > +</programlisting>
>
> This is the example code I'd like to see replaced with your
> synthetic example above.

Ok, that makes sense.  I'd like to update the documentation with a separate 
patch though, if that's ok with you.  I think Greg had some ideas about other 
things to cover as well.  Greg?

Thanks,
Jesse


* Re: [PATCH] use mmiowb in tg3.c
  2004-10-22  3:01     ` Jesse Barnes
@ 2004-10-22  4:00       ` Paul Mackerras
  0 siblings, 0 replies; 45+ messages in thread
From: Paul Mackerras @ 2004-10-22  4:00 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: David S. Miller, Jesse Barnes, akpm, linux-kernel, netdev,
	jgarzik, gnb, akepner

Jesse Barnes writes:

> On Thursday, October 21, 2004 6:40 pm, David S. Miller wrote:
> > On Thu, 21 Oct 2004 16:28:06 -0700
> >
> > Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> > > This patch originally from Greg Banks.  Some parts of the tg3 driver
> > > depend on PIO writes arriving in order.  This patch ensures that in two
> > > key places using the new mmiowb macro.  This not only prevents bugs (the
> > > queues can be corrupted), but is much faster than ensuring ordering using
> > > PIO reads (which involve a few round trips to the target bus on some
> > > platforms).
> >
> > Do other PCI systems which post PIO writes also potentially reorder
> > them just like this SGI system does?  Just trying to get this situation
> > straight in my head.
> 
> The HP guys claim that theirs don't, but PPC does, afaik.  And clearly any 
> large system that posts PCI writes has the *potential* of reordering them.

No, PPC systems don't reorder writes to PCI devices.  Provided you use
inl/outl/readl/writel et al., all PCI accesses from one processor are
strictly ordered, and if you use a spinlock, that gives you strict
access ordering between processors.

Our barrier instructions mostly order cacheable accesses separately
from non-cacheable accesses, except for the strongest barrier
instruction, which orders everything.  Thus it would be useful for us
to have an explicit indication of when a cacheable write (i.e. to main
memory) has to be completed (from a PCI device's point of view) before
a non-cacheable device read or write (e.g. to kick off DMA).
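
For illustration, the kind of spot being described looks roughly like this
(device, register and descriptor names are hypothetical; the open question
in this thread is which barrier belongs at the marked point):

	static void foo_start_tx(struct foo_dev *fd, dma_addr_t buf_dma, u32 tail)
	{
		struct foo_tx_desc *desc = &fd->tx_ring[tail];

		/* cacheable stores filling in the DMA descriptor */
		desc->buf_addr = cpu_to_le32(buf_dma);
		desc->flags = cpu_to_le32(FOO_DESC_HW_OWNED);

		/*
		 * The descriptor must be visible to the device before the
		 * MMIO write below tells it to start fetching.  On ppc64
		 * today that ordering is enforced by the heavy barrier
		 * inside writel() itself; the wmb() here only marks where
		 * an explicit cacheable-vs-MMIO barrier would go if the
		 * accessors were relaxed.
		 */
		wmb();

		writel(tail, fd->mmio + FOO_TX_TAIL);	/* kick off the DMA */
	}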

Paul.


* Re: [PATCH] I/O space write barrier
  2004-10-22  3:05   ` Jesse Barnes
@ 2004-10-22  4:26     ` Greg Banks
  2004-10-22 15:26     ` Grant Grundler
  1 sibling, 0 replies; 45+ messages in thread
From: Greg Banks @ 2004-10-22  4:26 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: Grant Grundler, Jesse Barnes, akpm, linux-kernel, tony.luck, linux-ia64

On Fri, 2004-10-22 at 13:05, Jesse Barnes wrote:
>  I think Greg had some ideas about other 
> things to cover as well.  Greg?

You've done most of the stuff I wanted, but it would be nice to see:

*  a mention of the read-to-flush-writes technique and why mmiowb
   is better

*  an example of how and where to use read_relaxed and a description
   of why it's better than read (a rough sketch follows below)
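
For what it's worth, a rough sketch of the read_relaxed example (device and
register names entirely made up) might look like:

	static irqreturn_t foo_intr(int irq, void *dev_id, struct pt_regs *regs)
	{
		struct foo_dev *fd = dev_id;
		u32 status;

		/*
		 * A pure status read that doesn't gate access to any DMA'd
		 * data can use the relaxed variant; on platforms that honor
		 * it, the read doesn't have to wait for the device's
		 * outstanding DMA writes to be pushed to memory first.
		 */
		status = readl_relaxed(fd->mmio + FOO_INT_STATUS);
		if (!(status & FOO_INT_PENDING))
			return IRQ_NONE;

		/*
		 * A read whose result says DMA'd data is ready (e.g. a
		 * completion index) must stay an ordered readl().
		 */
		fd->tx_done_idx = readl(fd->mmio + FOO_TX_DONE_IDX);

		return IRQ_HANDLED;
	}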

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.




* Re: [PATCH] use mmiowb in qla1280.c
  2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
@ 2004-10-22  9:53   ` Jes Sorensen
  2004-10-24 16:20   ` James Bottomley
  1 sibling, 0 replies; 45+ messages in thread
From: Jes Sorensen @ 2004-10-22  9:53 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: akpm, linux-kernel, linux-scsi, jeremy

>>>>> "Jesse" == Jesse Barnes <jbarnes@engr.sgi.com> writes:

Jesse> There are a few spots in qla1280.c that don't need a full PCI
Jesse> write flush to the device, but rather a simple write ordering
Jesse> guarantee.  This patch changes some of the PIO reads that cause
Jesse> write flushes into mmiowb calls instead, which is a lighter
Jesse> weight way of ensuring ordering.

Jesse> Jes and James, can you ack this and/or push it in via the SCSI
Jesse> BK tree?

ACK!

James will you push this one?

Cheers,
Jes


* Re: [PATCH] I/O space write barrier
  2004-10-22  3:05   ` Jesse Barnes
  2004-10-22  4:26     ` Greg Banks
@ 2004-10-22 15:26     ` Grant Grundler
  1 sibling, 0 replies; 45+ messages in thread
From: Grant Grundler @ 2004-10-22 15:26 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Jesse Barnes, akpm, linux-kernel, tony.luck, linux-ia64

On Thu, Oct 21, 2004 at 10:05:50PM -0500, Jesse Barnes wrote:
> Ok, that makes sense.  I'd like to update the documentation with a separate 
> patch though, if that's ok with you. 

Sure, if that's easier for you.

grant


* [PATCH] use mmiowb in tg3_poll
  2004-10-21 23:28 ` [PATCH] use mmiowb in tg3.c Jesse Barnes
  2004-10-21 23:40   ` David S. Miller
@ 2004-10-22 20:51   ` akepner
  1 sibling, 0 replies; 45+ messages in thread
From: akepner @ 2004-10-22 20:51 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: netdev, jgarzik, davem, jbarnes, gnb


Returning from tg3_poll() without flushing the PIO write which
reenables interrupts can result in lower cpu utilization and higher
throughput. So use a memory barrier, mmiowb(), instead of flushing the
write with a PIO read.

Signed-off-by: Arthur Kepner <akepner@sgi.com>
---

--- linux-2.6.9-rc4-mm1.orig/drivers/net/tg3.c	2004-10-22 13:51:10.000000000 -0700
+++ linux-2.6.9-rc4-mm1/drivers/net/tg3.c	2004-10-22 13:53:36.000000000 -0700
@@ -417,6 +417,20 @@
 	tg3_cond_int(tp);
 }
 
+/* tg3_restart_ints
+ *  similar to tg3_enable_ints, but it can return without flushing the 
+ *  PIO write which reenables interrupts
+ */
+static void tg3_restart_ints(struct tg3 *tp)
+{
+	tw32(TG3PCI_MISC_HOST_CTRL,
+		(tp->misc_host_ctrl & ~MISC_HOST_CTRL_MASK_PCI_INT));
+	tw32_mailbox(MAILBOX_INTERRUPT_0 + TG3_64BIT_REG_LOW, 0x00000000);
+	mmiowb();
+
+	tg3_cond_int(tp);
+}
+
 static inline void tg3_netif_stop(struct tg3 *tp)
 {
 	netif_poll_disable(tp->dev);
@@ -2787,7 +2801,7 @@
 	if (done) {
 		spin_lock_irqsave(&tp->lock, flags);
 		__netif_rx_complete(netdev);
-		tg3_enable_ints(tp);
+		tg3_restart_ints(tp);
 		spin_unlock_irqrestore(&tp->lock, flags);
 	}
 


-- 
Arthur







* Re: [PATCH] use mmiowb in qla1280.c
  2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
  2004-10-22  9:53   ` Jes Sorensen
@ 2004-10-24 16:20   ` James Bottomley
  2004-10-25 16:18     ` Jesse Barnes
  1 sibling, 1 reply; 45+ messages in thread
From: James Bottomley @ 2004-10-24 16:20 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Andrew Morton, Linux Kernel, SCSI Mailing List, jeremy, jes

On Thu, 2004-10-21 at 19:17, Jesse Barnes wrote:
> There are a few spots in qla1280.c that don't need a full PCI write flush to 
> the device, but rather a simple write ordering guarantee.  This patch changes 
> some of the PIO reads that cause write flushes into mmiowb calls instead, 
> which is a lighter weight way of ensuring ordering.
> 
> Jes and James, can you ack this and/or push it in via the SCSI BK tree?

This doesn't seem to work:

  CC [M]  drivers/scsi/qla1280.o
drivers/scsi/qla1280.c: In function `qla1280_64bit_start_scsi':
drivers/scsi/qla1280.c:3404: warning: implicit declaration of function
`mmiowb'

  MODPOST
*** Warning: "mmiowb" [drivers/scsi/qla1280.ko] undefined!

James




* Re: [PATCH] use mmiowb in qla1280.c
  2004-10-24 16:20   ` James Bottomley
@ 2004-10-25 16:18     ` Jesse Barnes
  2004-10-25 19:02       ` Andrew Morton
  0 siblings, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-10-25 16:18 UTC (permalink / raw)
  To: James Bottomley
  Cc: Andrew Morton, Linux Kernel, SCSI Mailing List, jeremy, jes

On Sunday, October 24, 2004 9:20 am, James Bottomley wrote:
> On Thu, 2004-10-21 at 19:17, Jesse Barnes wrote:
> > There are a few spots in qla1280.c that don't need a full PCI write flush
> > to the device, but rather a simple write ordering guarantee.  This patch
> > changes some of the PIO reads that cause write flushes into mmiowb calls
> > instead, which is a lighter weight way of ensuring ordering.
> >
> > Jes and James, can you ack this and/or push it in via the SCSI BK tree?
>
> This doesn't seem to work:
>
>   CC [M]  drivers/scsi/qla1280.o
> drivers/scsi/qla1280.c: In function `qla1280_64bit_start_scsi':
> drivers/scsi/qla1280.c:3404: warning: implicit declaration of function
> `mmiowb'
>
>   MODPOST
> *** Warning: "mmiowb" [drivers/scsi/qla1280.ko] undefined!

Only works in Andrew's tree until he pushes mmiowb to Linus.

Jesse


* Re: [PATCH] use mmiowb in qla1280.c
  2004-10-25 16:18     ` Jesse Barnes
@ 2004-10-25 19:02       ` Andrew Morton
  2004-10-25 19:33         ` Jesse Barnes
  0 siblings, 1 reply; 45+ messages in thread
From: Andrew Morton @ 2004-10-25 19:02 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: James.Bottomley, linux-kernel, linux-scsi, jeremy, jes

Jesse Barnes <jbarnes@engr.sgi.com> wrote:
>
> > *** Warning: "mmiowb" [drivers/scsi/qla1280.ko] undefined!
> 
>  Only works in Andrew's tree until he pushes mmiowb to Linus.

I haven't been following this stuff at all closely and I need help
determining which patches should be pushed, and when.  Please.



* Re: [PATCH] use mmiowb in qla1280.c
  2004-10-25 19:02       ` Andrew Morton
@ 2004-10-25 19:33         ` Jesse Barnes
  0 siblings, 0 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-25 19:33 UTC (permalink / raw)
  To: Andrew Morton; +Cc: James.Bottomley, linux-kernel, linux-scsi, jeremy, jes

On Monday, October 25, 2004 12:02 pm, Andrew Morton wrote:
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> > > *** Warning: "mmiowb" [drivers/scsi/qla1280.ko] undefined!
> >
> >  Only works in Andrew's tree until he pushes mmiowb to Linus.
>
> I haven't been following this stuff at all closely and I need help
> determining which patches should be pushed, and when.  Please.

I think it's safe to push the main mmiowb() patch now, then James and the 
netdev tree can start including driver changes to make use of it.

Thanks,
Jesse


* Re: [PATCH] I/O space write barrier
  2004-10-16  3:20         ` Matthew Wilcox
@ 2004-10-16  3:31           ` Jeremy Higdon
  0 siblings, 0 replies; 45+ messages in thread
From: Jeremy Higdon @ 2004-10-16  3:31 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Guennadi Liakhovetski, akpm, Jesse Barnes, linux-kernel, gnb,
	linux-scsi, james.bottomley

On Sat, Oct 16, 2004 at 04:20:44AM +0100, Matthew Wilcox wrote:
> On Fri, Oct 15, 2004 at 05:38:09PM -0700, Jeremy Higdon wrote:
> > -	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
> > +	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
> > +	mmiowb();
> 
> I really don't think we want a comment by every mmiowb() explaining what
> it does.  We needed one by the write flush because it had two potential
> meanings, and we didn't want people overoptimising it away.  But mmiowb()
> is clear and unambiguous.

This seems to be a case of not being able to please everyone.
James Bottomley asked for documentation on the usage of mmiowb().
Guennadi asked for one copy of the documentation and the references
to that in other places.

I don't really care that much, but this is the third version of this
patch, where the only difference is comments.  If it's all right, let
this go in, and you can submit patches to change the comments later.

I believe James' idea was that qla1280 would be the "example" driver.

jeremy


* Re: [PATCH] I/O space write barrier
  2004-10-16  0:38       ` Jeremy Higdon
@ 2004-10-16  3:20         ` Matthew Wilcox
  2004-10-16  3:31           ` Jeremy Higdon
  0 siblings, 1 reply; 45+ messages in thread
From: Matthew Wilcox @ 2004-10-16  3:20 UTC (permalink / raw)
  To: Jeremy Higdon
  Cc: Guennadi Liakhovetski, akpm, Jesse Barnes, linux-kernel, gnb, linux-scsi

On Fri, Oct 15, 2004 at 05:38:09PM -0700, Jeremy Higdon wrote:
> -	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
> +	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
> +	mmiowb();

I really don't think we want a comment by every mmiowb() explaining what
it does.  We needed one by the write flush because it had two potential
meanings, and we didn't want people overoptimising it away.  But mmiowb()
is clear and unambiguous.

-- 
"Next the statesmen will invent cheap lies, putting the blame upon 
the nation that is attacked, and every man will be glad of those
conscience-soothing falsities, and will diligently study them, and refuse
to examine any refutations of them; and thus he will by and by convince 
himself that the war is just, and will thank God for the better sleep 
he enjoys after this process of grotesque self-deception." -- Mark Twain


* Re: [PATCH] I/O space write barrier
  2004-09-30 21:21     ` Guennadi Liakhovetski
@ 2004-10-16  0:38       ` Jeremy Higdon
  2004-10-16  3:20         ` Matthew Wilcox
  0 siblings, 1 reply; 45+ messages in thread
From: Jeremy Higdon @ 2004-10-16  0:38 UTC (permalink / raw)
  To: Guennadi Liakhovetski, akpm; +Cc: Jesse Barnes, linux-kernel, gnb, linux-scsi

On Thu, Sep 30, 2004 at 11:21:39PM +0200, Guennadi Liakhovetski wrote:
> 
> A pretty obvious note: instead of repeating this nice but pretty lengthy 
> comment 3 times in the same file, wouldn't it be better to write something 
> like the following at the other locations:
> 
> 	/* Enforce IO-ordering. See comment in <function> for details. */
> 
> Also helps if you later have to modify the comment, or move it, or add 
> more mmiowb()s, or do some other modifications.
> 
> Thanks
> Guennadi
> ---
> Guennadi Liakhovetski

Suggestion applied.  Now that mmiowb() is in the -mm patchset, this
patch to qla1280 should be ready for inclusion there as well.

signed-off-by: Jeremy Higdon <jeremy@sgi.com>


--- linux-2.6.9-rc4.orig/drivers/scsi/qla1280.c	2004-10-15 17:21:21.000000000 -0700
+++ linux-2.6.9-rc4/drivers/scsi/qla1280.c	2004-10-15 17:23:08.000000000 -0700
@@ -3409,7 +3409,8 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+	mmiowb();
 
  out:
 	if (status)
@@ -3677,7 +3678,8 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	/* Enforce mmio write ordering; see comment in qla1280_isp_cmd(). */
+	mmiowb();
 
 out:
 	if (status)
@@ -3787,9 +3789,21 @@
 	} else
 		ha->request_ring_ptr++;
 
-	/* Set chip new ring index. */
+	/*
+	 * Update request index to mailbox4 (Request Queue In).
+	 * The mmiowb() ensures that this write is ordered with writes by other
+	 * CPUs.  Without the mmiowb(), it is possible for the following:
+	 *    CPUA posts write of index 5 to mailbox4
+	 *    CPUA releases host lock
+	 *    CPUB acquires host lock
+	 *    CPUB posts write of index 6 to mailbox4
+	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
+	 *       causing chip to issue full queue of stale commands
+	 * The mmiowb() prevents future writes from crossing the barrier.
+	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
+	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb();
 
 	LEAVE("qla1280_isp_cmd");
 }


* Re: [PATCH] I/O space write barrier
  2004-10-05 23:57               ` Roland Dreier
@ 2004-10-06  1:45                 ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-06  1:45 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Jesse Barnes, Albert Cahalan, linux-kernel mailing list

On Wed, 2004-10-06 at 09:57, Roland Dreier wrote:

> I could be wrong but I think that the eieio in the ppc IO write
> functions should be strong enough that mmiowb() can be a no-op.
> 
> By the way, are the ordering rules different for ppc32 and ppc64?  I
> notice that the ppc32 out_xxx() functions do eieio while the ppc64
> versions do a full sync.

ppc32 and ppc64 are identical by spec, but the current chips' smaller
store queues are such that we didn't experience on ppc32 the amount
of issues we had on ppc64.

eieio will not order a cacheable store vs. a non-cacheable store by
spec, which is the root of our problem on ppc and why we had to change
some of these into sync's. Extending the semantics of mmiowb() to a more
generic ordering of MMIO vs. "the rest of the world" would help us, as
I don't believe in defining yet another barrier and expecting it to be
properly used by device driver writers.

Ben.




* Re: [PATCH] I/O space write barrier
  2004-10-05 22:41             ` Benjamin Herrenschmidt
  2004-10-05 23:09               ` Jesse Barnes
@ 2004-10-05 23:57               ` Roland Dreier
  2004-10-06  1:45                 ` Benjamin Herrenschmidt
  1 sibling, 1 reply; 45+ messages in thread
From: Roland Dreier @ 2004-10-05 23:57 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jesse Barnes, Albert Cahalan, linux-kernel mailing list

    Benjamin> I don't understand that either. You can never guarantee
    Benjamin> any ordering between writes from different CPUs unless
    Benjamin> you have a spinlock. If you have an ordering problem with
    Benjamin> spinlocks, then it's a totally different issue, a bit
    Benjamin> more like MMIO vs. cacheable mem that we have on PPC. If
    Benjamin> this is the problem you are trying to chase, then we
    Benjamin> could use such a barrier on ppc too and make it a hard
    Benjamin> sync, but it has nothing to do with the write barrier we
    Benjamin> already have in our IO accessors...

As I understand it, the problem is that on some Itanium boxes, it's
possible to have the following code run:

	CPU 1				CPU 2
    spin_lock(&devlock);
    writel(foo);
    spin_unlock(&devlock);

				    spin_lock(&devlock);
				    writel(bar);
				    spin_unlock(&devlock);

and still have bar arrive at the device before foo.  One possibility
would be to add a read after the writel() to flush the posted write.
The proposed mmiowb() function is somewhat lighter weight -- it
guarantees that later writes will not hit the device before any writes
already issued, but it doesn't say anything about writes making it all
the way to the device.
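
Put in code terms (the address is arbitrary), the difference between the two
is roughly:

	writel(val, ioaddr);
	mmiowb();		/* later MMIO writes (including ones issued by
				 * another CPU that then takes the same lock)
				 * won't pass this one, but it may still be
				 * sitting in a posted write buffer */

	writel(val, ioaddr);
	(void) readl(ioaddr);	/* heavier: the read doesn't return until the
				 * write has actually reached the device */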

I could be wrong but I think that the eieio in the ppc IO write
functions should be strong enough that mmiowb() can be a no-op.

By the way, are the ordering rules different for ppc32 and ppc64?  I
notice that the ppc32 out_xxx() functions do eieio while the ppc64
versions do a full sync.

Thanks,
  Roland




* Re: [PATCH] I/O space write barrier
  2004-10-05 22:41             ` Benjamin Herrenschmidt
@ 2004-10-05 23:09               ` Jesse Barnes
  2004-10-05 23:57               ` Roland Dreier
  1 sibling, 0 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-05 23:09 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Albert Cahalan, linux-kernel mailing list

On Tuesday, October 5, 2004 3:41 pm, Benjamin Herrenschmidt wrote:
> On Wed, 2004-10-06 at 01:33, Jesse Barnes wrote:
> > This macro is only supposed to deal with writes from different CPUs that
> > may arrive out of order, nothing else.  It sounds like PPC won't allow
> > that normally, so I can be an empty definition.
>
> I don't understand that either. You can never guarantee any ordering
> between writes from different CPUs unless you have a spinlock. If you
> have an ordering problem with spinlocks, then it's a totally different
> issue, a bit more like MMIO vs. cacheable mem that we have on PPC.

Right.

> If 
> this is the problem you are trying to chase, then we could use such a
> barrier on ppc too and make it a hard sync, but it has nothing to do
> with the write barrier we already have in our IO accessors...

Ok.

>
> > > That doesn't solve my need for MMIO vs. memory ordering unless you are
> > > trying to cover that as well, in which case it should be a sync.
> >
> > No, I think that has to be covered separately.
>
> How so? Again, this whole "ordering of writes between different CPUs" makes
> absolutely no sense to me.

It's like you said above.  I meant that ordering of writes to I/O space and 
memory space should be dealt with differently on PPC, as you've said before.  
I guess you need new barrier types for that?

Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05 15:33           ` Jesse Barnes
@ 2004-10-05 22:41             ` Benjamin Herrenschmidt
  2004-10-05 23:09               ` Jesse Barnes
  2004-10-05 23:57               ` Roland Dreier
  0 siblings, 2 replies; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-05 22:41 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Albert Cahalan, linux-kernel mailing list

On Wed, 2004-10-06 at 01:33, Jesse Barnes wrote:

> This macro is only supposed to deal with writes from different CPUs that may 
> arrive out of order, nothing else.  It sounds like PPC won't allow that 
> normally, so it can be an empty definition.

I don't understand that either. You can never guarantee any ordering
between writes from different CPUs unless you have a spinlock. If you
have an ordering problem with spinlocks, then it's a totally different
issue, a bit more like MMIO vs. cacheable mem that we have on PPC. If
this is the problem you are trying to chase, then we could use such a
barrier on ppc too and make it a hard sync, but it has nothing to do
with the write barrier we already have in our IO accessors...

> > That doesn't solve my need for MMIO vs. memory ordering unless you are
> > trying to cover that as well, in which case it should be a sync.
> 
> No, I think that has to be covered separately.

How so? Again, this whole "ordering of writes between different CPUs" makes
absolutely no sense to me.

Ben.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] I/O space write barrier
@ 2004-10-05 22:38 Jesse Barnes
  0 siblings, 0 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-05 22:38 UTC (permalink / raw)
  To: akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2552 bytes --]

I've integrated BenH's latest comments.  If it turns out they actually need 
this (they may in the future if they implement the other barriers they'd 
like), then they can trivially update their definition of mmiowb().
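
Just to sketch what I mean (this is not part of the patch), a heavier
ppc definition could be as simple as:

/* hypothetical ppc definition, if a full barrier turns out to be needed */
#define mmiowb()	__asm__ __volatile__("sync" : : : "memory")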

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
I/O space aren't ordered coming from different CPUs.  For the most part, this 
isn't a problem since drivers generally spinlock around code that does writeX 
calls, but if the last operation a driver does before it releases a lock is a 
write and some other CPU takes the lock and immediately does a write, it's 
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and 
adds some documentation describing I/O ordering issues to deviceiobook.tmpl.  
The idea is to mirror the regular, cacheable memory barrier operation, wmb.  
Example of the problem this new macro solves:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
is easy though:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  mmiowb();
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to 
actually make a given write get to the device before proceeding.  This should 
be dealt with by immediately reading a register from the card that has no 
side effects.  According to the PCI spec, that will guarantee that all writes 
have arrived before being sent to the target bus.  If no such register is 
available (in the case of card resets perhaps), reading from config space is 
sufficient (though it may return all ones if the card isn't responding to 
read cycles).  I've tried to describe how mmiowb() differs from PCI posted 
write flushing in the patch to deviceiobook.tmpl.
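
For illustration only (not part of the patch), a config space flush
could look something like this, assuming the driver has its struct
pci_dev around; the register names are made up:

	u16 vendor;

	/* the card may stop decoding MMIO reads while it resets */
	writel(RESET_CMD, ioaddr + CTRL_REG);
	/* a config read still soft-fails, and flushes the posted write */
	pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor);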

Patches to use this new primitive in various drivers will come separately, 
probably via the SCSI tree.

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Thanks,
Jesse

[-- Attachment #2: mmiowb-6.patch --]
[-- Type: text/plain, Size: 19083 bytes --]

===== Documentation/DocBook/deviceiobook.tmpl 1.4 vs edited =====
--- 1.4/Documentation/DocBook/deviceiobook.tmpl	2004-02-03 21:31:10 -08:00
+++ edited/Documentation/DocBook/deviceiobook.tmpl	2004-10-05 15:30:33 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -159,10 +158,72 @@
 	asynchronously. A driver author must issue a read from the same
 	device to ensure that writes have occurred in the specific cases the
 	author cares. This kind of property cannot be hidden from driver
-	writers in the API.
+	writers in the API.  In some cases, the read used to flush the device
+	may be expected to fail (if the card is resetting, for example).  In
+	that case, the read should be done from config space, which is
+	guaranteed to soft-fail if the card doesn't respond.
       </para>
 
       <para>
+	The following is an example of flushing a write to a device when
+	the driver would like to ensure the write's effects are visible prior
+	to continuing execution.
+      </para>
+
+<programlisting>
+static inline void
+qla1280_disable_intrs(struct scsi_qla_host *ha)
+{
+	struct device_reg *reg;
+
+	reg = ha->iobase;
+	/* disable risc and host interrupts */
+	WRT_REG_WORD(&amp;reg->ictrl, 0);
+	/*
+	 * The following read will ensure that the above write
+	 * has been received by the device before we return from this
+	 * function.
+	 */
+	RD_REG_WORD(&amp;reg->ictrl);
+	ha->flags.ints_enabled = 0;
+}
+</programlisting>
+
+      <para>
+	In addition to write posting, on some large multiprocessing systems
+	(e.g. SGI Challenge, Origin and Altix machines) posted writes won't
+	be strongly ordered coming from different CPUs.  Thus it's important
+	to properly protect parts of your driver that do memory-mapped writes
+	with locks and use the <function>mmiowb</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
+	Generally, one should use <function>mmiowb</function> prior to
+	releasing a spinlock that protects regions using <function>writeb
+	</function> or similar functions that aren't surrounded by <function>
+	readb</function> calls, which will ensure ordering and flushing.  The
+	following example (again from qla1280.c) illustrates its use.
+      </para>
+
+<programlisting>
+       sp->flags |= SRB_SENT;
+       ha->actthreads++;
+       WRT_REG_WORD(&amp;reg->mailbox4, ha->req_ring_index);
+
+       /*
+        * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+        * of the request queue in register is ordered ahead of writes issued
+        * after this one by other CPUs.  Access to the register is protected
+        * by the host_lock.  Without the mmiowb, however, it is possible for
+        * this CPU to release the host lock, another CPU acquire the host lock,
+        * and write to the request queue in, and have the second write make it
+        * to the chip first.
+        */
+       mmiowb(); /* posted write ordering */
+</programlisting>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +232,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c	2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c	2004-10-05 15:30:34 -07:00
@@ -54,23 +54,19 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
===== include/asm-alpha/io.h 1.20 vs edited =====
--- 1.20/include/asm-alpha/io.h	2004-09-25 18:54:11 -07:00
+++ edited/include/asm-alpha/io.h	2004-10-05 15:30:34 -07:00
@@ -489,6 +489,8 @@
 #define readl_relaxed(addr) __raw_readl(addr)
 #define readq_relaxed(addr) __raw_readq(addr)
 
+#define mmiowb()
+
 /*
  * String version of IO memory access ops:
  */
===== include/asm-arm/io.h 1.18 vs edited =====
--- 1.18/include/asm-arm/io.h	2004-06-03 10:30:09 -07:00
+++ edited/include/asm-arm/io.h	2004-10-05 15:30:35 -07:00
@@ -135,6 +135,8 @@
 extern void _memcpy_toio(unsigned long, const void *, size_t);
 extern void _memset_io(unsigned long, int, size_t);
 
+#define mmiowb()
+
 /*
  *  Memory access primitives
  *  ------------------------
===== include/asm-arm26/io.h 1.3 vs edited =====
--- 1.3/include/asm-arm26/io.h	2004-08-02 01:00:45 -07:00
+++ edited/include/asm-arm26/io.h	2004-10-05 15:30:35 -07:00
@@ -320,6 +320,8 @@
 #define writesw(p,d,l)                        __readwrite_bug("writesw")
 #define writesl(p,d,l)                        __readwrite_bug("writesl")
 
+#define mmiowb()
+
 /* the following macro is depreciated */
 #define ioaddr(port)                    __ioaddr((port))
 
===== include/asm-cris/io.h 1.10 vs edited =====
--- 1.10/include/asm-cris/io.h	2004-06-01 02:27:58 -07:00
+++ edited/include/asm-cris/io.h	2004-10-05 15:30:35 -07:00
@@ -56,6 +56,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 #define memset_io(a,b,c)	memset((void *)(a),(b),(c))
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
===== include/asm-h8300/io.h 1.9 vs edited =====
--- 1.9/include/asm-h8300/io.h	2004-06-18 00:06:08 -07:00
+++ edited/include/asm-h8300/io.h	2004-10-05 15:30:35 -07:00
@@ -200,6 +200,8 @@
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 #define inb(addr)    ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr))
 #define inw(addr)    _swapw(readw(addr))
 #define inl(addr)    _swapl(readl(addr))
===== include/asm-i386/io.h 1.29 vs edited =====
--- 1.29/include/asm-i386/io.h	2004-09-13 11:31:52 -07:00
+++ edited/include/asm-i386/io.h	2004-10-05 15:30:36 -07:00
@@ -178,6 +178,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
 	memset((void __force *) addr, val, count);
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h	2004-10-05 15:30:36 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h	2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-10-05 15:30:36 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-10-05 15:30:36 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h	2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-10-05 15:30:36 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h	2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h	2004-10-05 15:30:37 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
 
 #endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h	2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h	2004-10-05 15:30:37 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
===== include/asm-m32r/io.h 1.1 vs edited =====
--- 1.1/include/asm-m32r/io.h	2004-09-17 00:06:56 -07:00
+++ edited/include/asm-m32r/io.h	2004-10-05 15:30:37 -07:00
@@ -150,6 +150,9 @@
 #define __raw_readb readb
 #define __raw_readw readw
 #define __raw_readl readl
+#define readb_relaxed readb
+#define readw_relaxed readw
+#define readl_relaxed readl
 
 #define writeb(val, addr)  _writeb((val), (unsigned long)(addr))
 #define writew(val, addr)  _writew((val), (unsigned long)(addr))
@@ -157,6 +160,8 @@
 #define __raw_writeb writeb
 #define __raw_writew writew
 #define __raw_writel writel
+
+#define mmiowb()
 
 #define flush_write_buffers() do { } while (0)  /* M32R_FIXME */
 
===== include/asm-m68k/io.h 1.13 vs edited =====
--- 1.13/include/asm-m68k/io.h	2004-04-12 04:56:00 -07:00
+++ edited/include/asm-m68k/io.h	2004-10-05 15:30:37 -07:00
@@ -306,6 +306,7 @@
 #endif
 #endif /* CONFIG_PCI */
 
+#define mmiowb()
 
 static inline void *ioremap(unsigned long physaddr, unsigned long size)
 {
===== include/asm-m68knommu/io.h 1.3 vs edited =====
--- 1.3/include/asm-m68knommu/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-m68knommu/io.h	2004-10-05 15:30:37 -07:00
@@ -102,6 +102,8 @@
 		*bp++ = _swapl(*ap);
 }
 
+#define mmiowb()
+
 /*
  *	make the short names macros so specific devices
  *	can override them as required
===== include/asm-mips/io.h 1.7 vs edited =====
--- 1.7/include/asm-mips/io.h	2004-02-19 12:53:02 -08:00
+++ edited/include/asm-mips/io.h	2004-10-05 15:30:38 -07:00
@@ -290,6 +290,10 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+/* Depends on MIPS III instruction set */
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
===== include/asm-parisc/io.h 1.8 vs edited =====
--- 1.8/include/asm-parisc/io.h	2004-08-13 01:23:36 -07:00
+++ edited/include/asm-parisc/io.h	2004-10-05 15:30:38 -07:00
@@ -177,6 +177,8 @@
 #define readl_relaxed(addr) readl(addr)
 #define readq_relaxed(addr) readq(addr)
 
+#define mmiowb()
+
 extern void __memcpy_fromio(unsigned long dest, unsigned long src, int count);
 extern void __memcpy_toio(unsigned long dest, unsigned long src, int count);
 extern void __memset_io(unsigned long dest, char fill, int count);
===== include/asm-ppc/io.h 1.24 vs edited =====
--- 1.24/include/asm-ppc/io.h	2004-09-07 23:32:54 -07:00
+++ edited/include/asm-ppc/io.h	2004-10-05 15:31:53 -07:00
@@ -197,6 +197,8 @@
 #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 /*
  * Map in an area of physical address space, for accessing
  * I/O devices etc.
===== include/asm-ppc64/io.h 1.22 vs edited =====
--- 1.22/include/asm-ppc64/io.h	2004-09-21 02:14:10 -07:00
+++ edited/include/asm-ppc64/io.h	2004-10-05 15:31:37 -07:00
@@ -152,6 +152,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb()
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.
===== include/asm-s390/io.h 1.7 vs edited =====
--- 1.7/include/asm-s390/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-s390/io.h	2004-10-05 15:30:38 -07:00
@@ -105,6 +105,8 @@
 #define outb(x,addr) ((void) writeb(x,addr))
 #define outb_p(x,addr) outb(x,addr)
 
+#define mmiowb()
+
 #endif /* __KERNEL__ */
 
 #endif
===== include/asm-sh/io.h 1.7 vs edited =====
--- 1.7/include/asm-sh/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-sh/io.h	2004-10-05 15:30:39 -07:00
@@ -134,6 +134,8 @@
 #define readw_relaxed(a) readw(a)
 #define readl_relaxed(a) readl(a)
 
+#define mmiowb()
+
 /*
  * If the platform has PC-like I/O, this function converts the offset into
  * an address.
===== include/asm-sh64/io.h 1.1 vs edited =====
--- 1.1/include/asm-sh64/io.h	2004-06-29 07:44:46 -07:00
+++ edited/include/asm-sh64/io.h	2004-10-05 15:30:39 -07:00
@@ -86,6 +86,9 @@
 #define readb(addr)		sh64_in8(addr)
 #define readw(addr)		sh64_in16(addr)
 #define readl(addr)		sh64_in32(addr)
+#define readb_relaxed(addr)		sh64_in8(addr)
+#define readw_relaxed(addr)		sh64_in16(addr)
+#define readl_relaxed(addr)		sh64_in32(addr)
 
 #define writeb(b, addr)		sh64_out8(b, addr)
 #define writew(b, addr)		sh64_out16(b, addr)
@@ -105,6 +108,8 @@
 void outb(unsigned long value, unsigned long port);
 void outw(unsigned long value, unsigned long port);
 void outl(unsigned long value, unsigned long port);
+
+#define mmiowb()
 
 #ifdef __KERNEL__
 
===== include/asm-sparc/io.h 1.9 vs edited =====
--- 1.9/include/asm-sparc/io.h	2004-02-19 23:15:13 -08:00
+++ edited/include/asm-sparc/io.h	2004-10-05 15:30:39 -07:00
@@ -23,6 +23,8 @@
 	return ((w&0xff) << 8) | ((w>>8)&0xff);
 }
 
+#define mmiowb()
+
 /*
  * Memory mapped I/O to PCI
  *
===== include/asm-sparc64/io.h 1.13 vs edited =====
--- 1.13/include/asm-sparc64/io.h	2004-09-16 14:43:04 -07:00
+++ edited/include/asm-sparc64/io.h	2004-10-05 15:30:39 -07:00
@@ -439,6 +439,8 @@
 	return retval;
 }
 
+#define mmiowb()
+
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
===== include/asm-v850/io.h 1.3 vs edited =====
--- 1.3/include/asm-v850/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-v850/io.h	2004-10-05 15:30:40 -07:00
@@ -102,6 +102,8 @@
 #define ioremap_writethrough(physaddr, size)	(physaddr)
 #define ioremap_fullcache(physaddr, size)	(physaddr)
 
+#define mmiowb()
+
 #define page_to_phys(page)      ((page - mem_map) << PAGE_SHIFT)
 #if 0
 /* This is really stupid; don't define it.  */
===== include/asm-x86_64/io.h 1.14 vs edited =====
--- 1.14/include/asm-x86_64/io.h	2004-08-24 02:08:31 -07:00
+++ edited/include/asm-x86_64/io.h	2004-10-05 15:30:40 -07:00
@@ -186,6 +186,8 @@
 #define __raw_readl readl
 #define __raw_readq readq
 
+#define mmiowb()
+
 #ifdef CONFIG_UNORDERED_IO
 static inline void __writel(u32 val, void *addr)
 {

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05  3:04         ` Benjamin Herrenschmidt
@ 2004-10-05 15:33           ` Jesse Barnes
  2004-10-05 22:41             ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-10-05 15:33 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Albert Cahalan, linux-kernel mailing list

On Monday, October 4, 2004 8:04 pm, Benjamin Herrenschmidt wrote:
> > I agree, it's hard to get right, especially when you've got a large base
> > of drivers with hard to find assumptions about ordering.
> >
> > What about mmiowb()?  Should it be eieio?  I don't want to post another
> > patch if I don't have to...
>
> I don't understand the whole story...
>
> If normal accesses aren't relaxed (and I think they shouldn't be), then
> there is no point in a barrier here... If you need an explicit barrier
> for explicitly relaxed accesses, then yes.

This macro is only supposed to deal with writes from different CPUs that may 
arrive out of order, nothing else.  It sounds like PPC won't allow that 
normally, so it can be an empty definition.
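
I.e. the ppc definition could simply be something like:

#define mmiowb()	do { } while (0)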

> That doesn't solve my need for MMIO vs. memory ordering unless you are trying
> to cover that as well, in which case it should be a sync.

No, I think that has to be covered separately.

Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05  2:26       ` Jesse Barnes
@ 2004-10-05  3:04         ` Benjamin Herrenschmidt
  2004-10-05 15:33           ` Jesse Barnes
  0 siblings, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-05  3:04 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Albert Cahalan, linux-kernel mailing list


> I agree, it's hard to get right, especially when you've got a large base of 
> drivers with hard-to-find assumptions about ordering.
> 
> What about mmiowb()?  Should it be eieio?  I don't want to post another patch 
> if I don't have to...

I don't understand the whole story...

If normal accesses aren't relaxed (and I think they shouldn't be), then
there is no point in a barrier here... If you need an explicit barrier
for explicitly relaxed accesses, then yes.

That doesn't solve my need for MMIO vs. memory ordering unless you are trying
to cover that as well, in which case it should be a sync.

Ben.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05  0:32   ` Albert Cahalan
  2004-10-05  1:22     ` Benjamin Herrenschmidt
@ 2004-10-05  2:33     ` Jesse Barnes
  1 sibling, 0 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-10-05  2:33 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, benh

On Monday, October 04, 2004 5:32 pm, Albert Cahalan wrote:
> Ideally, it would be eieio, and the eieio in each
> of the IO operations would be removed. Finding and
> fixing all the drivers that break looks impossible
> though; most driver developers will be on x86 boxes.

Well, I won't pretend to understand how the PPC ordering rules work, so I'll 
defer to benh on that one.

> In that case: wmmiob
>
> (or something longer, like mmio_write_fence maybe)
>
> As a name, "wmb" sucks almost as much as "cli" and "sti" do.
> It dates back to the Alpha port, where it's an opcode.

The other option I briefly considered was wwjd(), but I don't think He has an 
official position on posted write ordering.

Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05  1:22     ` Benjamin Herrenschmidt
@ 2004-10-05  2:26       ` Jesse Barnes
  2004-10-05  3:04         ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-10-05  2:26 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Albert Cahalan, linux-kernel mailing list

On Monday, October 04, 2004 6:22 pm, Benjamin Herrenschmidt wrote:
> On Tue, 2004-10-05 at 10:32, Albert Cahalan wrote:
> > Ideally, it would be eieio, and the eieio in each
> > of the IO operations would be removed. Finding and
> > fixing all the drivers that break looks impossible
> > though; most driver developers will be on x86 boxes.
>
> I don't agree. IO operations shouldn't be relaxed by
> default. That's really asking too much of driver writers.

I agree, it's hard to get right, especially when you've got a large base of 
drivers with hard-to-find assumptions about ordering.

What about mmiowb()?  Should it be eieio?  I don't want to post another patch 
if I don't have to...

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-05  0:32   ` Albert Cahalan
@ 2004-10-05  1:22     ` Benjamin Herrenschmidt
  2004-10-05  2:26       ` Jesse Barnes
  2004-10-05  2:33     ` Jesse Barnes
  1 sibling, 1 reply; 45+ messages in thread
From: Benjamin Herrenschmidt @ 2004-10-05  1:22 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: Jesse Barnes, linux-kernel mailing list

On Tue, 2004-10-05 at 10:32, Albert Cahalan wrote:


> Ideally, it would be eieio, and the eieio in each
> of the IO operations would be removed. Finding and
> fixing all the drivers that break looks impossible
> though; most driver developers will be on x86 boxes.

I don't agree. IO operations shouldn't be relaxed by
default. That's really asking too much of driver writers.

Ben.


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-04 21:20 ` Jesse Barnes
@ 2004-10-05  0:32   ` Albert Cahalan
  2004-10-05  1:22     ` Benjamin Herrenschmidt
  2004-10-05  2:33     ` Jesse Barnes
  0 siblings, 2 replies; 45+ messages in thread
From: Albert Cahalan @ 2004-10-05  0:32 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Albert Cahalan, linux-kernel mailing list, benh

On Mon, 2004-10-04 at 17:20, Jesse Barnes wrote:
> On Monday, October 4, 2004 1:39 pm, Albert Cahalan wrote:
> > > diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> > > --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > > +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > > @@ -197,6 +197,8 @@
> > >  #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
> > >  #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
> > >
> > > +#define mmiowb() asm volatile ("eieio" ::: "memory")
> > > +
> > >  /*
> > >   * Map in an area of physical address space, for accessing
> > >   * I/O devices etc.
> >
> > I don't think this is right. For ppc, eieio is
> > already included as part of the assembly for the
> > IO operations. If you could delete that, great,
> > but I suspect that nearly all drivers would break.
> 
> Ok, if it's covered then mmiowb() can just be empty for ppc.

Ideally, it would be eieio, and the eieio in each
of the IO operations would be removed. Finding and
fixing all the drivers that break looks impossible
though; most driver developers will be on x86 boxes.

> > BTW, the "eieio" name is better. The "wb" part
> > of "mmiowb" looks like "write back" to me, as if
> > it were some sort of cache push operation. It is
> > also lacking an appropriate song. :-)
> 
> It's supposed to be 'write barrier': just as wmb is a write memory barrier,
> mmiowb is a memory-mapped I/O write barrier.  Make sense?

In that case: wmmiob

(or something longer, like mmio_write_fence maybe)

As a name, "wmb" sucks almost as much as "cli" and "sti" do.
It dates back to the Alpha port, where it's an opcode.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-10-04 20:39 Albert Cahalan
@ 2004-10-04 21:20 ` Jesse Barnes
  2004-10-05  0:32   ` Albert Cahalan
  0 siblings, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-10-04 21:20 UTC (permalink / raw)
  To: Albert Cahalan; +Cc: linux-kernel mailing list, benh

On Monday, October 4, 2004 1:39 pm, Albert Cahalan wrote:
> > diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> > --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> > @@ -197,6 +197,8 @@
> >  #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
> >  #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
> >
> > +#define mmiowb() asm volatile ("eieio" ::: "memory")
> > +
> >  /*
> >   * Map in an area of physical address space, for accessing
> >   * I/O devices etc.
>
> I don't think this is right. For ppc, eieio is
> already included as part of the assembly for the
> IO operations. If you could delete that, great,
> but I suspect that nearly all drivers would break.

Ok, if it's covered then mmiowb() can just be empty for ppc.

> BTW, the "eieio" name is better. The "wb" part
> of "mmiowb" looks like "write back" to me, as if
> it were some sort of cache push operation. It is
> also lacking an appropriate song. :-)

It's supposed to be 'write barrier': just as wmb is a write memory barrier,
mmiowb is a memory-mapped I/O write barrier.  Make sense?

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
@ 2004-10-04 20:39 Albert Cahalan
  2004-10-04 21:20 ` Jesse Barnes
  0 siblings, 1 reply; 45+ messages in thread
From: Albert Cahalan @ 2004-10-04 20:39 UTC (permalink / raw)
  To: linux-kernel mailing list; +Cc: jbarnes, benh

> diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
> --- a/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> +++ b/include/asm-ppc/io.h 2004-09-27 10:48:41 -07:00
> @@ -197,6 +197,8 @@
>  #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
>  #define memcpy_toio(a,b,c) memcpy((void *)(a),(b),(c))
> 
> +#define mmiowb() asm volatile ("eieio" ::: "memory")
> +
>  /*
>   * Map in an area of physical address space, for accessing
>   * I/O devices etc.

I don't think this is right. For ppc, eieio is
already included as part of the assembly for the
IO operations. If you could delete that, great,
but I suspect that nearly all drivers would break.

There's an existing eieio() inline function that
you could use, instead of fresh assembly code.
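
I.e. if you wanted the barrier there at all, something like this rather
than open-coded asm:

#define mmiowb()	eieio()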

BTW, the "eieio" name is better. The "wb" part
of "mmiowb" looks like "write back" to me, as if
it were some sort of cache push operation. It is
also lacking an appropriate song. :-)



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-30  7:15   ` Jeremy Higdon
@ 2004-09-30 21:21     ` Guennadi Liakhovetski
  2004-10-16  0:38       ` Jeremy Higdon
  0 siblings, 1 reply; 45+ messages in thread
From: Guennadi Liakhovetski @ 2004-09-30 21:21 UTC (permalink / raw)
  To: Jeremy Higdon; +Cc: Jesse Barnes, akpm, linux-kernel, gnb

On Thu, 30 Sep 2004, Jeremy Higdon wrote:

> Here's the qla1280 patch to go with Jesse's mmiowb patch.
> Comments are verbose to satisfy a request from Mr. Bottomley.
>
> signed-off-by: Jeremy Higdon  <jeremy@sgi.com>
>
> ===== drivers/scsi/qla1280.c 1.65 vs edited =====
> --- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
> +++ edited/drivers/scsi/qla1280.c	2004-09-29 23:43:30 -07:00
> @@ -3397,8 +3397,22 @@
> 		"qla1280_64bit_start_scsi: Wakeup RISC for pending command\n");
> 	sp->flags |= SRB_SENT;
> 	ha->actthreads++;
> +
> +	/*
> +	 * Update request index to mailbox4 (Request Queue In).
> +	 * The mmiowb() ensures that this write is ordered with writes by other
> +	 * CPUs.  Without the mmiowb(), it is possible for the following:
> +	 *    CPUA posts write of index 5 to mailbox4
> +	 *    CPUA releases host lock
> +	 *    CPUB acquires host lock
> +	 *    CPUB posts write of index 6 to mailbox4
> +	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
> +	 *       causing chip to issue full queue of stale commands
> +	 * The mmiowb() prevents future writes from crossing the barrier.
> +	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
> +	 */

A pretty obvious note: instead of repeating this nice but pretty lengthy
comment 3 times in the same file, wouldn't it be better to write something
like the following at the other locations:

 	/* Enforce IO-ordering. See comment in <function> for details. */

That also helps if you later have to modify the comment, move it, add
more mmiowb()s, or make other changes.

Thanks
Guennadi
---
Guennadi Liakhovetski


^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-29 22:55 ` Jesse Barnes
@ 2004-09-30  7:15   ` Jeremy Higdon
  2004-09-30 21:21     ` Guennadi Liakhovetski
  0 siblings, 1 reply; 45+ messages in thread
From: Jeremy Higdon @ 2004-09-30  7:15 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: akpm, linux-kernel, gnb

Here's the qla1280 patch to go with Jesse's mmiowb patch.
Comments are verbose to satisfy a request from Mr. Bottomley.

signed-off-by: Jeremy Higdon  <jeremy@sgi.com>

===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-09-29 23:43:30 -07:00
@@ -3397,8 +3397,22 @@
 		"qla1280_64bit_start_scsi: Wakeup RISC for pending command\n");
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
+
+	/*
+	 * Update request index to mailbox4 (Request Queue In).
+	 * The mmiowb() ensures that this write is ordered with writes by other
+	 * CPUs.  Without the mmiowb(), it is possible for the following:
+	 *    CPUA posts write of index 5 to mailbox4
+	 *    CPUA releases host lock
+	 *    CPUB acquires host lock
+	 *    CPUB posts write of index 6 to mailbox4
+	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
+	 *       causing chip to issue full queue of stale commands
+	 * The mmiowb() prevents future writes from crossing the barrier.
+	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
+	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb();
 
  out:
 	if (status)
@@ -3665,8 +3679,22 @@
 		"for pending command\n");
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
+
+	/*
+	 * Update request index to mailbox4 (Request Queue In).
+	 * The mmiowb() ensures that this write is ordered with writes by other
+	 * CPUs.  Without the mmiowb(), it is possible for the following:
+	 *    CPUA posts write of index 5 to mailbox4
+	 *    CPUA releases host lock
+	 *    CPUB acquires host lock
+	 *    CPUB posts write of index 6 to mailbox4
+	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
+	 *       causing chip to issue full queue of stale commands
+	 * The mmiowb() prevents future writes from crossing the barrier.
+	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
+	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb();
 
 out:
 	if (status)
@@ -3776,9 +3804,21 @@
 	} else
 		ha->request_ring_ptr++;
 
-	/* Set chip new ring index. */
+	/*
+	 * Update request index to mailbox4 (Request Queue In).
+	 * The mmiowb() ensures that this write is ordered with writes by other
+	 * CPUs.  Without the mmiowb(), it is possible for the following:
+	 *    CPUA posts write of index 5 to mailbox4
+	 *    CPUA releases host lock
+	 *    CPUB acquires host lock
+	 *    CPUB posts write of index 6 to mailbox4
+	 *    On PCI bus, order reverses and write of 6 posts, then index 5,
+	 *       causing chip to issue full queue of stale commands
+	 * The mmiowb() prevents future writes from crossing the barrier.
+	 * See Documentation/DocBook/deviceiobook.tmpl for more information.
+	 */
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb();
 
 	LEAVE("qla1280_isp_cmd");
 }

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-29 20:50       ` David S. Miller
@ 2004-09-30  2:23         ` Greg Banks
  0 siblings, 0 replies; 45+ messages in thread
From: Greg Banks @ 2004-09-30  2:23 UTC (permalink / raw)
  To: David S. Miller
  Cc: Jesse Barnes, akpm, linux-kernel, Jeremy Higdon, John Partridge,
	Linux Network Development list

On Thu, 2004-09-30 at 06:50, David S. Miller wrote:
> On Wed, 29 Sep 2004 13:43:55 -0700
> Jesse Barnes <jbarnes@engr.sgi.com> wrote:
> 
> > The patch that actually implements mmiowb() already does this, I think Greg 
> > just used his patch for testing.  

Yes, that hunk will be unnecessary when Jesse's patch goes in.

> > The proper way to do it of course is to 
> > just use mmiowb() where needed in tg3 after the write barrier patch gets in.
> 
> Perfect, please send me a tg3 patch once the mmiowb() bits
> go into the tree.

Will do.

Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.



^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-27 18:03 Jesse Barnes
  2004-09-29 10:36 ` Greg Banks
@ 2004-09-29 22:55 ` Jesse Barnes
  2004-09-30  7:15   ` Jeremy Higdon
  1 sibling, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-09-29 22:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, gnb, jeremy

[-- Attachment #1: Type: text/plain, Size: 276 bytes --]

On Monday, September 27, 2004 11:03 am, Jesse Barnes wrote:
> Received a few comments and no complaints, so I'm sending this patch in for
> inclusion.

This one fixes up the documentation a bit, but is otherwise the same.

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Jesse

[-- Attachment #2: mmiowb-5.patch --]
[-- Type: text/plain, Size: 21454 bytes --]

diff -Nru a/Documentation/DocBook/deviceiobook.tmpl b/Documentation/DocBook/deviceiobook.tmpl
--- a/Documentation/DocBook/deviceiobook.tmpl	2004-09-29 15:51:10 -07:00
+++ b/Documentation/DocBook/deviceiobook.tmpl	2004-09-29 15:51:10 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -159,10 +158,72 @@
 	asynchronously. A driver author must issue a read from the same
 	device to ensure that writes have occurred in the specific cases the
 	author cares. This kind of property cannot be hidden from driver
-	writers in the API.
+	writers in the API.  In some cases, the read used to flush the device
+	may be expected to fail (if the card is resetting, for example).  In
+	that case, the read should be done from config space, which is
+	guaranteed to soft-fail if the card doesn't respond.
       </para>
 
       <para>
+	The following is an example of flushing a write to a device when
+	the driver would like to ensure the write's effects are visible prior
+	to continuing execution.
+      </para>
+
+<programlisting>
+static inline void
+qla1280_disable_intrs(struct scsi_qla_host *ha)
+{
+	struct device_reg *reg;
+
+	reg = ha->iobase;
+	/* disable risc and host interrupts */
+	WRT_REG_WORD(&amp;reg->ictrl, 0);
+	/*
+	 * The following read will ensure that the above write
+	 * has been received by the device before we return from this
+	 * function.
+	 */
+	RD_REG_WORD(&amp;reg->ictrl);
+	ha->flags.ints_enabled = 0;
+}
+</programlisting>
+
+      <para>
+	In addition to write posting, on some large multiprocessing systems
+	(e.g. SGI Challenge, Origin and Altix machines) posted writes won't
+	be strongly ordered coming from different CPUs.  Thus it's important
+	to properly protect parts of your driver that do memory-mapped writes
+	with locks and use the <function>mmiowb</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
+	Generally, one should use <function>mmiowb</function> prior to
+	releasing a spinlock that protects regions using <function>writeb
+	</function> or similar functions that aren't surrounded by <function>
+	readb</function> calls, which will ensure ordering and flushing.  The
+	following example (again from qla1280.c) illustrates its use.
+      </para>
+
+<programlisting>
+       sp->flags |= SRB_SENT;
+       ha->actthreads++;
+       WRT_REG_WORD(&amp;reg->mailbox4, ha->req_ring_index);
+
+       /*
+        * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+        * of the request queue in register is ordered ahead of writes issued
+        * after this one by other CPUs.  Access to the register is protected
+        * by the host_lock.  Without the mmiowb, however, it is possible for
+        * this CPU to release the host lock, another CPU acquire the host lock,
+        * and write to the request queue in, and have the second write make it
+        * to the chip first.
+        */
+       mmiowb(); /* posted write ordering */
+</programlisting>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +232,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
diff -Nru a/Documentation/io_ordering.txt b/Documentation/io_ordering.txt
--- a/Documentation/io_ordering.txt	2004-09-29 15:51:10 -07:00
+++ /dev/null	Wed Dec 31 16:00:00 196900
@@ -1,47 +0,0 @@
-On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
-platforms, driver writers are responsible for ensuring that I/O writes to
-memory-mapped addresses on their device arrive in the order intended.  This is
-typically done by reading a 'safe' device or bridge register, causing the I/O
-chipset to flush pending writes to the device before any reads are posted.  A
-driver would usually use this technique immediately prior to the exit of a
-critical section of code protected by spinlocks.  This would ensure that
-subsequent writes to I/O space arrived only after all prior writes (much like a
-memory barrier op, mb(), only with respect to I/O).
-
-A more concrete example from a hypothetical device driver:
-
-        ...
-CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
-CPU A:  ...
-CPU A:  writel(newval, ring_ptr);
-CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
-CPU B:  ...
-CPU B:  writel(newval2, ring_ptr);
-CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-
-In the case above, the device may receive newval2 before it receives newval,
-which could cause problems.  Fixing it is easy enough though:
-
-        ...
-CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
-CPU A:  ...
-CPU A:  writel(newval, ring_ptr);
-CPU A:  (void)readl(safe_register); /* maybe a config register? */
-CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
-CPU B:  ...
-CPU B:  writel(newval2, ring_ptr);
-CPU B:  (void)readl(safe_register); /* maybe a config register? */
-CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-Here, the reads from safe_register will cause the I/O chipset to flush any
-pending writes before actually posting the read to the chipset, preventing
-possible data corruption.
diff -Nru a/arch/ia64/sn/io/machvec/iomv.c b/arch/ia64/sn/io/machvec/iomv.c
--- a/arch/ia64/sn/io/machvec/iomv.c	2004-09-29 15:51:10 -07:00
+++ b/arch/ia64/sn/io/machvec/iomv.c	2004-09-29 15:51:10 -07:00
@@ -54,23 +54,19 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
diff -Nru a/include/asm-alpha/io.h b/include/asm-alpha/io.h
--- a/include/asm-alpha/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-alpha/io.h	2004-09-29 15:51:10 -07:00
@@ -489,6 +489,8 @@
 #define readl_relaxed(addr) __raw_readl(addr)
 #define readq_relaxed(addr) __raw_readq(addr)
 
+#define mmiowb()
+
 /*
  * String version of IO memory access ops:
  */
diff -Nru a/include/asm-arm/io.h b/include/asm-arm/io.h
--- a/include/asm-arm/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-arm/io.h	2004-09-29 15:51:10 -07:00
@@ -135,6 +135,8 @@
 extern void _memcpy_toio(unsigned long, const void *, size_t);
 extern void _memset_io(unsigned long, int, size_t);
 
+#define mmiowb()
+
 /*
  *  Memory access primitives
  *  ------------------------
diff -Nru a/include/asm-arm26/io.h b/include/asm-arm26/io.h
--- a/include/asm-arm26/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-arm26/io.h	2004-09-29 15:51:10 -07:00
@@ -320,6 +320,8 @@
 #define writesw(p,d,l)                        __readwrite_bug("writesw")
 #define writesl(p,d,l)                        __readwrite_bug("writesl")
 
+#define mmiowb()
+
 /* the following macro is depreciated */
 #define ioaddr(port)                    __ioaddr((port))
 
diff -Nru a/include/asm-cris/io.h b/include/asm-cris/io.h
--- a/include/asm-cris/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-cris/io.h	2004-09-29 15:51:10 -07:00
@@ -56,6 +56,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 #define memset_io(a,b,c)	memset((void *)(a),(b),(c))
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
diff -Nru a/include/asm-h8300/io.h b/include/asm-h8300/io.h
--- a/include/asm-h8300/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-h8300/io.h	2004-09-29 15:51:10 -07:00
@@ -200,6 +200,8 @@
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 #define inb(addr)    ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr))
 #define inw(addr)    _swapw(readw(addr))
 #define inl(addr)    _swapl(readl(addr))
diff -Nru a/include/asm-i386/io.h b/include/asm-i386/io.h
--- a/include/asm-i386/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-i386/io.h	2004-09-29 15:51:10 -07:00
@@ -178,6 +178,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
 	memset((void __force *) addr, val, count);
diff -Nru a/include/asm-ia64/io.h b/include/asm-ia64/io.h
--- a/include/asm-ia64/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/io.h	2004-09-29 15:51:10 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
diff -Nru a/include/asm-ia64/machvec.h b/include/asm-ia64/machvec.h
--- a/include/asm-ia64/machvec.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/machvec.h	2004-09-29 15:51:10 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
diff -Nru a/include/asm-ia64/machvec_init.h b/include/asm-ia64/machvec_init.h
--- a/include/asm-ia64/machvec_init.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/machvec_init.h	2004-09-29 15:51:10 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
diff -Nru a/include/asm-ia64/machvec_sn2.h b/include/asm-ia64/machvec_sn2.h
--- a/include/asm-ia64/machvec_sn2.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/machvec_sn2.h	2004-09-29 15:51:10 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
diff -Nru a/include/asm-ia64/sn/io.h b/include/asm-ia64/sn/io.h
--- a/include/asm-ia64/sn/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/sn/io.h	2004-09-29 15:51:10 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
 
 #endif /* _ASM_IA64_SN_IO_H */
diff -Nru a/include/asm-ia64/sn/sn2/io.h b/include/asm-ia64/sn/sn2/io.h
--- a/include/asm-ia64/sn/sn2/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ia64/sn/sn2/io.h	2004-09-29 15:51:10 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
diff -Nru a/include/asm-m32r/io.h b/include/asm-m32r/io.h
--- a/include/asm-m32r/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-m32r/io.h	2004-09-29 15:51:10 -07:00
@@ -150,6 +150,9 @@
 #define __raw_readb readb
 #define __raw_readw readw
 #define __raw_readl readl
+#define readb_relaxed readb
+#define readw_relaxed readw
+#define readl_relaxed readl
 
 #define writeb(val, addr)  _writeb((val), (unsigned long)(addr))
 #define writew(val, addr)  _writew((val), (unsigned long)(addr))
@@ -157,6 +160,8 @@
 #define __raw_writeb writeb
 #define __raw_writew writew
 #define __raw_writel writel
+
+#define mmiowb()
 
 #define flush_write_buffers() do { } while (0)  /* M32R_FIXME */
 
diff -Nru a/include/asm-m68k/io.h b/include/asm-m68k/io.h
--- a/include/asm-m68k/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-m68k/io.h	2004-09-29 15:51:10 -07:00
@@ -306,6 +306,7 @@
 #endif
 #endif /* CONFIG_PCI */
 
+#define mmiowb()
 
 static inline void *ioremap(unsigned long physaddr, unsigned long size)
 {
diff -Nru a/include/asm-m68knommu/io.h b/include/asm-m68knommu/io.h
--- a/include/asm-m68knommu/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-m68knommu/io.h	2004-09-29 15:51:10 -07:00
@@ -102,6 +102,8 @@
 		*bp++ = _swapl(*ap);
 }
 
+#define mmiowb()
+
 /*
  *	make the short names macros so specific devices
  *	can override them as required
diff -Nru a/include/asm-mips/io.h b/include/asm-mips/io.h
--- a/include/asm-mips/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-mips/io.h	2004-09-29 15:51:10 -07:00
@@ -290,6 +290,10 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+/* Depends on MIPS III instruction set */
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
diff -Nru a/include/asm-parisc/io.h b/include/asm-parisc/io.h
--- a/include/asm-parisc/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-parisc/io.h	2004-09-29 15:51:10 -07:00
@@ -177,6 +177,8 @@
 #define readl_relaxed(addr) readl(addr)
 #define readq_relaxed(addr) readq(addr)
 
+#define mmiowb()
+
 extern void __memcpy_fromio(unsigned long dest, unsigned long src, int count);
 extern void __memcpy_toio(unsigned long dest, unsigned long src, int count);
 extern void __memset_io(unsigned long dest, char fill, int count);
diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
--- a/include/asm-ppc/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ppc/io.h	2004-09-29 15:51:10 -07:00
@@ -197,6 +197,8 @@
 #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * Map in an area of physical address space, for accessing
  * I/O devices etc.
diff -Nru a/include/asm-ppc64/io.h b/include/asm-ppc64/io.h
--- a/include/asm-ppc64/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-ppc64/io.h	2004-09-29 15:51:10 -07:00
@@ -152,6 +152,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.
diff -Nru a/include/asm-s390/io.h b/include/asm-s390/io.h
--- a/include/asm-s390/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-s390/io.h	2004-09-29 15:51:10 -07:00
@@ -105,6 +105,8 @@
 #define outb(x,addr) ((void) writeb(x,addr))
 #define outb_p(x,addr) outb(x,addr)
 
+#define mmiowb()
+
 #endif /* __KERNEL__ */
 
 #endif
diff -Nru a/include/asm-sh/io.h b/include/asm-sh/io.h
--- a/include/asm-sh/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-sh/io.h	2004-09-29 15:51:10 -07:00
@@ -134,6 +134,8 @@
 #define readw_relaxed(a) readw(a)
 #define readl_relaxed(a) readl(a)
 
+#define mmiowb()
+
 /*
  * If the platform has PC-like I/O, this function converts the offset into
  * an address.
diff -Nru a/include/asm-sh64/io.h b/include/asm-sh64/io.h
--- a/include/asm-sh64/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-sh64/io.h	2004-09-29 15:51:10 -07:00
@@ -86,6 +86,9 @@
 #define readb(addr)		sh64_in8(addr)
 #define readw(addr)		sh64_in16(addr)
 #define readl(addr)		sh64_in32(addr)
+#define readb_relaxed(addr)		sh64_in8(addr)
+#define readw_relaxed(addr)		sh64_in16(addr)
+#define readl_relaxed(addr)		sh64_in32(addr)
 
 #define writeb(b, addr)		sh64_out8(b, addr)
 #define writew(b, addr)		sh64_out16(b, addr)
@@ -105,6 +108,8 @@
 void outb(unsigned long value, unsigned long port);
 void outw(unsigned long value, unsigned long port);
 void outl(unsigned long value, unsigned long port);
+
+#define mmiowb()
 
 #ifdef __KERNEL__
 
diff -Nru a/include/asm-sparc/io.h b/include/asm-sparc/io.h
--- a/include/asm-sparc/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-sparc/io.h	2004-09-29 15:51:10 -07:00
@@ -23,6 +23,8 @@
 	return ((w&0xff) << 8) | ((w>>8)&0xff);
 }
 
+#define mmiowb()
+
 /*
  * Memory mapped I/O to PCI
  *
diff -Nru a/include/asm-sparc64/io.h b/include/asm-sparc64/io.h
--- a/include/asm-sparc64/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-sparc64/io.h	2004-09-29 15:51:10 -07:00
@@ -439,6 +439,8 @@
 	return retval;
 }
 
+#define mmiowb()
+
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
diff -Nru a/include/asm-v850/io.h b/include/asm-v850/io.h
--- a/include/asm-v850/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-v850/io.h	2004-09-29 15:51:10 -07:00
@@ -102,6 +102,8 @@
 #define ioremap_writethrough(physaddr, size)	(physaddr)
 #define ioremap_fullcache(physaddr, size)	(physaddr)
 
+#define mmiowb()
+
 #define page_to_phys(page)      ((page - mem_map) << PAGE_SHIFT)
 #if 0
 /* This is really stupid; don't define it.  */
diff -Nru a/include/asm-x86_64/io.h b/include/asm-x86_64/io.h
--- a/include/asm-x86_64/io.h	2004-09-29 15:51:10 -07:00
+++ b/include/asm-x86_64/io.h	2004-09-29 15:51:10 -07:00
@@ -186,6 +186,8 @@
 #define __raw_readl readl
 #define __raw_readq readq
 
+#define mmiowb()
+
 #ifdef CONFIG_UNORDERED_IO
 static inline void __writel(u32 val, void *addr)
 {

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-29 20:43     ` Jesse Barnes
@ 2004-09-29 20:50       ` David S. Miller
  2004-09-30  2:23         ` Greg Banks
  0 siblings, 1 reply; 45+ messages in thread
From: David S. Miller @ 2004-09-29 20:50 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: gnb, akpm, linux-kernel, jeremy, johnip, netdev

On Wed, 29 Sep 2004 13:43:55 -0700
Jesse Barnes <jbarnes@engr.sgi.com> wrote:

> The patch that actually implements mmiowb() already does this; I think Greg 
> just used his patch for testing.  The proper way to do it, of course, is to 
> just use mmiowb() where needed in tg3 after the write barrier patch gets in.

Perfect, please send me a tg3 patch once the mmiowb() bits
go into the tree.

Thanks a lot.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-29 20:35   ` David S. Miller
@ 2004-09-29 20:43     ` Jesse Barnes
  2004-09-29 20:50       ` David S. Miller
  0 siblings, 1 reply; 45+ messages in thread
From: Jesse Barnes @ 2004-09-29 20:43 UTC (permalink / raw)
  To: David S. Miller; +Cc: Greg Banks, akpm, linux-kernel, jeremy, johnip, netdev

On Wednesday, September 29, 2004 1:35 pm, David S. Miller wrote:
> On Wed, 29 Sep 2004 20:36:46 +1000
>
> Greg Banks <gnb@sgi.com> wrote:
> > Ok, here's a patch for the tg3 network driver to use mmiowb().  Tests
> > over the last couple of days have shown that it solves the oopses in
> > tg3_tx() that I reported and attempted to patch some time ago:
> >
> > http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2
> >
> > The CPU usage of the mmiowb() approach is also significantly better
> > than doing PCI reads to flush the writes (by setting the existing
> > TG3_FLAG_MBOX_WRITE_REORDER flag).  In an artificial CPU-constrained
> > test on a ProPack kernel, the same amount of CPU work for the REORDER
> > solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
> > mmiowb() solution.
>
> Please put this macro in asm/io.h or similar and make sure
> every platform has it implemented or provides a NOP version.

The patch that actually implements mmiowb() already does this; I think Greg 
just used his patch for testing.  The proper way to do it, of course, is to 
just use mmiowb() where needed in tg3 after the write barrier patch gets in.

> A lot of people are going to get this wrong btw.  The only
> way it's really going to be cured across the board is if someone
> like yourself who understands this audits all of the drivers.

Yep, just like PCI posting (though many people seem to have a grasp on that 
now).
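
To spell out the difference: mmiowb() only keeps MMIO writes from different 
CPUs in order; it doesn't guarantee that a write has actually reached the 
device, which is what reading the register back gets you.  Rough sketch with 
a made-up device (the mydev/MYDEV_* names are illustrative only, not from 
any driver in this thread):

	/* ordering: keep the next lock holder's write from passing ours */
	spin_lock_irqsave(&mydev->lock, flags);
	writel(prod_idx, mydev->regs + MYDEV_PROD_IDX);
	mmiowb();
	spin_unlock_irqrestore(&mydev->lock, flags);

	/* completion: force the posted write out to the device right now */
	writel(cmd, mydev->regs + MYDEV_CTRL);
	(void)readl(mydev->regs + MYDEV_CTRL);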

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-29 10:36 ` Greg Banks
@ 2004-09-29 20:35   ` David S. Miller
  2004-09-29 20:43     ` Jesse Barnes
  0 siblings, 1 reply; 45+ messages in thread
From: David S. Miller @ 2004-09-29 20:35 UTC (permalink / raw)
  To: Greg Banks; +Cc: jbarnes, akpm, linux-kernel, jeremy, johnip, netdev

On Wed, 29 Sep 2004 20:36:46 +1000
Greg Banks <gnb@sgi.com> wrote:

> Ok, here's a patch for the tg3 network driver to use mmiowb().  Tests
> over the last couple of days have shown that it solves the oopses in
> tg3_tx() that I reported and attempted to patch some time ago:
> 
> http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2
> 
> The CPU usage of the mmiowb() approach is also significantly better
> than doing PCI reads to flush the writes (by setting the existing
> TG3_FLAG_MBOX_WRITE_REORDER flag).  In an artificial CPU-constrained
> test on a ProPack kernel, the same amount of CPU work for the REORDER
> solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
> mmiowb() solution.

Please put this macro in asm/io.h or similar and make sure
every platform has it implemented or provides a NOP version.

A lot of people are going to get this wrong btw.  The only
way it's really going to be cured across the board is if someone
like yourself who understands this audits all of the drivers.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* Re: [PATCH] I/O space write barrier
  2004-09-27 18:03 Jesse Barnes
@ 2004-09-29 10:36 ` Greg Banks
  2004-09-29 20:35   ` David S. Miller
  2004-09-29 22:55 ` Jesse Barnes
  1 sibling, 1 reply; 45+ messages in thread
From: Greg Banks @ 2004-09-29 10:36 UTC (permalink / raw)
  To: Jesse Barnes
  Cc: akpm, linux-kernel, jeremy, John Partridge, David S. Miller,
	Linux Network Development List

G'day,

On Mon, Sep 27, 2004 at 11:03:39AM -0700, Jesse Barnes wrote:
> On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
> I/O space aren't ordered coming from different CPUs.  For the most part, this 
> isn't a problem since drivers generally spinlock around code that does writeX 
> calls, but if the last operation a driver does before it releases a lock is a 
> write and some other CPU takes the lock and immediately does a write, it's 
> possible the second CPU's write could arrive before the first's.
> 
> This patch adds a mmiowb() call to deal with this sort of situation, and 
> adds some documentation describing I/O ordering issues to deviceiobook.tmpl.  
> The idea is to mirror the regular, cacheable memory barrier operation, wmb.  
>[...]
> Patches to use this new primitive in various drivers will come separately, 
> probably via the SCSI tree.

Ok, here's a patch for the tg3 network driver to use mmiowb().  Tests
over the last couple of days have shown that it solves the oopses in
tg3_tx() that I reported and attempted to patch some time ago:

http://marc.theaimsgroup.com/?l=linux-netdev&m=108538612421774&w=2

The CPU usage of the mmiowb() approach is also significantly better
than doing PCI reads to flush the writes (by setting the existing
TG3_FLAG_MBOX_WRITE_REORDER flag).  In an artificial CPU-constrained
test on a ProPack kernel, the same amount of CPU work for the REORDER
solution pushes 85.1 MB/s over 2 NICs compared to 146.5 MB/s for the
mmiowb() solution.


--- linux.orig/drivers/net/tg3.c	2004-09-22 17:20:45.%N +1000
+++ linux/drivers/net/tg3.c	2004-09-29 19:45:16.%N +1000
@@ -44,6 +44,19 @@
 #include <asm/pbm.h>
 #endif
 
+#ifndef mmiowb
+/*
+ * mmiowb() is a memory-mapped I/O write boundary, useful for
+ * preserving send ring update ordering between multiple CPUs
+ * Define it if it doesn't exist.
+ */
+#ifdef CONFIG_IA64_SGI_SN2
+#define mmiowb()    sn_mmiob()
+#else
+#define mmiowb()
+#endif
+#endif
+
 #if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
 #define TG3_VLAN_TAG_USED 1
 #else
@@ -2725,6 +2738,7 @@ next_pkt_nopost:
 		tw32_rx_mbox(MAILBOX_RCV_JUMBO_PROD_IDX + TG3_64BIT_REG_LOW,
 			     sw_idx);
 	}
+	mmiowb();
 
 	return received;
 }
@@ -3172,6 +3186,7 @@ static int tg3_start_xmit(struct sk_buff
 		netif_stop_queue(dev);
 
 out_unlock:
+    	mmiowb();
 	spin_unlock_irqrestore(&tp->tx_lock, flags);
 
 	dev->trans_start = jiffies;


Greg.
-- 
Greg Banks, R&D Software Engineer, SGI Australian Software Group.
I don't speak for SGI.

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] I/O space write barrier
@ 2004-09-27 18:03 Jesse Barnes
  2004-09-29 10:36 ` Greg Banks
  2004-09-29 22:55 ` Jesse Barnes
  0 siblings, 2 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-09-27 18:03 UTC (permalink / raw)
  To: akpm, linux-kernel; +Cc: gnb, jeremy

[-- Attachment #1: Type: text/plain, Size: 2421 bytes --]

Received a few comments and no complaints, so I'm sending this patch in for 
inclusion.

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
I/O space aren't ordered coming from different CPUs.  For the most part, this 
isn't a problem since drivers generally spinlock around code that does writeX 
calls, but if the last operation a driver does before it releases a lock is a 
write and some other CPU takes the lock and immediately does a write, it's 
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and 
adds some documentation describing I/O ordering issues to deviceiobook.tmpl.  
The idea is to mirror the regular, cacheable memory barrier operation, wmb.  
Example of the problem this new macro solves:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
is easy though:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  mmiowb();
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to 
actually make a given write get to the device before proceeding.  This should 
be dealt with by immediately reading a register from the card that has no 
side effects.  According to the PCI spec, that will guarantee that all writes 
have arrived before being sent to the target bus.  If no such register is 
available (in the case of card resets perhaps), reading from config space is 
sufficient (though it may return all ones if the card isn't responding to 
read cycles).  I've tried to describe how mmiowb() differs from PCI posted 
write flushing in the patch to deviceiobook.tmpl.
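
For the config space variant (when the card may not answer MMIO reads during 
a reset), the flush looks roughly like this; the mydev/MYDEV_* names are made 
up for illustration and are not part of the patch:

	u16 vendor;

	writel(MYDEV_RESET, mydev->regs + MYDEV_CTRL);
	/*
	 * Flush the posted write with a config space read; the read will
	 * complete, though it may return all ones if the card is still
	 * coming out of reset.
	 */
	pci_read_config_word(mydev->pdev, PCI_VENDOR_ID, &vendor);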

Patches to use this new primitive in various drivers will come separately, 
probably via the SCSI tree.

Signed-off-by: Jesse Barnes <jbarnes@sgi.com>

Thanks,
Jesse

[-- Attachment #2: mmiowb-3.patch --]
[-- Type: text/plain, Size: 21370 bytes --]

diff -Nru a/Documentation/DocBook/deviceiobook.tmpl b/Documentation/DocBook/deviceiobook.tmpl
--- a/Documentation/DocBook/deviceiobook.tmpl	2004-09-27 10:48:41 -07:00
+++ b/Documentation/DocBook/deviceiobook.tmpl	2004-09-27 10:48:41 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -159,10 +158,72 @@
 	asynchronously. A driver author must issue a read from the same
 	device to ensure that writes have occurred in the specific cases the
 	author cares. This kind of property cannot be hidden from driver
-	writers in the API.
+	writers in the API.  In some cases, the read used to flush the device
+	may be expected to fail (if the card is resetting, for example).  In
+	that case, the read should be done from config space, which is
+	guaranteed to soft-fail if the card doesn't respond.
       </para>
 
       <para>
+	The following is an example of flushing a write to a device when
+	the driver would like to ensure the write's effects are visible prior
+	to continuing execution.
+      </para>
+
+<programlisting>
+static inline void
+qla1280_disable_intrs(struct scsi_qla_host *ha)
+{
+	struct device_reg *reg;
+
+	reg = ha->iobase;
+	/* disable risc and host interrupts */
+	WRT_REG_WORD(&reg->ictrl, 0);
+	/*
+	 * The following read will ensure that the above write
+	 * has been received by the device before we return from this
+	 * function.
+	 */
+	RD_REG_WORD(&reg->ictrl);
+	ha->flags.ints_enabled = 0;
+}
+</programlisting>
+
+      <para>
+	In addition to write posting, some large SMP systems (e.g. SGI
+	Challenge, Origin and Altix machines) won't strongly order writes
+	coming from different CPUs.  Thus it's important to properly
+	protect parts of your driver that do memory-mapped writes with
+	locks and use the <function>mmiob</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
+	Generally, one should use <function>mmiowb</function> prior to
+	releasing a spinlock that protects regions using <function>writeb
+	</function> or similar functions that aren't surrounded by <function>
+	readb</function> calls, which will ensure ordering and flushing.  The
+	following example (again from qla1280.c) illustrates its use.
+      </para>
+
+<programlisting>
+       sp->flags |= SRB_SENT;
+       ha->actthreads++;
+       WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
+
+       /*
+        * A Memory Mapped I/O Write Barrier is needed to ensure that this write
+        * of the request queue in register is ordered ahead of writes issued
+        * after this one by other CPUs.  Access to the register is protected
+        * by the host_lock.  Without the mmiowb, however, it is possible for
+        * this CPU to release the host lock, another CPU acquire the host lock,
+        * and write to the request queue in, and have the second write make it
+        * to the chip first.
+        */
+       mmiowb(); /* posted write ordering */
+</programlisting>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +232,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
diff -Nru a/Documentation/io_ordering.txt b/Documentation/io_ordering.txt
--- a/Documentation/io_ordering.txt	2004-09-27 10:48:41 -07:00
+++ /dev/null	Wed Dec 31 16:00:00 196900
@@ -1,47 +0,0 @@
-On some platforms, so-called memory-mapped I/O is weakly ordered.  On such
-platforms, driver writers are responsible for ensuring that I/O writes to
-memory-mapped addresses on their device arrive in the order intended.  This is
-typically done by reading a 'safe' device or bridge register, causing the I/O
-chipset to flush pending writes to the device before any reads are posted.  A
-driver would usually use this technique immediately prior to the exit of a
-critical section of code protected by spinlocks.  This would ensure that
-subsequent writes to I/O space arrived only after all prior writes (much like a
-memory barrier op, mb(), only with respect to I/O).
-
-A more concrete example from a hypothetical device driver:
-
-        ...
-CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
-CPU A:  ...
-CPU A:  writel(newval, ring_ptr);
-CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
-CPU B:  ...
-CPU B:  writel(newval2, ring_ptr);
-CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-
-In the case above, the device may receive newval2 before it receives newval,
-which could cause problems.  Fixing it is easy enough though:
-
-        ...
-CPU A:  spin_lock_irqsave(&dev_lock, flags)
-CPU A:  val = readl(my_status);
-CPU A:  ...
-CPU A:  writel(newval, ring_ptr);
-CPU A:  (void)readl(safe_register); /* maybe a config register? */
-CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
-        ...
-CPU B:  spin_lock_irqsave(&dev_lock, flags)
-CPU B:  val = readl(my_status);
-CPU B:  ...
-CPU B:  writel(newval2, ring_ptr);
-CPU B:  (void)readl(safe_register); /* maybe a config register? */
-CPU B:  spin_unlock_irqrestore(&dev_lock, flags)
-
-Here, the reads from safe_register will cause the I/O chipset to flush any
-pending writes before actually posting the read to the chipset, preventing
-possible data corruption.
diff -Nru a/arch/ia64/sn/io/machvec/iomv.c b/arch/ia64/sn/io/machvec/iomv.c
--- a/arch/ia64/sn/io/machvec/iomv.c	2004-09-27 10:48:41 -07:00
+++ b/arch/ia64/sn/io/machvec/iomv.c	2004-09-27 10:48:41 -07:00
@@ -54,23 +54,19 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
diff -Nru a/include/asm-alpha/io.h b/include/asm-alpha/io.h
--- a/include/asm-alpha/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-alpha/io.h	2004-09-27 10:48:41 -07:00
@@ -489,6 +489,8 @@
 #define readl_relaxed(addr) __raw_readl(addr)
 #define readq_relaxed(addr) __raw_readq(addr)
 
+#define mmiowb()
+
 /*
  * String version of IO memory access ops:
  */
diff -Nru a/include/asm-arm/io.h b/include/asm-arm/io.h
--- a/include/asm-arm/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-arm/io.h	2004-09-27 10:48:41 -07:00
@@ -135,6 +135,8 @@
 extern void _memcpy_toio(unsigned long, const void *, size_t);
 extern void _memset_io(unsigned long, int, size_t);
 
+#define mmiowb()
+
 /*
  *  Memory access primitives
  *  ------------------------
diff -Nru a/include/asm-arm26/io.h b/include/asm-arm26/io.h
--- a/include/asm-arm26/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-arm26/io.h	2004-09-27 10:48:41 -07:00
@@ -320,6 +320,8 @@
 #define writesw(p,d,l)                        __readwrite_bug("writesw")
 #define writesl(p,d,l)                        __readwrite_bug("writesl")
 
+#define mmiowb()
+
 /* the following macro is depreciated */
 #define ioaddr(port)                    __ioaddr((port))
 
diff -Nru a/include/asm-cris/io.h b/include/asm-cris/io.h
--- a/include/asm-cris/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-cris/io.h	2004-09-27 10:48:41 -07:00
@@ -56,6 +56,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 #define memset_io(a,b,c)	memset((void *)(a),(b),(c))
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
diff -Nru a/include/asm-h8300/io.h b/include/asm-h8300/io.h
--- a/include/asm-h8300/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-h8300/io.h	2004-09-27 10:48:41 -07:00
@@ -200,6 +200,8 @@
 #define memcpy_fromio(a,b,c)	memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb()
+
 #define inb(addr)    ((h8300_buswidth(addr))?readw((addr) & ~1) & 0xff:readb(addr))
 #define inw(addr)    _swapw(readw(addr))
 #define inl(addr)    _swapl(readl(addr))
diff -Nru a/include/asm-i386/io.h b/include/asm-i386/io.h
--- a/include/asm-i386/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-i386/io.h	2004-09-27 10:48:41 -07:00
@@ -178,6 +178,8 @@
 #define __raw_writew writew
 #define __raw_writel writel
 
+#define mmiowb()
+
 static inline void memset_io(volatile void __iomem *addr, unsigned char val, int count)
 {
 	memset((void __force *) addr, val, count);
diff -Nru a/include/asm-ia64/io.h b/include/asm-ia64/io.h
--- a/include/asm-ia64/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/io.h	2004-09-27 10:48:41 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
diff -Nru a/include/asm-ia64/machvec.h b/include/asm-ia64/machvec.h
--- a/include/asm-ia64/machvec.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/machvec.h	2004-09-27 10:48:41 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
diff -Nru a/include/asm-ia64/machvec_init.h b/include/asm-ia64/machvec_init.h
--- a/include/asm-ia64/machvec_init.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/machvec_init.h	2004-09-27 10:48:41 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
diff -Nru a/include/asm-ia64/machvec_sn2.h b/include/asm-ia64/machvec_sn2.h
--- a/include/asm-ia64/machvec_sn2.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/machvec_sn2.h	2004-09-27 10:48:41 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
diff -Nru a/include/asm-ia64/sn/io.h b/include/asm-ia64/sn/io.h
--- a/include/asm-ia64/sn/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/sn/io.h	2004-09-27 10:48:41 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
 
 #endif /* _ASM_IA64_SN_IO_H */
diff -Nru a/include/asm-ia64/sn/sn2/io.h b/include/asm-ia64/sn/sn2/io.h
--- a/include/asm-ia64/sn/sn2/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ia64/sn/sn2/io.h	2004-09-27 10:48:41 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
diff -Nru a/include/asm-m32r/io.h b/include/asm-m32r/io.h
--- a/include/asm-m32r/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-m32r/io.h	2004-09-27 10:48:41 -07:00
@@ -150,6 +150,9 @@
 #define __raw_readb readb
 #define __raw_readw readw
 #define __raw_readl readl
+#define readb_relaxed readb
+#define readw_relaxed readw
+#define readl_relaxed readl
 
 #define writeb(val, addr)  _writeb((val), (unsigned long)(addr))
 #define writew(val, addr)  _writew((val), (unsigned long)(addr))
@@ -157,6 +160,8 @@
 #define __raw_writeb writeb
 #define __raw_writew writew
 #define __raw_writel writel
+
+#define mmiowb()
 
 #define flush_write_buffers() do { } while (0)  /* M32R_FIXME */
 
diff -Nru a/include/asm-m68k/io.h b/include/asm-m68k/io.h
--- a/include/asm-m68k/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-m68k/io.h	2004-09-27 10:48:41 -07:00
@@ -306,6 +306,7 @@
 #endif
 #endif /* CONFIG_PCI */
 
+#define mmiowb()
 
 static inline void *ioremap(unsigned long physaddr, unsigned long size)
 {
diff -Nru a/include/asm-m68knommu/io.h b/include/asm-m68knommu/io.h
--- a/include/asm-m68knommu/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-m68knommu/io.h	2004-09-27 10:48:41 -07:00
@@ -102,6 +102,8 @@
 		*bp++ = _swapl(*ap);
 }
 
+#define mmiowb()
+
 /*
  *	make the short names macros so specific devices
  *	can override them as required
diff -Nru a/include/asm-mips/io.h b/include/asm-mips/io.h
--- a/include/asm-mips/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-mips/io.h	2004-09-27 10:48:41 -07:00
@@ -290,6 +290,9 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
diff -Nru a/include/asm-parisc/io.h b/include/asm-parisc/io.h
--- a/include/asm-parisc/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-parisc/io.h	2004-09-27 10:48:41 -07:00
@@ -177,6 +177,8 @@
 #define readl_relaxed(addr) readl(addr)
 #define readq_relaxed(addr) readq(addr)
 
+#define mmiowb()
+
 extern void __memcpy_fromio(unsigned long dest, unsigned long src, int count);
 extern void __memcpy_toio(unsigned long dest, unsigned long src, int count);
 extern void __memset_io(unsigned long dest, char fill, int count);
diff -Nru a/include/asm-ppc/io.h b/include/asm-ppc/io.h
--- a/include/asm-ppc/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ppc/io.h	2004-09-27 10:48:41 -07:00
@@ -197,6 +197,8 @@
 #define memcpy_fromio(a,b,c)   memcpy((a),(void *)(b),(c))
 #define memcpy_toio(a,b,c)	memcpy((void *)(a),(b),(c))
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * Map in an area of physical address space, for accessing
  * I/O devices etc.
diff -Nru a/include/asm-ppc64/io.h b/include/asm-ppc64/io.h
--- a/include/asm-ppc64/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-ppc64/io.h	2004-09-27 10:48:41 -07:00
@@ -152,6 +152,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.
diff -Nru a/include/asm-s390/io.h b/include/asm-s390/io.h
--- a/include/asm-s390/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-s390/io.h	2004-09-27 10:48:41 -07:00
@@ -105,6 +105,8 @@
 #define outb(x,addr) ((void) writeb(x,addr))
 #define outb_p(x,addr) outb(x,addr)
 
+#define mmiowb()
+
 #endif /* __KERNEL__ */
 
 #endif
diff -Nru a/include/asm-sh/io.h b/include/asm-sh/io.h
--- a/include/asm-sh/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-sh/io.h	2004-09-27 10:48:41 -07:00
@@ -134,6 +134,8 @@
 #define readw_relaxed(a) readw(a)
 #define readl_relaxed(a) readl(a)
 
+#define mmiowb()
+
 /*
  * If the platform has PC-like I/O, this function converts the offset into
  * an address.
diff -Nru a/include/asm-sh64/io.h b/include/asm-sh64/io.h
--- a/include/asm-sh64/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-sh64/io.h	2004-09-27 10:48:41 -07:00
@@ -86,6 +86,9 @@
 #define readb(addr)		sh64_in8(addr)
 #define readw(addr)		sh64_in16(addr)
 #define readl(addr)		sh64_in32(addr)
+#define readb_relaxed(addr)		sh64_in8(addr)
+#define readw_relaxed(addr)		sh64_in16(addr)
+#define readl_relaxed(addr)		sh64_in32(addr)
 
 #define writeb(b, addr)		sh64_out8(b, addr)
 #define writew(b, addr)		sh64_out16(b, addr)
@@ -105,6 +108,8 @@
 void outb(unsigned long value, unsigned long port);
 void outw(unsigned long value, unsigned long port);
 void outl(unsigned long value, unsigned long port);
+
+#define mmiowb()
 
 #ifdef __KERNEL__
 
diff -Nru a/include/asm-sparc/io.h b/include/asm-sparc/io.h
--- a/include/asm-sparc/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-sparc/io.h	2004-09-27 10:48:41 -07:00
@@ -23,6 +23,8 @@
 	return ((w&0xff) << 8) | ((w>>8)&0xff);
 }
 
+#define mmiowb()
+
 /*
  * Memory mapped I/O to PCI
  *
diff -Nru a/include/asm-sparc64/io.h b/include/asm-sparc64/io.h
--- a/include/asm-sparc64/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-sparc64/io.h	2004-09-27 10:48:41 -07:00
@@ -439,6 +439,8 @@
 	return retval;
 }
 
+#define mmiowb()
+
 #ifdef __KERNEL__
 
 /* On sparc64 we have the whole physical IO address space accessible
diff -Nru a/include/asm-v850/io.h b/include/asm-v850/io.h
--- a/include/asm-v850/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-v850/io.h	2004-09-27 10:48:41 -07:00
@@ -102,6 +102,8 @@
 #define ioremap_writethrough(physaddr, size)	(physaddr)
 #define ioremap_fullcache(physaddr, size)	(physaddr)
 
+#define mmiowb()
+
 #define page_to_phys(page)      ((page - mem_map) << PAGE_SHIFT)
 #if 0
 /* This is really stupid; don't define it.  */
diff -Nru a/include/asm-x86_64/io.h b/include/asm-x86_64/io.h
--- a/include/asm-x86_64/io.h	2004-09-27 10:48:41 -07:00
+++ b/include/asm-x86_64/io.h	2004-09-27 10:48:41 -07:00
@@ -186,6 +186,8 @@
 #define __raw_readl readl
 #define __raw_readq readq
 
+#define mmiowb()
+
 #ifdef CONFIG_UNORDERED_IO
 static inline void __writel(u32 val, void *addr)
 {

^ permalink raw reply	[flat|nested] 45+ messages in thread

* [PATCH] I/O space write barrier
@ 2004-09-22 15:45 Jesse Barnes
  0 siblings, 0 replies; 45+ messages in thread
From: Jesse Barnes @ 2004-09-22 15:45 UTC (permalink / raw)
  To: linux-kernel; +Cc: Jeremy Higdon

[-- Attachment #1: Type: text/plain, Size: 2430 bytes --]

On some platforms (e.g. SGI Challenge, Origin, and Altix machines), writes to 
I/O space aren't ordered coming from different CPUs.  For the most part, this 
isn't a problem since drivers generally spinlock around code that does writeX 
calls, but if the last operation a driver does before it releases a lock is a 
write and some other CPU takes the lock and immediately does a write, it's 
possible the second CPU's write could arrive before the first's.

This patch adds a mmiowb() call to deal with this sort of situation, and 
modifies the qla1280 driver to use it.  The idea is to mirror the regular, 
cacheable memory barrier operation, wmb (arg!  I misnamed it in my patch, 
I'll fix that up in the next rev assuming it's ok).  Example:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

In this case, newval2 could be written to ring_ptr before newval.  Fixing it 
is easy though:

CPU A:  spin_lock_irqsave(&dev_lock, flags)
CPU A:  ...
CPU A:  writel(newval, ring_ptr);
CPU A:  mmiowb(); /* ensure no other writes beat us to the device */
CPU A:  spin_unlock_irqrestore(&dev_lock, flags)
        ...
CPU B:  spin_lock_irqsave(&dev_lock, flags)
CPU B:  writel(newval2, ring_ptr);
CPU B:  ...
CPU B:  mmiowb();
CPU B:  spin_unlock_irqrestore(&dev_lock, flags)

Note that this doesn't address a related case where the driver may want to 
actually make a given write get to the device before proceeding.  This should 
be dealt with by immediately reading a register from the card that has no 
side effects.  According to the PCI spec, that will guarantee that all writes 
have arrived before being sent to the target bus.  If no such register is 
available (in the case of card resets perhaps), reading from config space is 
sufficient (though it may return all ones if the card isn't responding to 
read cycles).

I'm also not quite sure if I got the ppc64 case right (and obviously I need to 
add the macro for other platforms).  The assumption is that *most* platforms 
don't have this issue, and those that do have a chipset register or CPU 
instruction to ensure ordering.  If that turns out not to be the case, we can 
add an address argument to the function.

Thanks,
Jesse

[-- Attachment #2: mmiowb.patch --]
[-- Type: text/x-diff, Size: 10164 bytes --]

===== Documentation/DocBook/deviceiobook.tmpl 1.4 vs edited =====
--- 1.4/Documentation/DocBook/deviceiobook.tmpl	2004-02-03 21:31:10 -08:00
+++ edited/Documentation/DocBook/deviceiobook.tmpl	2004-09-22 07:59:56 -07:00
@@ -147,8 +147,7 @@
 	compiler is not permitted to reorder the I/O sequence. When the 
 	ordering can be compiler optimised, you can use <function>
 	__readb</function> and friends to indicate the relaxed ordering. Use 
-	this with care. The <function>rmb</function> provides a read memory 
-	barrier. The <function>wmb</function> provides a write memory barrier.
+	this with care.
       </para>
 
       <para>
@@ -163,6 +162,15 @@
       </para>
 
       <para>
+	In addition to write posting, some large SMP systems (e.g. SGI
+	Challenge, Origin and Altix machines) won't strongly order writes
+	coming from different CPUs.  Thus it's important to properly
+	protect parts of your driver that do memory-mapped writes with
+	locks and use the <function>mmiob</function> to make sure they
+	arrive in the order intended.
+      </para>
+
+      <para>
 	PCI ordering rules also guarantee that PIO read responses arrive
 	after any outstanding DMA writes on that bus, since for some devices
 	the result of a <function>readb</function> call may signal to the
@@ -171,7 +179,9 @@
 	<function>readb</function> call has no relation to any previous DMA
 	writes performed by the device.  The driver can use
 	<function>readb_relaxed</function> for these cases, although only
-	some platforms will honor the relaxed semantics.
+	some platforms will honor the relaxed semantics.  Using the relaxed
+	read functions will provide significant performance benefits on
+	platforms that support it.
       </para>
     </sect1>
 
===== arch/ia64/sn/io/machvec/iomv.c 1.9 vs edited =====
--- 1.9/arch/ia64/sn/io/machvec/iomv.c	2004-05-26 06:49:19 -07:00
+++ edited/arch/ia64/sn/io/machvec/iomv.c	2004-09-22 08:33:35 -07:00
@@ -54,23 +54,19 @@
 EXPORT_SYMBOL(sn_io_addr);
 
 /**
- * sn_mmiob - I/O space memory barrier
+ * __sn_mmiowb - I/O space write barrier
  *
- * Acts as a memory mapped I/O barrier for platforms that queue writes to 
- * I/O space.  This ensures that subsequent writes to I/O space arrive after
- * all previous writes.  For most ia64 platforms, this is a simple
- * 'mf.a' instruction.  For other platforms, mmiob() may have to read
- * a chipset register to ensure ordering.
+ * See include/asm-ia64/io.h and Documentation/DocBook/deviceiobook.tmpl
+ * for details.
  *
  * On SN2, we wait for the PIO_WRITE_STATUS SHub register to clear.
  * See PV 871084 for details about the WAR about zero value.
  *
  */
-void
-sn_mmiob (void)
+void __sn_mmiowb(void)
 {
 	while ((((volatile unsigned long) (*pda->pio_write_status_addr)) & SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK) != 
 				SH_PIO_WRITE_STATUS_0_PENDING_WRITE_COUNT_MASK)
 		cpu_relax();
 }
-EXPORT_SYMBOL(sn_mmiob);
+EXPORT_SYMBOL(__sn_mmiowb);
===== drivers/scsi/qla1280.c 1.65 vs edited =====
--- 1.65/drivers/scsi/qla1280.c	2004-07-28 20:59:10 -07:00
+++ edited/drivers/scsi/qla1280.c	2004-09-22 08:31:46 -07:00
@@ -1715,7 +1715,7 @@
 	reg = ha->iobase;
 	/* enable risc and host interrupts */
 	WRT_REG_WORD(&reg->ictrl, (ISP_EN_INT | ISP_EN_RISC));
-	RD_REG_WORD(&reg->ictrl);	/* PCI Posted Write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 	ha->flags.ints_enabled = 1;
 }
 
@@ -1727,7 +1727,7 @@
 	reg = ha->iobase;
 	/* disable risc and host interrupts */
 	WRT_REG_WORD(&reg->ictrl, 0);
-	RD_REG_WORD(&reg->ictrl);	/* PCI Posted Write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 	ha->flags.ints_enabled = 0;
 }
 
@@ -3398,7 +3398,7 @@
 	sp->flags |= SRB_SENT;
 	ha->actthreads++;
 	WRT_REG_WORD(&reg->mailbox4, ha->req_ring_index);
-	(void) RD_REG_WORD(&reg->mailbox4); /* PCI posted write flush */
+	mmiowb(); /* make sure this write arrives before any others */
 
  out:
 	if (status)
===== include/asm-ia64/io.h 1.19 vs edited =====
--- 1.19/include/asm-ia64/io.h	2004-02-03 21:31:10 -08:00
+++ edited/include/asm-ia64/io.h	2004-09-22 08:11:50 -07:00
@@ -91,6 +91,20 @@
  */
 #define __ia64_mf_a()	ia64_mfa()
 
+/**
+ * __ia64_mmiowb - I/O write barrier
+ *
+ * Ensure ordering of I/O space writes.  This will make sure that writes
+ * following the barrier will arrive after all previous writes.  For most
+ * ia64 platforms, this is a simple 'mf.a' instruction.
+ *
+ * See Documentation/DocBook/deviceiobook.tmpl for more information.
+ */
+static inline void __ia64_mmiowb(void)
+{
+	ia64_mfa();
+}
+
 static inline const unsigned long
 __ia64_get_io_port_base (void)
 {
@@ -267,6 +281,7 @@
 #define __outb		platform_outb
 #define __outw		platform_outw
 #define __outl		platform_outl
+#define __mmiowb	platform_mmiowb
 
 #define inb(p)		__inb(p)
 #define inw(p)		__inw(p)
@@ -280,6 +295,7 @@
 #define outsb(p,s,c)	__outsb(p,s,c)
 #define outsw(p,s,c)	__outsw(p,s,c)
 #define outsl(p,s,c)	__outsl(p,s,c)
+#define mmiowb()	__mmiowb()
 
 /*
  * The address passed to these functions are ioremap()ped already.
===== include/asm-ia64/machvec.h 1.26 vs edited =====
--- 1.26/include/asm-ia64/machvec.h	2004-08-03 16:05:22 -07:00
+++ edited/include/asm-ia64/machvec.h	2004-09-22 08:12:33 -07:00
@@ -62,6 +62,7 @@
 typedef void ia64_mv_outb_t (unsigned char, unsigned long);
 typedef void ia64_mv_outw_t (unsigned short, unsigned long);
 typedef void ia64_mv_outl_t (unsigned int, unsigned long);
+typedef void ia64_mv_mmiowb_t (void);
 typedef unsigned char ia64_mv_readb_t (void *);
 typedef unsigned short ia64_mv_readw_t (void *);
 typedef unsigned int ia64_mv_readl_t (void *);
@@ -130,6 +131,7 @@
 #  define platform_outb		ia64_mv.outb
 #  define platform_outw		ia64_mv.outw
 #  define platform_outl		ia64_mv.outl
+#  define platform_mmiowb	ia64_mv.mmiowb
 #  define platform_readb        ia64_mv.readb
 #  define platform_readw        ia64_mv.readw
 #  define platform_readl        ia64_mv.readl
@@ -176,6 +178,7 @@
 	ia64_mv_outb_t *outb;
 	ia64_mv_outw_t *outw;
 	ia64_mv_outl_t *outl;
+	ia64_mv_mmiowb_t *mmiowb;
 	ia64_mv_readb_t *readb;
 	ia64_mv_readw_t *readw;
 	ia64_mv_readl_t *readl;
@@ -218,6 +221,7 @@
 	platform_outb,				\
 	platform_outw,				\
 	platform_outl,				\
+	platform_mmiowb,			\
 	platform_readb,				\
 	platform_readw,				\
 	platform_readl,				\
@@ -343,6 +347,9 @@
 #endif
 #ifndef platform_outl
 # define platform_outl		__ia64_outl
+#endif
+#ifndef platform_mmiowb
+# define platform_mmiowb	__ia64_mmiowb
 #endif
 #ifndef platform_readb
 # define platform_readb		__ia64_readb
===== include/asm-ia64/machvec_init.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/machvec_init.h	2004-02-06 00:30:24 -08:00
+++ edited/include/asm-ia64/machvec_init.h	2004-09-22 08:13:45 -07:00
@@ -12,6 +12,7 @@
 extern ia64_mv_outb_t __ia64_outb;
 extern ia64_mv_outw_t __ia64_outw;
 extern ia64_mv_outl_t __ia64_outl;
+extern ia64_mv_mmiowb_t __ia64_mmiowb;
 extern ia64_mv_readb_t __ia64_readb;
 extern ia64_mv_readw_t __ia64_readw;
 extern ia64_mv_readl_t __ia64_readl;
===== include/asm-ia64/machvec_sn2.h 1.14 vs edited =====
--- 1.14/include/asm-ia64/machvec_sn2.h	2004-07-10 17:14:00 -07:00
+++ edited/include/asm-ia64/machvec_sn2.h	2004-09-22 08:14:03 -07:00
@@ -49,6 +49,7 @@
 extern ia64_mv_outb_t __sn_outb;
 extern ia64_mv_outw_t __sn_outw;
 extern ia64_mv_outl_t __sn_outl;
+extern ia64_mv_mmiowb_t __sn_mmiowb;
 extern ia64_mv_readb_t __sn_readb;
 extern ia64_mv_readw_t __sn_readw;
 extern ia64_mv_readl_t __sn_readl;
@@ -92,6 +93,7 @@
 #define platform_outb			__sn_outb
 #define platform_outw			__sn_outw
 #define platform_outl			__sn_outl
+#define platform_mmiowb			__sn_mmiowb
 #define platform_readb			__sn_readb
 #define platform_readw			__sn_readw
 #define platform_readl			__sn_readl
===== include/asm-ia64/sn/io.h 1.7 vs edited =====
--- 1.7/include/asm-ia64/sn/io.h	2004-02-13 07:00:22 -08:00
+++ edited/include/asm-ia64/sn/io.h	2004-09-22 08:32:52 -07:00
@@ -58,8 +58,8 @@
 #include <asm/sn/sn2/shubio.h>
 
 /*
- * Used to ensure write ordering (like mb(), but for I/O space)
+ * Used to ensure write ordering
  */
-extern void sn_mmiob(void);
+extern void __sn_mmiowb(void);
 
 #endif /* _ASM_IA64_SN_IO_H */
===== include/asm-ia64/sn/sn2/io.h 1.6 vs edited =====
--- 1.6/include/asm-ia64/sn/sn2/io.h	2004-07-22 17:00:00 -07:00
+++ edited/include/asm-ia64/sn/sn2/io.h	2004-09-22 08:15:12 -07:00
@@ -11,8 +11,10 @@
 #include <linux/compiler.h>
 #include <asm/intrinsics.h>
 
-extern void * sn_io_addr(unsigned long port) __attribute_const__; /* Forward definition */
-extern void sn_mmiob(void); /* Forward definition */
+/* Forward declarations */
+struct device;
+extern void *sn_io_addr(unsigned long port) __attribute_const__;
+extern void __sn_mmiowb(void);
 
 #define __sn_mf_a()   ia64_mfa()
 
@@ -91,7 +93,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -102,7 +104,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
@@ -113,7 +115,7 @@
 
 	if ((addr = sn_io_addr(port))) {
 		*addr = val;
-		sn_mmiob();
+		__sn_mmiowb();
 	}
 }
 
===== include/asm-mips/io.h 1.7 vs edited =====
--- 1.7/include/asm-mips/io.h	2004-02-19 12:53:02 -08:00
+++ edited/include/asm-mips/io.h	2004-09-22 08:24:53 -07:00
@@ -290,6 +290,9 @@
 #define __raw_writeb(b,addr)	((*(volatile unsigned char *)(addr)) = (b))
 #define __raw_writew(w,addr)	((*(volatile unsigned short *)(addr)) = (w))
 #define __raw_writel(l,addr)	((*(volatile unsigned int *)(addr)) = (l))
+
+#define mmiowb() asm volatile ("sync" ::: "memory")
+
 #ifdef CONFIG_MIPS32
 #define ____raw_writeq(val,addr)						\
 ({									\
===== include/asm-ppc64/io.h 1.21 vs edited =====
--- 1.21/include/asm-ppc64/io.h	2004-09-13 11:31:52 -07:00
+++ edited/include/asm-ppc64/io.h	2004-09-22 08:24:12 -07:00
@@ -152,6 +152,8 @@
 extern void _insl_ns(volatile u32 *port, void *buf, int nl);
 extern void _outsl_ns(volatile u32 *port, const void *buf, int nl);
 
+#define mmiowb() asm volatile ("eieio" ::: "memory")
+
 /*
  * output pause versions need a delay at least for the
  * w83c105 ide controller in a p610.

^ permalink raw reply	[flat|nested] 45+ messages in thread

end of thread, other threads:[~2004-10-25 19:36 UTC | newest]

Thread overview: 45+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-10-21 23:13 [PATCH] I/O space write barrier Jesse Barnes
2004-10-21 23:17 ` [PATCH] use mmiowb in qla1280.c Jesse Barnes
2004-10-22  9:53   ` Jes Sorensen
2004-10-24 16:20   ` James Bottomley
2004-10-25 16:18     ` Jesse Barnes
2004-10-25 19:02       ` Andrew Morton
2004-10-25 19:33         ` Jesse Barnes
2004-10-21 23:28 ` [PATCH] use mmiowb in tg3.c Jesse Barnes
2004-10-21 23:40   ` David S. Miller
2004-10-22  1:16     ` Benjamin Herrenschmidt
2004-10-22  1:33       ` akepner
2004-10-22  2:07         ` Benjamin Herrenschmidt
2004-10-22  3:01     ` Jesse Barnes
2004-10-22  4:00       ` Paul Mackerras
2004-10-22 20:51   ` [PATCH] use mmiowb in tg3_poll akepner
2004-10-22  1:01 ` [PATCH] I/O space write barrier Grant Grundler
2004-10-22  3:05   ` Jesse Barnes
2004-10-22  4:26     ` Greg Banks
2004-10-22 15:26     ` Grant Grundler
  -- strict thread matches above, loose matches on Subject: below --
2004-10-05 22:38 Jesse Barnes
2004-10-04 20:39 Albert Cahalan
2004-10-04 21:20 ` Jesse Barnes
2004-10-05  0:32   ` Albert Cahalan
2004-10-05  1:22     ` Benjamin Herrenschmidt
2004-10-05  2:26       ` Jesse Barnes
2004-10-05  3:04         ` Benjamin Herrenschmidt
2004-10-05 15:33           ` Jesse Barnes
2004-10-05 22:41             ` Benjamin Herrenschmidt
2004-10-05 23:09               ` Jesse Barnes
2004-10-05 23:57               ` Roland Dreier
2004-10-06  1:45                 ` Benjamin Herrenschmidt
2004-10-05  2:33     ` Jesse Barnes
2004-09-27 18:03 Jesse Barnes
2004-09-29 10:36 ` Greg Banks
2004-09-29 20:35   ` David S. Miller
2004-09-29 20:43     ` Jesse Barnes
2004-09-29 20:50       ` David S. Miller
2004-09-30  2:23         ` Greg Banks
2004-09-29 22:55 ` Jesse Barnes
2004-09-30  7:15   ` Jeremy Higdon
2004-09-30 21:21     ` Guennadi Liakhovetski
2004-10-16  0:38       ` Jeremy Higdon
2004-10-16  3:20         ` Matthew Wilcox
2004-10-16  3:31           ` Jeremy Higdon
2004-09-22 15:45 Jesse Barnes
