linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] ppc32 copy_to_user dcbt fixup
@ 2004-03-13  4:15 Bryan Rittmeyer
  2004-03-13  4:50 ` Benjamin Herrenschmidt
  2004-03-13  9:11 ` Eugene Surovegin
  0 siblings, 2 replies; 9+ messages in thread
From: Bryan Rittmeyer @ 2004-03-13  4:15 UTC (permalink / raw)
  To: linux-kernel; +Cc: linuxppc-dev list, Paul Mackerras, Benjamin Herrenschmidt

[-- Attachment #1: Type: text/plain, Size: 1374 bytes --]

copy_tofrom_user and copy_page use dcbt to prefetch source data [1].
Since at least 2.4.17, these functions have been prefetching
beyond the end of the source buffer, leading to two problems:

1. Subtly broken software cache coherency. If the area following src
was invalidate_dcache_range'd prior to submitting for DMA,
an out-of-bounds dcbt from copy_to_user of a separate slab object
may read in the area before DMA completion. When the DMA does complete,
data will not be loaded from RAM because stale data is already in cache.
Thus you get a corrupt network packet, bogus audio capture, etc.

This problem probably does not affect hardware coherent systems
(all Apple machines?). However:

2. The extra 'dcbt' wastes bus bandwidth. Worst case: on a 128 byte copy,
we currently dcbt 256 bytes. These extra loads trash cache, potentially
causing writeback of more useful data.

The attached patch attempts to reign in dcbt prefetching at the end of
copies such that we do not read beyond the src area. This change fixes 
DMA data corruption on software coherent systems and improves
performance slightly in my lame microbenchmark [2].

[1] csum_partial_copy_generic does not use dcbt/dcbz despite being
scorching hot in TCP workloads. I'm cooking up another patch to
dcb?ize it. 

[2] http://staidm.org/linux/ppc/copy_dcbt/copyuser-microbench.tar.bz2

Comments?

-Bryan


[-- Attachment #2: dcbt.patch --]
[-- Type: text/plain, Size: 2133 bytes --]

--- linuxppc-2.5-benh/arch/ppc/lib/string.S~	2004-03-12 14:06:50.000000000 -0800
+++ linuxppc-2.5-benh/arch/ppc/lib/string.S	2004-03-12 16:26:09.000000000 -0800
@@ -443,16 +443,16 @@
 
 #if !defined(CONFIG_8xx)
 	/* Here we decide how far ahead to prefetch the source */
+	li	r12,1
 #if MAX_COPY_PREFETCH > 1
 	/* Heuristically, for large transfers we prefetch
 	   MAX_COPY_PREFETCH cachelines ahead.  For small transfers
 	   we prefetch 1 cacheline ahead. */
 	cmpwi	r0,MAX_COPY_PREFETCH
-	li	r7,1
 	li	r3,4
 	ble	111f
-	li	r7,MAX_COPY_PREFETCH
-111:	mtctr	r7
+	li	r12,MAX_COPY_PREFETCH
+111:	mtctr	r12
 112:	dcbt	r3,r4
 	addi	r3,r3,CACHELINE_BYTES
 	bdnz	112b
@@ -462,9 +462,29 @@
 #endif /* MAX_COPY_PREFETCH */
 #endif /* CONFIG_8xx */
 
+	/* don't run dcbt on cachelines outside our src area.
+       it increases bus traffic, may displace useful data,
+       and busts software cache coherency. those factors
+       are typically worse than the extra branch.
+
+       if r5 == 0, then we have to stop dcbt when ctr <= r12.  
+       if r5 != 0 (partial bytes at end) we should do an extra
+       dcbt for them--kmalloc etc will not put multiple
+       objects within a single cacheline
+
+       -bryan at staidm org
+    */
+	cmpwi	r5,0
+	beq		52f
+	addi	r12,r12,1
+
+52:
 	mtctr	r0
 53:
 #if !defined(CONFIG_8xx)
+	mfctr	r7
+	cmplw	0,r7,r12
+	ble		54f
 	dcbt	r3,r4
 54:	dcbz	r11,r6
 #endif
--- linuxppc-2.5-benh/arch/ppc/kernel/misc.S~	2004-02-04 13:35:34.000000000 -0800
+++ linuxppc-2.5-benh/arch/ppc/kernel/misc.S	2004-03-12 17:07:52.000000000 -0800
@@ -787,15 +787,16 @@
 
 #ifndef CONFIG_8xx
 #if MAX_COPY_PREFETCH > 1
-	li	r0,MAX_COPY_PREFETCH
+	li	r12,MAX_COPY_PREFETCH
 	li	r11,4
-	mtctr	r0
+	mtctr	r12
 11:	dcbt	r11,r4
 	addi	r11,r11,L1_CACHE_LINE_SIZE
 	bdnz	11b
 #else /* MAX_L1_COPY_PREFETCH == 1 */
 	dcbt	r5,r4
 	li	r11,L1_CACHE_LINE_SIZE+4
+	li	r12,1
 #endif /* MAX_L1_COPY_PREFETCH */
 #endif /* CONFIG_8xx */
 
@@ -803,8 +804,11 @@
 	mtctr	r0
 1:
 #ifndef CONFIG_8xx
+	mfctr	r7
+	cmplw	0,r7,r12
+	ble		2f
 	dcbt	r11,r4
-	dcbz	r5,r3
+2:	dcbz	r5,r3
 #endif
 	COPY_16_BYTES
 #if L1_CACHE_LINE_SIZE >= 32

^ permalink raw reply	[flat|nested] 9+ messages in thread
[parent not found: <1z8Na-5hH-1@gated-at.bofh.it>]

end of thread, other threads:[~2004-03-16  2:02 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2004-03-13  4:15 [PATCH] ppc32 copy_to_user dcbt fixup Bryan Rittmeyer
2004-03-13  4:50 ` Benjamin Herrenschmidt
2004-03-13  7:49   ` Bryan Rittmeyer
2004-03-13  8:36     ` Benjamin Herrenschmidt
2004-03-15  8:38   ` Segher Boessenkool
2004-03-13  9:11 ` Eugene Surovegin
2004-03-16  1:59   ` Bryan Rittmeyer
     [not found] <1z8Na-5hH-1@gated-at.bofh.it>
2004-03-13 14:39 ` Danjel McGougan
2004-03-14 22:35   ` Bryan Rittmeyer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).