* Re: memcpy_toio on i386 using byte writes even when n%2==0
       [not found] <6gMqr-8uW-23@gated-at.bofh.it>
@ 2006-05-26 23:46 ` Robert Hancock
  2006-05-29  7:38   ` H. Peter Anvin
  2006-05-30 13:55   ` Chris Lesiak
  0 siblings, 2 replies; 9+ messages in thread
From: Robert Hancock @ 2006-05-26 23:46 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Lesiak

Chris Lesiak wrote:
> I'm working on a driver for a custom PCI card on the i386 architecture.
> The card uses a PLX9030 pci bridge to link an FPGA to the PCI bus using
> a 16 bit bus.  I found that something broke when moving from 2.6.10 to
> 2.6.17-rc4.  In the driver, I use memcpy_toio to write 14 bytes to a
> memory region in the FPGA.
> 
> To copy the 14 bytes, 2.6.10 does three 32 bit writes followed by one 16
> bit write.  2.6.17-rc4 does three 32 bit writes followed by two 8 bit writes.
> 
> The PLX9030 breaks the 32 bit writes into 16 bit writes for its local
> bus just fine.  The problem is that my board doesn't handle byte
> enables.  It was assumed that if all memory transfers were a multiple of
> 2 bytes, then byte accesses wouldn't be used.  This is no longer true in
> 2.6.17-rc4.
> 
> I've solved the problem by padding to 16 bytes, but should this be
> considered a bug in the kernel?

It does seem a little bit less efficient, but I don't think it's 
necessarily a bug. There's no guarantee of what size writes will be used 
with the memcpy_to/fromio functions.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: memcpy_toio on i386 using byte writes even when n%2==0
  2006-05-26 23:46 ` memcpy_toio on i386 using byte writes even when n%2==0 Robert Hancock
@ 2006-05-29  7:38   ` H. Peter Anvin
  2006-05-30 13:55   ` Chris Lesiak
  1 sibling, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2006-05-29  7:38 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <44779358.9010703@shaw.ca>
By author:    Robert Hancock <hancockr@shaw.ca>
In newsgroup: linux.dev.kernel
> 
> It does seem a little bit less efficient, but I don't think it's 
> necessarily a bug. There's no guarantee of what size writes will be used 
> with the memcpy_to/fromio functions.
> 

There are only a few semantics that make sense: fixed 8, 16, 32, or 64
bits, plus "optimal"; the latter to be used for anything that doesn't
require a specific transfer size.  Logically, an unqualified
"memcpy_to/fromio" should be the optimal size (as few transfers as
possible) -- we have a qualified "memcpy_to/fromio32" already, and 8-
and 16-bit variants could/should be added.

However, having the unqualified version do byte transfers seems like a
really bad idea.

	-hpa



* Re: memcpy_toio on i386 using byte writes even when n%2==0
  2006-05-26 23:46 ` memcpy_toio on i386 using byte writes even when n%2==0 Robert Hancock
  2006-05-29  7:38   ` H. Peter Anvin
@ 2006-05-30 13:55   ` Chris Lesiak
  2006-05-30 14:24     ` linux-os (Dick Johnson)
  1 sibling, 1 reply; 9+ messages in thread
From: Chris Lesiak @ 2006-05-30 13:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Robert Hancock

On Fri, 2006-05-26 at 17:46 -0600, Robert Hancock wrote:
> It does seem a little bit less efficient, but I don't think it's 
> necessarily a bug. There's no guarantee of what size writes will be used 
> with the memcpy_to/fromio functions.

I shouldn't have made that assumption in the first place, but I suspect
that I am not the only one to have done so.  Probably other hardware
also gets caught not supporting byte enables.
-- 
Chris Lesiak
chris.lesiak@licor.com



* Re: memcpy_toio on i386 using byte writes even when n%2==0
  2006-05-30 13:55   ` Chris Lesiak
@ 2006-05-30 14:24     ` linux-os (Dick Johnson)
  0 siblings, 0 replies; 9+ messages in thread
From: linux-os (Dick Johnson) @ 2006-05-30 14:24 UTC (permalink / raw)
  To: Chris Lesiak; +Cc: linux-kernel, Robert Hancock


On Tue, 30 May 2006, Chris Lesiak wrote:

> On Fri, 2006-05-26 at 17:46 -0600, Robert Hancock wrote:
>> It does seem a little bit less efficient, but I don't think it's
>> necessarily a bug. There's no guarantee of what size writes will be used
>> with the memcpy_to/fromio functions.
>
> I shouldn't have made that assumption in the first place, but I suspect
> that I am not the only one to have done so.  Probably other hardware
> also gets caught not supporting byte enables.
> --
> Chris Lesiak
> chris.lesiak@licor.com
>

If byte writes are used, they should always be last for any
odd byte. I think you found a bug in spite of the fact that
whoever made the revision to memcpy probably thinks they
did something 'cool'. This is an example of cute code causing
problems. The classic example of a proper memcpy() that uses
the ix86 built-in macros runs like this:

 		pushl	%esi		# Save precious registers
 		pushl	%edi
 		movl	COUNT(%esp),%ecx
 		movl	SOURCE(%esp),%esi
 		movl	DEST(%esp),%edi
 		cld
 		shrl	$1,%ecx		# Make WORDS, possibly set carry
 		rep	movsw		# Copy the words
 		adcl	%ecx,%ecx	# Any spare byte
 		rep	movsb		# Copy any spare byte
 		popl	%edi		# Restore precious registers
 		popl	%esi

Note that there isn't any code for moving dwords because the
chances of gaining anything are slim (alignment may hurt).
This kind of code results in the principle of least surprise.
More sophisticated code usually takes longer to execute although
it often looks 'cute' as the designer attempts to create some
sort of alignment, at least for one of the elements. The jumps
in such code usually negate the advantages of any such cuteness.

I've found that it's often necessary to create private functions
to get around the disadvantages of some of the recent cute code.
You can always make a MemcpyTo_io().... It won't ever change
unless you change it! That way, your modules will compile and
work forever, regardless of any "improvements" made in the
source-code tree.
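
A minimal sketch of what such a private routine might look like, written
as a user-space model rather than driver code. The name copy_to_io16 is
hypothetical, and the destination is a plain volatile pointer; in a real
driver each store would presumably go through an accessor such as
writew() on an __iomem pointer. The only property it tries to show is
that every store is exactly 16 bits wide and that an odd length is
rejected instead of being finished with a byte write.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not a kernel API): copy nbytes to a 16-bit-only
 * target using nothing but 16-bit stores.
 * Returns 0 on success, -1 if nbytes is odd, because the modelled
 * hardware cannot accept a single-byte write. */
static int copy_to_io16(volatile uint16_t *dst, const void *src, size_t nbytes)
{
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    if (nbytes & 1)
        return -1;

    for (size_t i = 0; i < nbytes / 2; i++) {
        uint16_t v;

        memcpy(&v, s + 2 * i, sizeof v); /* source may be unaligned */
        dst[i] = v;                      /* exactly one 16-bit store */
    }
    return 0;
}
```

Called with the 14-byte buffer from the original report, this would
issue seven 16-bit stores and no byte accesses, whatever the generic
memcpy_toio happens to do in a given kernel version.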

Cheers,
Dick Johnson
Penguin : Linux version 2.6.16.4 on an i686 machine (5592.73 BogoMips).
New book: http://www.AbominableFirebug.com/

* Re: memcpy_toio on i386 using byte writes even when n%2==0
  2006-06-03 23:52         ` Robert Hancock
@ 2006-06-04  0:48           ` H. Peter Anvin
  0 siblings, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2006-06-04  0:48 UTC (permalink / raw)
  To: Robert Hancock; +Cc: linux-kernel

Robert Hancock wrote:
> H. Peter Anvin wrote:
>> For something that generates I/O transactions, it's imperative to
>> generate the smallest possible number of transactions.  Furthermore,
>> smaller than dword transactions aren't burstable, except at the
>> beginning and end of a burst.
> 
> Well, theoretically for writes they could be, if the memory region was 
> prefetchable and the PCI chipset supported byte merge. It certainly 
> isn't optimal however.
> 

If so, then merging doesn't matter either way.

	-hpa


* Re: memcpy_toio on i386 using byte writes even when n%2==0
       [not found]       ` <6inxv-4U2-17@gated-at.bofh.it>
@ 2006-06-03 23:52         ` Robert Hancock
  2006-06-04  0:48           ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Hancock @ 2006-06-03 23:52 UTC (permalink / raw)
  To: H. Peter Anvin, linux-kernel

H. Peter Anvin wrote:
> For something that generates I/O transactions, it's imperative to
> generate the smallest possible number of transactions.  Furthermore,
> smaller than dword transactions aren't burstable, except at the
> beginning and end of a burst.

Well, theoretically for writes they could be, if the memory region was 
prefetchable and the PCI chipset supported byte merge. It certainly 
isn't optimal however.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* Re: memcpy_toio on i386 using byte writes even when n%2==0
  2006-05-31  0:55       ` Robert Hancock
@ 2006-05-31  1:13         ` H. Peter Anvin
  0 siblings, 0 replies; 9+ messages in thread
From: H. Peter Anvin @ 2006-05-31  1:13 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <447CE99B.7070707@shaw.ca>
By author:    Robert Hancock <hancockr@shaw.ca>
In newsgroup: linux.dev.kernel
> > 
> > Note that there isn't any code for moving dwords because the
> > chances of gaining anything are slim (alignment may hurt).
> 
> I'd say the chances of gaining something from executing half as many 
> instructions on copying a large block of memory are very good indeed.
> 

For something that generates I/O transactions, it's imperative to
generate the smallest possible number of transactions.  Furthermore,
smaller than dword transactions aren't burstable, except at the
beginning and end of a burst.
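
To put rough numbers on the transaction-count argument, here is a small
sketch that counts CPU stores for a copy of n bytes when the widest
store allowed is 4, 2 or 1 bytes. The function name count_stores is
hypothetical, the copy is assumed to start suitably aligned, and the
result counts stores issued by the CPU, not bus transactions; what the
chipset does with them (merging, bursting) is outside this model.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of stores needed to copy n bytes when the
 * widest store is max_width bytes (must be 1, 2 or 4). The widest store
 * is used as often as possible, then the remainder is finished with
 * progressively narrower stores. */
static size_t count_stores(size_t n, size_t max_width)
{
    size_t count = 0;

    assert(max_width == 1 || max_width == 2 || max_width == 4);

    for (size_t w = max_width; w >= 1; w /= 2) {
        count += n / w; /* full stores of width w */
        n %= w;         /* bytes left for narrower stores */
    }
    return count;
}
```

Under these assumptions a 4096-byte copy needs 1024 stores with dwords
and 2048 with words, and the 14-byte case from the original report
needs 4 stores (three dwords and one word) against 7 with words only.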

	-hpa



* Re: memcpy_toio on i386 using byte writes even when n%2==0
       [not found]     ` <6idov-5Tc-7@gated-at.bofh.it>
@ 2006-05-31  0:55       ` Robert Hancock
  2006-05-31  1:13         ` H. Peter Anvin
  0 siblings, 1 reply; 9+ messages in thread
From: Robert Hancock @ 2006-05-31  0:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-os (Dick Johnson)

linux-os (Dick Johnson) wrote:
> If byte writes are used, they should always be last for any
> odd byte. I think you found a bug in spite of the fact that
> whoever made the revision to memcpy probably thinks they
> did something 'cool'. This is an example of cute code causing
> problems. The classic example of a proper memcpy() that uses
> the ix86 built-in macros runs like this:
> 
>  		pushl	%esi		# Save precious registers
>  		pushl	%edi
>  		movl	COUNT(%esp),%ecx
>  		movl	SOURCE(%esp),%esi
>  		movl	DEST(%esp),%edi
>  		cld
>  		shrl	$1,%ecx		# Make WORDS, possibly set carry
>  		rep	movsw		# Copy the words
>  		adcl	%ecx,%ecx	# Any spare byte
>  		rep	movsb		# Copy any spare byte
>  		popl	%edi		# Restore precious registers
>  		popl	%esi
> 
> Note that there isn't any code for moving dwords because the
> chances of gaining anything are slim (alignment may hurt).

I'd say the chances of gaining something from executing half as many 
instructions on copying a large block of memory are very good indeed.

-- 
Robert Hancock      Saskatoon, SK, Canada
To email, remove "nospam" from hancockr@nospamshaw.ca
Home Page: http://www.roberthancock.com/



* memcpy_toio on i386 using byte writes even when n%2==0
@ 2006-05-26 15:29 Chris Lesiak
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Lesiak @ 2006-05-26 15:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Chris Lesiak

I'm working on a driver for a custom PCI card on the i386 architecture.
The card uses a PLX9030 pci bridge to link an FPGA to the PCI bus using
a 16 bit bus.  I found that something broke when moving from 2.6.10 to
2.6.17-rc4.  In the driver, I use memcpy_toio to write 14 bytes to a
memory region in the FPGA.

To copy the 14 bytes, 2.6.10 does three 32 bit writes followed by one 16
bit write.  2.6.17-rc4 does three 32 bit writes followed by two 8 bit writes.

The PLX9030 breaks the 32 bit writes into 16 bit writes for its local
bus just fine.  The problem is that my board doesn't handle byte
enables.  It was assumed that if all memory transfers were a multiple of
2 bytes, then byte accesses wouldn't be used.  This is no longer true in
2.6.17-rc4.

I've solved the problem by padding to 16 bytes, but should this be
considered a bug in the kernel?

Both kernels use __memcpy to implement memcpy_toio.  Here is the
relevant code from <asm-i386/string.h>:

The 2.6.10 version:

static inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
        "rep ; movsl\n\t"
        "testb $2,%b4\n\t"
        "je 1f\n\t"
        "movsw\n"
        "1:\ttestb $1,%b4\n\t"
        "je 2f\n\t"
        "movsb\n"
        "2:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        :"0" (n/4), "q" (n),"1" ((long) to),"2" ((long) from)
        : "memory");
return (to);
}

The 2.6.17-rc4 version:

static __always_inline void * __memcpy(void * to, const void * from, size_t n)
{
int d0, d1, d2;
__asm__ __volatile__(
        "rep ; movsl\n\t"
        "movl %4,%%ecx\n\t"
        "andl $3,%%ecx\n\t"
#if 1   /* want to pay 2 byte penalty for a chance to skip microcoded rep? */
        "jz 1f\n\t"
#endif
        "rep ; movsb\n\t"
        "1:"
        : "=&c" (d0), "=&D" (d1), "=&S" (d2)
        : "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
        : "memory");
return (to);
}
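
The difference between the two listings can be sketched as a plain C
model that records the width of each store instead of performing it.
The function names and the widths[] output array are hypothetical and
exist only for illustration; the logic is read off the two inline-asm
bodies above (n/4 dwords via rep movsl in both, then either a
conditional movsw/movsb pair or a rep movsb of n & 3 bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Both helpers write the width in bytes of each store into widths[]
 * and return the number of stores. The caller must provide room for
 * at least n/4 + 3 entries. */

/* Model of the 2.6.10 listing: dwords, then one word if bit 1 of n is
 * set, then one byte if bit 0 of n is set. */
static size_t model_2_6_10(size_t n, unsigned char *widths)
{
    size_t k = 0;

    assert(widths != NULL);
    for (size_t i = 0; i < n / 4; i++)
        widths[k++] = 4;            /* rep movsl */
    if (n & 2)
        widths[k++] = 2;            /* movsw */
    if (n & 1)
        widths[k++] = 1;            /* movsb */
    return k;
}

/* Model of the 2.6.17-rc4 listing: dwords, then n & 3 single bytes. */
static size_t model_2_6_17_rc4(size_t n, unsigned char *widths)
{
    size_t k = 0;

    assert(widths != NULL);
    for (size_t i = 0; i < n / 4; i++)
        widths[k++] = 4;            /* rep movsl */
    for (size_t i = 0; i < (n & 3); i++)
        widths[k++] = 1;            /* rep movsb with ecx = n & 3 */
    return k;
}
```

For n = 14 the first model yields the widths 4, 4, 4, 2 and the second
yields 4, 4, 4, 1, 1, which matches the behaviour described above.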

-- 
Chris Lesiak
chris.lesiak@licor.com

