[PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
@ 2011-01-18 18:11 Lars-Peter Clausen
  2011-01-18 18:44 ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Lars-Peter Clausen @ 2011-01-18 18:11 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arch, linux-kernel, Lars-Peter Clausen

Currently io{read,write}{16,32} expand to
	*addr = cpu_to_leXX(cpu_to_beXX(val))
and
	val = beXX_to_cpu(leXX_to_cpu(*addr))

While it should rather be:
	*addr = cpu_to_be{16,32}(val)
and
	val = be{16,32}_to_cpu(*addr)

The current implementation works on litte-endian targets(where cpu_to_leXX is a
noop), but breaks on on big-endian targets, this patch fixes it.

Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
---
 include/asm-generic/io.h |    8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/io.h b/include/asm-generic/io.h
index 4644c9a..5d93735 100644
--- a/include/asm-generic/io.h
+++ b/include/asm-generic/io.h
@@ -244,15 +244,15 @@ static inline void writesb(const void __iomem *addr, const void *buf, int len)
 #ifndef CONFIG_GENERIC_IOMAP
 #define ioread8(addr)		readb(addr)
 #define ioread16(addr)		readw(addr)
-#define ioread16be(addr)	be16_to_cpu(ioread16(addr))
+#define ioread16be(addr)	be16_to_cpu(__raw_readw(addr))
 #define ioread32(addr)		readl(addr)
-#define ioread32be(addr)	be32_to_cpu(ioread32(addr))
+#define ioread32be(addr)	be32_to_cpu(__raw_readl(addr))
 
 #define iowrite8(v, addr)	writeb((v), (addr))
 #define iowrite16(v, addr)	writew((v), (addr))
-#define iowrite16be(v, addr)	iowrite16(be16_to_cpu(v), (addr))
+#define iowrite16be(v, addr)	__raw_writew(be16_to_cpu(v), (addr))
 #define iowrite32(v, addr)	writel((v), (addr))
-#define iowrite32be(v, addr)	iowrite32(be32_to_cpu(v), (addr))
+#define iowrite32be(v, addr)	__raw_writel(be32_to_cpu(v), (addr))
 
 #define ioread8_rep(p, dst, count) \
 	insb((unsigned long) (p), (dst), (count))
-- 
1.7.2.3


^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 18:11 [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems Lars-Peter Clausen
@ 2011-01-18 18:44 ` Arnd Bergmann
  2011-01-18 19:01   ` Lars-Peter Clausen
  0 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2011-01-18 18:44 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: linux-arch, linux-kernel

On Tuesday 18 January 2011 19:11:01 Lars-Peter Clausen wrote:
> Currently io{read,write}{16,32} expand to
>         *addr = cpu_to_leXX(cpu_to_beXX(val))
> and
>         val = beXX_to_cpu(leXX_to_cpu(*addr))
> 
> While it should rather be:
>         *addr = cpu_to_be{16,32}(val)
> and
>         val = be{16,32}_to_cpu(*addr)
> 
> The current implementation works on litte-endian targets(where cpu_to_leXX is a
> noop), but breaks on on big-endian targets, this patch fixes it.
> 
> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>

The existing code looks broken on big-endian indeed, thanks
for the report!

However, you cannot use __raw_readl in general, because that
function knows nothing about the various kinds of I/O accesses
(potentially) handled by readl. Think of architectures where
readl is not a pointer dereference but something else.

The right solution is probably to use swab16/swab32 for the
big-endian functions. This also corrects the iowrite functions
which really should be using cpu_to_be32 instead of be32_to_cpu
(although they are always defined to be the same afaict.

	Arnd

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 18:44 ` Arnd Bergmann
@ 2011-01-18 19:01   ` Lars-Peter Clausen
  2011-01-18 19:56     ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Lars-Peter Clausen @ 2011-01-18 19:01 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arch, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/18/2011 07:44 PM, Arnd Bergmann wrote:
> On Tuesday 18 January 2011 19:11:01 Lars-Peter Clausen wrote:
>> Currently io{read,write}{16,32} expand to
>>         *addr = cpu_to_leXX(cpu_to_beXX(val))
>> and
>>         val = beXX_to_cpu(leXX_to_cpu(*addr))
>>
>> While it should rather be:
>>         *addr = cpu_to_be{16,32}(val)
>> and
>>         val = be{16,32}_to_cpu(*addr)
>>
>> The current implementation works on litte-endian targets(where cpu_to_leXX is a
>> noop), but breaks on on big-endian targets, this patch fixes it.
>>
>> Signed-off-by: Lars-Peter Clausen <lars@metafoo.de>
> 
> The existing code looks broken on big-endian indeed, thanks
> for the report!
> 
> However, you cannot use __raw_readl in general, because that
> function knows nothing about the various kinds of I/O accesses
> (potentially) handled by readl. Think of architectures where
> readl is not a pointer dereference but something else.

Well, i've though about that as well, but in the current asm-generic/io.h readl is
unconditionally defined as cpu_to_le32(__raw_readl(addr)) and ioread32 is defined as
readl.

So unless an arch io.h undefines those macros and redefines them (which none of the
current archs does, as far as i can see), we are o

If an arch chooses to redefine ioread or readl, it should probably also redefine
ioread{16,32}be.

> 
> The right solution is probably to use swab16/swab32 for the
> big-endian functions. This also corrects the iowrite functions
> which really should be using cpu_to_be32 instead of be32_to_cpu
> (although they are always defined to be the same afaict.

This would first cause a conversion to little-endian, which is a swap() in the
generic case and then you would call swap() again on the result. Which is basically a
noop, but I'm not sure if compilers will detect this.

> 
> 	Arnd

- - Lars
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk0143gACgkQBX4mSR26RiPJwACgh1KJKR6o5X/kazS34g/stGKq
OQ4AniZR7SlxTzpJDz7xTvPfeqAqH3S6
=Q6cD
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 19:01   ` Lars-Peter Clausen
@ 2011-01-18 19:56     ` Arnd Bergmann
  2011-01-18 20:54       ` Lars-Peter Clausen
  0 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2011-01-18 19:56 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: linux-arch, linux-kernel

On Tuesday 18 January 2011 20:01:12 Lars-Peter Clausen wrote:
> Well, i've though about that as well, but in the current asm-generic/io.h readl is
> unconditionally defined as cpu_to_le32(__raw_readl(addr)) and ioread32 is defined as
> readl.
> 
> So unless an arch io.h undefines those macros and redefines them (which none of the
> current archs does, as far as i can see), we are o
> 
> If an arch chooses to redefine ioread or readl, it should probably also redefine
> ioread{16,32}be.

Right, but the header file also serves as a template for new architectures
that cannot directly use it. I would prefer not to give a possibly bad example
here, especially when it's in a rarely used function.

> > The right solution is probably to use swab16/swab32 for the
> > big-endian functions. This also corrects the iowrite functions
> > which really should be using cpu_to_be32 instead of be32_to_cpu
> > (although they are always defined to be the same afaict.
> 
> This would first cause a conversion to little-endian, which is a swap() in the
> generic case and then you would call swap() again on the result. Which is basically a
> noop, but I'm not sure if compilers will detect this.

The overhead of the swab() is certainly dwarfed by the long time spent in
readl().

I would prefer to swap twice in this case and let the compiler work it out
if possible. The next best alternative would probably be to define both
ioread and ioread_be using __raw_readl in combination with a le32_to_cpu
or be32_to_cpu.

	Arnd

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 19:56     ` Arnd Bergmann
@ 2011-01-18 20:54       ` Lars-Peter Clausen
  2011-01-18 21:37         ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Lars-Peter Clausen @ 2011-01-18 20:54 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arch, linux-kernel

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 01/18/2011 08:56 PM, Arnd Bergmann wrote:
> On Tuesday 18 January 2011 20:01:12 Lars-Peter Clausen wrote:
>> Well, i've though about that as well, but in the current asm-generic/io.h readl is
>> unconditionally defined as cpu_to_le32(__raw_readl(addr)) and ioread32 is defined as
>> readl.
>>
>> So unless an arch io.h undefines those macros and redefines them (which none of the
>> current archs does, as far as i can see), we are o
>>
>> If an arch chooses to redefine ioread or readl, it should probably also redefine
>> ioread{16,32}be.
> 
> Right, but the header file also serves as a template for new architectures
> that cannot directly use it. I would prefer not to give a possibly bad example
> here, especially when it's in a rarely used function.

Maybe I'm missing something here, but if I have a big-endian architecture isn't
ioread{16,32}be what I should use to access iomapped memory?

>>> The right solution is probably to use swab16/swab32 for the
>>> big-endian functions. This also corrects the iowrite functions
>>> which really should be using cpu_to_be32 instead of be32_to_cpu
>>> (although they are always defined to be the same afaict.
>>
>> This would first cause a conversion to little-endian, which is a swap() in the
>> generic case and then you would call swap() again on the result. Which is basically a
>> noop, but I'm not sure if compilers will detect this.
> 
> The overhead of the swab() is certainly dwarfed by the long time spent in
> readl().

Well at least the code size overhead is fundamental:

I have this simple function:

  cycles_t get_cycles(void)
  {
  	return ioread32be(CSR_TIMER_COUNTER(timer));
  }

which when compiled for the lm32 arch results in the following assembler code

with #define ioread32be(addr) be32_to_cpu(__raw_readl(addr)):

  00000128 <get_cycles>:
	mvhi r2,0x4021
	ori r2,r2,0xa100
	lw r1,(r2+0)
	lw r1,(r1+0)
	ret

with #define ioread32be(addr) swap32(ioread32(addr)):

  4001a694 <get_cycles>:
	addi sp,sp,-16
	sw (sp+16),r11
	sw (sp+12),r12
	sw (sp+8),r13
	sw (sp+4),ra
	mvhi r2,0x4021
	ori r2,r2,0xa100
	lw r1,(r2+0)
	mvi r2,24
	mvhi r13,0xff
	lw r12,(r1+0)
	mv r1,r12
	calli 400f6f9c <__lshrsi3>
	mv r11,r1
	mvi r2,24
	mv r1,r12
	calli 400f6f6c <__ashlsi3>
	or r11,r11,r1
	mvi r2,8
	andi r1,r12,0xff00
	calli 400f6f6c <__ashlsi3>
	or r11,r11,r1
	mvi r2,8
	and r1,r12,r13
	calli 400f6f9c <__lshrsi3>
	or r11,r11,r1
	mv r1,r11
	mvi r2,24
	calli 400f6f9c <__lshrsi3>
	mv r12,r1
	mvi r2,24
	mv r1,r11
	calli 400f6f6c <__ashlsi3>
	or r12,r12,r1
	mvi r2,8
	andi r1,r11,0xff00
	calli 400f6f6c <__ashlsi3>
	or r12,r12,r1
	mvi r2,8
	and r1,r11,r13
	calli 400f6f9c <__lshrsi3>
	or r1,r12,r1
	lw ra,(sp+4)
	lw r11,(sp+16)
	lw r12,(sp+12)
	lw r13,(sp+8)
	addi sp,sp,16
	ret


So I as someone who implements arch support has two options either redefine
ioread32be in the arch io header, or use __raw_readl everywhere to access iomap memory.
> 
> I would prefer to swap twice in this case and let the compiler work it out
> if possible. The next best alternative would probably be to define both
> ioread and ioread_be using __raw_readl in combination with a le32_to_cpu
> or be32_to_cpu.
> 
> 	Arnd

- - Lars
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.10 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAk01/iMACgkQBX4mSR26RiNRQQCfeS4P27FYN5Sy3oxFqzbjsWAe
NH8Ani1IDQfLoM4kqpkDXneGkQN4HXqz
=OQK1
-----END PGP SIGNATURE-----

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 20:54       ` Lars-Peter Clausen
@ 2011-01-18 21:37         ` Arnd Bergmann
  2011-01-18 22:22           ` Lars-Peter Clausen
  0 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2011-01-18 21:37 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: linux-arch, linux-kernel

On Tuesday 18 January 2011 21:54:59 Lars-Peter Clausen wrote:
> >
> > Right, but the header file also serves as a template for new architectures
> > that cannot directly use it. I would prefer not to give a possibly bad example
> > here, especially when it's in a rarely used function.
> 
> Maybe I'm missing something here, but if I have a big-endian architecture isn't
> ioread{16,32}be what I should use to access iomapped memory?

Most I/O devices are little-endian, even for big-endian machines, and
should use readl or ioread. If you have big-endian SoC components,
ioread*be is often the right choice, but that case is rather rare.

Some architectures also define their own I/O accessors for SoC components,
since those often have other requirements from PCI MMIO areas.
E.g. on powerpc, the in_be32/in_le32 accessor only works on directly
mapped MMIO regions and performs no PCI error handling. On ARM, the
readl_relaxed() accessor does not synchronize with external buses.
On x86, readl is different from ioread32 in that it cannot work on
addresses returned from ioport_map.
I believe some SoCs are even configurable to have little- or big-endian
I/O, so the accessor does not do byte swapping.

It might be a good idea to make all this a little more structured, but
it's also fine if you set your own rules for a new architecture when
it has non-PCI devices that work in other ways.

> >>> The right solution is probably to use swab16/swab32 for the
> >>> big-endian functions. This also corrects the iowrite functions
> >>> which really should be using cpu_to_be32 instead of be32_to_cpu
> >>> (although they are always defined to be the same afaict.
> >>
> >> This would first cause a conversion to little-endian, which is a swap() in the
> >> generic case and then you would call swap() again on the result. Which is basically a
> >> noop, but I'm not sure if compilers will detect this.
> >
> > The overhead of the swab() is certainly dwarfed by the long time spent in
> > readl().
> 
> Well at least the code size overhead is fundamental:

Fair enough. You could of course make it out of line, but then you would
no longer be able to use the generic implementation of these functions.

> with #define ioread32be(addr) swap32(ioread32(addr)):
> 
>   4001a694 <get_cycles>:
>         addi sp,sp,-16
>         sw (sp+16),r11
>         sw (sp+12),r12
>         sw (sp+8),r13
>         sw (sp+4),ra
>         mvhi r2,0x4021
>         ori r2,r2,0xa100
>         lw r1,(r2+0)
>         mvi r2,24
>         mvhi r13,0xff
>         lw r12,(r1+0)
>         mv r1,r12
>         calli 400f6f9c <__lshrsi3>
>         mv r11,r1
>         mvi r2,24
>         mv r1,r12
>         calli 400f6f6c <__ashlsi3>
>         or r11,r11,r1
>         mvi r2,8
>         andi r1,r12,0xff00
> ...

That is indeed huge. Byte swapping is a relatively common operation
in the kernel, so independent of the solution to this particular
problem, it will be a good idea to see if you can do a better
implementation than this, using inline assembly or gcc internal
helpers.

> So I as someone who implements arch support has two options either redefine
> ioread32be in the arch io header, or use __raw_readl everywhere to access iomap memory.

__raw_readl is not a good thing to use, because of a number of reasons.
Please choose one of these four:

* change the common ioread*/iowrite* functions to all be based on
  the __raw_* I/O versions, not just the big-endian ones. The
  space overhead you quoted is enough of a justification for that.
* change asm-generic/io.h so you can override the definitions
  with architecture specific implementations.
* use GENERIC_IOMAP.
* define your own bus-specific accessors that are big-endian and
  based on __raw_readl/__raw_writel.

	Arnd

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 21:37         ` Arnd Bergmann
@ 2011-01-18 22:22           ` Lars-Peter Clausen
  2011-01-19  9:58             ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Lars-Peter Clausen @ 2011-01-18 22:22 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: linux-arch, linux-kernel

On 01/18/2011 10:37 PM, Arnd Bergmann wrote:
> On Tuesday 18 January 2011 21:54:59 Lars-Peter Clausen wrote:
>>>
>>> Right, but the header file also serves as a template for new architectures
>>> that cannot directly use it. I would prefer not to give a possibly bad example
>>> here, especially when it's in a rarely used function.
>>
>> Maybe I'm missing something here, but if I have a big-endian architecture isn't
>> ioread{16,32}be what I should use to access iomapped memory?
> 
> Most I/O devices are little-endian, even for big-endian machines, and
> should use readl or ioread. If you have big-endian SoC components,
> ioread*be is often the right choice, but that case is rather rare.

The lm32 architecture is a big-endian softcpu architecture. I'm currently working on
the MilkyMist SoC which uses it and all the SoC components have native endianess.

> 
> Some architectures also define their own I/O accessors for SoC components,
> since those often have other requirements from PCI MMIO areas.
> E.g. on powerpc, the in_be32/in_le32 accessor only works on directly
> mapped MMIO regions and performs no PCI error handling.

I've seen those and actually the lm32 architecture currently defines (and uses) them
as well. But I wanted to replace them with something more generic and try to reuse as
much as possible from asm-generic.

> On ARM, the
> readl_relaxed() accessor does not synchronize with external buses.
> On x86, readl is different from ioread32 in that it cannot work on
> addresses returned from ioport_map.
> I believe some SoCs are even configurable to have little- or big-endian
> I/O, so the accessor does not do byte swapping.
> 
> It might be a good idea to make all this a little more structured, but
> it's also fine if you set your own rules for a new architecture when
> it has non-PCI devices that work in other ways.
> 
>>>>> The right solution is probably to use swab16/swab32 for the
>>>>> big-endian functions. This also corrects the iowrite functions
>>>>> which really should be using cpu_to_be32 instead of be32_to_cpu
>>>>> (although they are always defined to be the same afaict.
>>>>
>>>> This would first cause a conversion to little-endian, which is a swap() in the
>>>> generic case and then you would call swap() again on the result. Which is basically a
>>>> noop, but I'm not sure if compilers will detect this.
>>>
>>> The overhead of the swab() is certainly dwarfed by the long time spent in
>>> readl().
>>
>> Well at least the code size overhead is fundamental:
> 
> Fair enough. You could of course make it out of line, but then you would
> no longer be able to use the generic implementation of these functions.
> 
>> with #define ioread32be(addr) swap32(ioread32(addr)):
>>
>>   4001a694 <get_cycles>:
>>         addi sp,sp,-16
>>         sw (sp+16),r11
>>         sw (sp+12),r12
>>         sw (sp+8),r13
>>         sw (sp+4),ra
>>         mvhi r2,0x4021
>>         ori r2,r2,0xa100
>>         lw r1,(r2+0)
>>         mvi r2,24
>>         mvhi r13,0xff
>>         lw r12,(r1+0)
>>         mv r1,r12
>>         calli 400f6f9c <__lshrsi3>
>>         mv r11,r1
>>         mvi r2,24
>>         mv r1,r12
>>         calli 400f6f6c <__ashlsi3>
>>         or r11,r11,r1
>>         mvi r2,8
>>         andi r1,r12,0xff00
>> ...
> 
> That is indeed huge. Byte swapping is a relatively common operation
> in the kernel, so independent of the solution to this particular
> problem, it will be a good idea to see if you can do a better
> implementation than this, using inline assembly or gcc internal
> helpers.

The reason why it got so huge is that the kernel was compiled without barrel-shifter
support, so we have basically 4 instructions per shift calling a helper function.

But thats not the point anyway. The point I'm trying to make is, that accessing
iomapped memory is exactly a single instruction. And I don't see why - no matter if
swab takes 4 or 20 instructions - we should add any additional overhead.
> 
>> So I as someone who implements arch support has two options either redefine
>> ioread32be in the arch io header, or use __raw_readl everywhere to access iomap memory.
> 
> __raw_readl is not a good thing to use, because of a number of reasons.
> Please choose one of these four:
> 
> * change the common ioread*/iowrite* functions to all be based on
>   the __raw_* I/O versions, not just the big-endian ones. The
>   space overhead you quoted is enough of a justification for that.

I would prefer this solution.

> * change asm-generic/io.h so you can override the definitions
>   with architecture specific implementations.
> * use GENERIC_IOMAP.
> * define your own bus-specific accessors that are big-endian and
>   based on __raw_readl/__raw_writel.
> 
> 	Arnd

- Lars

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-18 22:22           ` Lars-Peter Clausen
@ 2011-01-19  9:58             ` Arnd Bergmann
  2011-01-19 12:28               ` Jonas Bonn
  0 siblings, 1 reply; 10+ messages in thread
From: Arnd Bergmann @ 2011-01-19  9:58 UTC (permalink / raw)
  To: Lars-Peter Clausen; +Cc: linux-arch, linux-kernel

On Tuesday 18 January 2011 23:22:23 Lars-Peter Clausen wrote:
> On 01/18/2011 10:37 PM, Arnd Bergmann wrote:
> > ...
> The lm32 architecture is a big-endian softcpu architecture. I'm currently working on
> the MilkyMist SoC which uses it and all the SoC components have native endianess.

ok.

> > Some architectures also define their own I/O accessors for SoC components,
> > since those often have other requirements from PCI MMIO areas.
> > E.g. on powerpc, the in_be32/in_le32 accessor only works on directly
> > mapped MMIO regions and performs no PCI error handling.
> 
> I've seen those and actually the lm32 architecture currently defines (and uses) them
> as well. But I wanted to replace them with something more generic and try to reuse as
> much as possible from asm-generic.

That is a very good idea in general. Unfortunately, the asm-generic
infrastructure is currently missing an interface that is designed for
SoC components. IMHO, it would be very good if we could use this chance
to create one, or make an existing interface from one of the architectures
more generic. I can also understand if you don't want to get caught up
in the discussion between the architecture maintainers trying to find
a common position on this ;-).

> > That is indeed huge. Byte swapping is a relatively common operation
> > in the kernel, so independent of the solution to this particular
> > problem, it will be a good idea to see if you can do a better
> > implementation than this, using inline assembly or gcc internal
> > helpers.
> 
> The reason why it got so huge is that the kernel was compiled without barrel-shifter
> support, so we have basically 4 instructions per shift calling a helper function.
> 
> But thats not the point anyway. The point I'm trying to make is, that accessing
> iomapped memory is exactly a single instruction. And I don't see why - no matter if
> swab takes 4 or 20 instructions - we should add any additional overhead.

Right, and the point that I was trying to make is that an mmio read access
typically takes many hundred cycles, somtimes thousands of cycles, to
complete. Making the accessors out of line would be just fine, and even
a hundred instructions would not be much of an issue then (although a bit
silly if we just end up swapping twice, as you pointed out).

In the abscence of the barrel shifter, would it help to define the
function like this? (independent of the ioread question)

static inline __u32 __arch__swab32(__u32 x)
{
        union { __u32 u; char c[4]; } in, out;
	in.u = x;
	out.c[0] = in.c[3];
	out.c[1] = in.c[2];
	out.c[2] = in.c[1];
	out.c[3] = in.c[0];
        return out.u;
}
#define __arch__swab32 __arch__swab32

If your compiler has a __builtin_bswap32, that would be even better.

> >> So I as someone who implements arch support has two options either redefine
> >> ioread32be in the arch io header, or use __raw_readl everywhere to access iomap memory.
> > 
> > __raw_readl is not a good thing to use, because of a number of reasons.
> > Please choose one of these four:
> > 
> > * change the common ioread*/iowrite* functions to all be based on
> >   the __raw_* I/O versions, not just the big-endian ones. The
> >   space overhead you quoted is enough of a justification for that.
> 
> I would prefer this solution.

Ok, fair enough. When you send a patch to do that, please indicate whether
you prefer me to take it in my tree, or you just want to carry it in your
series with my Ack.

	Arnd

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-19  9:58             ` Arnd Bergmann
@ 2011-01-19 12:28               ` Jonas Bonn
  2011-01-19 14:47                 ` Arnd Bergmann
  0 siblings, 1 reply; 10+ messages in thread
From: Jonas Bonn @ 2011-01-19 12:28 UTC (permalink / raw)
  To: Arnd Bergmann; +Cc: Lars-Peter Clausen, linux-arch, linux-kernel

On 19 January 2011 10:58, Arnd Bergmann <arnd@arndb.de> wrote:
> On Tuesday 18 January 2011 23:22:23 Lars-Peter Clausen wrote:
>> On 01/18/2011 10:37 PM, Arnd Bergmann wrote:
>> > ...
>> The lm32 architecture is a big-endian softcpu architecture. I'm currently working on
>> the MilkyMist SoC which uses it and all the SoC components have native endianess.
>
> ok.

I think this is the key concept that's being missed here:  devices
with _native_ endianess.  This is something that I've been grappling
with for the OpenRISC platform, as well.

The ioread/write accessors are supposed to be a platform-agnostic
interface and have a builtin assumption about the endianess of the
device/bus.  Device drivers that use ioread32 are explicity accessing
a little-endian device; ioread32be explicitly a big-endian device.
It's not really for the architecture to redefine these.  The same
applies to the readl/readw accessors as well, as far as I understand
it.

What's needed is another way of accessing device memory that doesn't
carry the endianess implication.  For the Wishbone bus, used by the
OpenRISC processor, I still haven't found a more satisfactory solution
than to define "wishbone accessors" that map to the underlying
io-accessors of the desired endianess.  These allow the endianess of
the bus to be set at compile time and hopefully for all wishbone
device drivers to do the right thing.

#ifdef CONFIG_WISHBONE_BIG_ENDIAN
#define wb_ioread8(p) ioread8(p)
#define wb_ioread16(p) ioread16be(p)
#define wb_ioread32(p) ioread32be(p)
#define wb_iowrite8(p) iowrite8(p)
#define wb_iowrite16(p) iowrite16be(p)
#define wb_iowrite32(p) iowrite32be(p)
#else
#define wb_ioread8(p) ioread8(p)
#define wb_ioread16(p) ioread16(p)
#define wb_ioread32(p) ioread32(p)
#define wb_iowrite8(p) iowrite8(p)
#define wb_iowrite16(p) iowrite16(p)
#define wb_iowrite32(p) iowrite32(p)
#endif /* CONFIG_WISHBONE_BIG_ENDIAN */

That said, there are architectures that do redefine the endianess of
the io-accessors to mean "native endianess", but this must have
implications for a large number of the drivers in the kernel tree and,
in my opinion, must be technically _incorrect_.  A driver written with
ioread32 shouldn't have the endianess of the access changed by the
architecture underneath it... or are others of a different opinion?

>> >
>> > * change the common ioread*/iowrite* functions to all be based on
>> >   the __raw_* I/O versions, not just the big-endian ones. The
>> >   space overhead you quoted is enough of a justification for that.
>>

Doing this breaks the endianess assumption built into the io-accessors.

/Jonas

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems
  2011-01-19 12:28               ` Jonas Bonn
@ 2011-01-19 14:47                 ` Arnd Bergmann
  0 siblings, 0 replies; 10+ messages in thread
From: Arnd Bergmann @ 2011-01-19 14:47 UTC (permalink / raw)
  To: Jonas Bonn; +Cc: Lars-Peter Clausen, linux-arch, linux-kernel

On Wednesday 19 January 2011, Jonas Bonn wrote:
> The ioread/write accessors are supposed to be a platform-agnostic
> interface and have a builtin assumption about the endianess of the
> device/bus.  Device drivers that use ioread32 are explicity accessing
> a little-endian device; ioread32be explicitly a big-endian device.
> It's not really for the architecture to redefine these.  The same
> applies to the readl/readw accessors as well, as far as I understand
> it.

Yes. The difference between readl and ioread32 is that readl is
defined to work on PCI MMIO areas that are mapped through ioremap
(and may work on other things, depending on the architecture),
while ioread32 must work on both PCI MMIO and PCI PIO registers
mapped with ioremap, ioport_map or pci_iomap. All of these devices
are expected to have fixed endianess, which is independent of the
CPU endianess.

> What's needed is another way of accessing device memory that doesn't
> carry the endianess implication.  For the Wishbone bus, used by the
> OpenRISC processor, I still haven't found a more satisfactory solution
> than to define "wishbone accessors" that map to the underlying
> io-accessors of the desired endianess.  These allow the endianess of
> the bus to be set at compile time and hopefully for all wishbone
> device drivers to do the right thing.
> 
> #ifdef CONFIG_WISHBONE_BIG_ENDIAN
> #define wb_ioread8(p) ioread8(p)
> #define wb_ioread16(p) ioread16be(p)
> #define wb_ioread32(p) ioread32be(p)
> #define wb_iowrite8(p) iowrite8(p)
> #define wb_iowrite16(p) iowrite16be(p)
> #define wb_iowrite32(p) iowrite32be(p)
> #else
> #define wb_ioread8(p) ioread8(p)
> #define wb_ioread16(p) ioread16(p)
> #define wb_ioread32(p) ioread32(p)
> #define wb_iowrite8(p) iowrite8(p)
> #define wb_iowrite16(p) iowrite16(p)
> #define wb_iowrite32(p) iowrite32(p)
> #endif /* CONFIG_WISHBONE_BIG_ENDIAN */

Right. A number of buses do the same right now, for similar reasons.

> That said, there are architectures that do redefine the endianess of
> the io-accessors to mean "native endianess", but this must have
> implications for a large number of the drivers in the kernel tree and,
> in my opinion, must be technically _incorrect_.  A driver written with
> ioread32 shouldn't have the endianess of the access changed by the
> architecture underneath it... or are others of a different opinion?

I think you are absolutely right.

> >> > * change the common ioread*/iowrite* functions to all be based on
> >> >   the __raw_* I/O versions, not just the big-endian ones. The
> >> >   space overhead you quoted is enough of a justification for that.
> >>
> 
> Doing this breaks the endianess assumption built into the io-accessors.

To be clearer, I did not suggest 

#define ioread32(p) __raw_readl(p)
#define ioread32be(p) __raw_readl(p)

but instead

#define ioread32(p) le32_to_cpu(__raw_readl(p))
#define ioread32be(p) be32_to_cpu(__raw_readl(p))

You can consider this as the least invasive variant as long as you never
intend to run Linux on the other endianess. Many powerpc and arm CPUs
have variable endianess, but Linux assumes that powerpc is always big-endian
and ARM is always little-endian for the majority of the boards, even
if it would be possible to configure them differently. Strictly speaking,
you are right that it is not technically correct to use a fixed-endian
accessor to talk to a native-endian device, but as long as the kernel is
also fixed-endian, it will not cause actual harm.

Besides, something I've seen a few times now is that native-endian
SoC devices end up getting put behind bus bridges with a fixed-endianess
eventually, and you have to change the code again. See usb-ohci for a
common example.

	Arnd

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2011-01-19 14:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-18 18:11 [PATCH] asm-generic/io.h: Fix io{read,write}{16,32}be for big endian systems Lars-Peter Clausen
2011-01-18 18:44 ` Arnd Bergmann
2011-01-18 19:01   ` Lars-Peter Clausen
2011-01-18 19:56     ` Arnd Bergmann
2011-01-18 20:54       ` Lars-Peter Clausen
2011-01-18 21:37         ` Arnd Bergmann
2011-01-18 22:22           ` Lars-Peter Clausen
2011-01-19  9:58             ` Arnd Bergmann
2011-01-19 12:28               ` Jonas Bonn
2011-01-19 14:47                 ` Arnd Bergmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.