linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Being more anal about iospace accesses..
       [not found] ` <Pine.LNX.4.58.0409150737260.2333@ppc970.osdl.org>
@ 2004-09-15 16:30   ` Linus Torvalds
  2004-09-15 16:54     ` Jörn Engel
                       ` (5 more replies)
  0 siblings, 6 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 16:30 UTC (permalink / raw)
  To: Kernel Mailing List


This is a background mail mainly for driver writers and/or architecture
people. Or others that are just interested in really low-level hw access
details. Others - please feel free to ignore.

[ This has been discussed to some degree already on the architecture 
  mailing lists and obviously among the people who actually worked on it, 
  but I thought I'd bounce it off linux-kernel too, in order to make
  people more aware of what the new type-checking does. Most people may
  have seen it as only generating a ton of new warnings for some crufty
  device drivers. ]

The background for this iospace type-checking change is that we've long
had some serious confusion about how to access PCI memory mapped IO
(MMIO), mainly because on a PC (and some non-PC's too) that IO really does 
look like regular memory, so you can have a driver that just accesses a 
pointer directly, and it will actually work on most machines.

At the same time, we've had the proper "accessor" functions (read[bwl](), 
write[bwl]() and friends) that on purpose dropped all type information 
from the MMIO pointer, mostly just because of historical reasons, and as a 
result some drivers didn't use a pointer at all, but some kind of integer. 
Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit 
machine.

In short, the PCI MMIO access case was largely the same as the user
pointer case, except the access functions were different (readb vs
get_user) and they were even less lax about checking for sanity. At least
the user access code required a pointer with the right size.

We've been very successful in annotating user pointers, and that found a
couple of bugs, and more importantly it made the kernel code much more
"aware" of what kind of pointer was passed around. In general, a big
success, I think. And an obvious example for what MMIO pointers should do.

So lately, the kernel infrastructure for MMIO accesses has become a _lot_
more strict about what it accepts. Not only do the MMIO access functions
want a real pointer (which is already more type-checking than we did
before, and causes gcc to spew out lots of warnings for some drivers), but 
as with user pointers, sparse annotations mark them as being in a 
different address space, and building the kernel with checking on will 
warn about mixing up address spaces. So far so good.

So right now the current snapshots (and 2.6.9-rc2) have this enabled, and
some drivers will be _very_ noisy when compiled. Most of the regular ones
are fine, so maybe people haven't even noticed it that much, but some of
them were using things like "u32" to store MMIO pointers, and are
generally extremely broken on anything but an x86.  We'll hopefully get
around to fixing them up eventually, but in the meantime this should at 
least explain the background for some of the new noise people may see.

Perhaps even more interesting is _another_ case of driver, though: one
that started warning not because it was ugly and broken, but because it
did something fairly rare but something that does happen occasionally: it
mixed PIO and MMIO accesses on purpose, because it drove hardware that
literally uses one or the other.

Sometimes such a "mixed interface" driver does it based on a compile
option that just #defines 'writel()' to 'inl()', sometimes it's a runtime
decision depending on the hardware or configuration. 

The anal typechecking obviously ended up being very unhappy about this, 
since it wants "void __iomem *" for MMIO pointers, and a normal "unsigned 
long" for PIO accesses. The compile-time option could have been easily 
fixed up by adding the proper cast when re-defining the IO accessor, but 
that doesn't work for the dynamic case.

Also, the compile-time switchers often really _wanted_ to be dynamic, but
it was just too painful with the regular Linux IO interfaces to duplicate 
the code and do things conditionally one way or the other.

To make a long story even longer: rather than scrapping the typechecking,
or requiring drivers to do strange and nasty casts all over the place,
there's now a new interface in town. It's called "iomap", because it
extends the old "ioremap()" interface to work on the PIO accesses too.

That way, the drivers that really want to mix both PIO and MMIO accesses
can very naturally do it: they just need to remap the PIO space too, the
same way that we've required people to remap the MMIO space for a long
long time.

For example, if you don't know (or, more importantly - don't care) what 
kind of IO interface you use, you can now do something like

	void __iomem * map = pci_iomap(dev, bar, maxbytes);
	...
	status = ioread32(map + DRIVER_STATUS_OFFSET);

and it will do the proper IO mapping for the named PCI BAR for that
device. Regardless of whether the BAR was an IO or MEM mapping. Very
convenient for cases where the hardware migt expose its IO window in
either (or sometimes both).

Nothing in the current tree actually uses this new interface, although
Jeff has patches for SATA for testing (and they clean up the code quite
noticeably, never mind getting rid of the warnings).  The interface has
been implemented by yours truly for x86 and ppc64, and David did a
first-pass version for sparc64 too (missing the "xxxx_rep()" functions
that were added a bit later, I believe).

So far experience seems to show that it's a very natural interface for
most non-x86 hardware - they all tend to map in both PIO and MMIO into one
address space _anyway_, so the two aren't really any different. It's
mainly just x86 and it's ilk that actually have two different interfaces
for the two kinds of PCI accesses, and at least in that case it's trivial
to encode the difference in the virtual ioremap pointer.

The best way to explain the interface is to just point you guys at the
<asm-generic/iomap.h> file, which isn't very big, has about as much
comments than code, and contains nothing but the necessary function
declarations. The actual meaning of the functions should be pretty
obvious even without the comments.

Feel free to flame or discuss rationally,

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
@ 2004-09-15 16:54     ` Jörn Engel
  2004-09-15 17:07       ` Linus Torvalds
                         ` (4 more replies)
  2004-09-15 16:56     ` Dave Jones
                       ` (4 subsequent siblings)
  5 siblings, 5 replies; 34+ messages in thread
From: Jörn Engel @ 2004-09-15 16:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
> 
> For example, if you don't know (or, more importantly - don't care) what 
> kind of IO interface you use, you can now do something like
> 
> 	void __iomem * map = pci_iomap(dev, bar, maxbytes);
> 	...
> 	status = ioread32(map + DRIVER_STATUS_OFFSET);

C now supports pointer arithmetic with void*?  I hope the width of a
void is not architecture dependent, that would introduce more subtle
bugs.

Jörn

-- 
Sometimes, asking the right question is already the answer.
-- Unknown

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
  2004-09-15 16:54     ` Jörn Engel
@ 2004-09-15 16:56     ` Dave Jones
  2004-09-15 17:19     ` Roger Luethi
                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Dave Jones @ 2004-09-15 16:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, Sep 15, 2004 at 09:30:42AM -0700, Linus Torvalds wrote:

 > So right now the current snapshots (and 2.6.9-rc2) have this enabled, and
 > some drivers will be _very_ noisy when compiled. Most of the regular ones
 > are fine, so maybe people haven't even noticed it that much, but some of
 > them were using things like "u32" to store MMIO pointers, and are
 > generally extremely broken on anything but an x86.  We'll hopefully get
 > around to fixing them up eventually, but in the meantime this should at 
 > least explain the background for some of the new noise people may see.

For the curious, 6MB of sparse output is generated from a make allmodconfig
right now. (http://www.codemonkey.org.uk/junk/2.6.9-rc2-warnings.txt)

You can filter out just the __iomem warnings by grepping for asn:2

		Dave


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:54     ` Jörn Engel
@ 2004-09-15 17:07       ` Linus Torvalds
  2004-09-15 17:32         ` Jörn Engel
                           ` (2 more replies)
  2004-09-15 17:07       ` Jeff Garzik
                         ` (3 subsequent siblings)
  4 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 17:07 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Kernel Mailing List



On Wed, 15 Sep 2004, Jörn Engel wrote:
> 
> C now supports pointer arithmetic with void*?

C doesn't. gcc does. It's a documented extension, and it acts like if it 
was done on a byte.

See gcc's user guide "Extension to the C Language Family".

It's a singularly good feature. 

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:54     ` Jörn Engel
  2004-09-15 17:07       ` Linus Torvalds
@ 2004-09-15 17:07       ` Jeff Garzik
  2004-09-15 17:16       ` Roland Dreier
                         ` (2 subsequent siblings)
  4 siblings, 0 replies; 34+ messages in thread
From: Jeff Garzik @ 2004-09-15 17:07 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List

On Wed, Sep 15, 2004 at 06:54:50PM +0200, Jörn Engel wrote:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
> > 
> > For example, if you don't know (or, more importantly - don't care) what 
> > kind of IO interface you use, you can now do something like
> > 
> > 	void __iomem * map = pci_iomap(dev, bar, maxbytes);
> > 	...
> > 	status = ioread32(map + DRIVER_STATUS_OFFSET);
> 
> C now supports pointer arithmetic with void*?  I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.

gcc extension, which has been used in the kernel for ages.

	Jeff




^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:54     ` Jörn Engel
  2004-09-15 17:07       ` Linus Torvalds
  2004-09-15 17:07       ` Jeff Garzik
@ 2004-09-15 17:16       ` Roland Dreier
  2004-09-15 17:39         ` Linus Torvalds
  2004-09-15 17:36       ` Horst von Brand
  2004-09-15 17:40       ` Brian Gerst
  4 siblings, 1 reply; 34+ messages in thread
From: Roland Dreier @ 2004-09-15 17:16 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List

    Jörn> C now supports pointer arithmetic with void*?  I hope the
    Jörn> width of a void is not architecture dependent, that would
    Jörn> introduce more subtle bugs.

Pointer arithmetic on void * has been a gcc extension since forever
(gcc acts as though sizeof (void) == 1).

However, I somewhat agree -- it's ugly for drivers rely on this and do
arithmetic on void *.  It should be OK for a driver to use 
char __iomem * for its IO base if it needs to add in offsets, right?

 - R.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
  2004-09-15 16:54     ` Jörn Engel
  2004-09-15 16:56     ` Dave Jones
@ 2004-09-15 17:19     ` Roger Luethi
  2004-09-15 17:23     ` Richard B. Johnson
                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 34+ messages in thread
From: Roger Luethi @ 2004-09-15 17:19 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 15 Sep 2004 09:30:42 -0700, Linus Torvalds wrote:
> This is a background mail mainly for driver writers and/or architecture

Thanks, I appreciate the effort (speaking as the maintainer of an
"even more interesting" driver).

Roger

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
                       ` (2 preceding siblings ...)
  2004-09-15 17:19     ` Roger Luethi
@ 2004-09-15 17:23     ` Richard B. Johnson
  2004-09-15 22:21     ` Deepak Saxena
  2004-09-15 22:29     ` Roland Dreier
  5 siblings, 0 replies; 34+ messages in thread
From: Richard B. Johnson @ 2004-09-15 17:23 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 15 Sep 2004, Linus Torvalds wrote:

>
> This is a background mail mainly for driver writers and/or architecture
> people. Or others that are just interested in really low-level hw access
> details. Others - please feel free to ignore.
>
[SNIPPED mostly....]


> For example, if you don't know (or, more importantly - don't care) what
> kind of IO interface you use, you can now do something like
>
> 	void __iomem * map = pci_iomap(dev, bar, maxbytes);
> 	...
> 	status = ioread32(map + DRIVER_STATUS_OFFSET);
                          ^^^^^^^^^^^^^^^^

Doesn't this rely on the non-standard GNUism that you can
perform pointer-arithmetic on a void-pointer? Which is illegal,
immoral, and fattening. I'd much rather see char-pointers so
it's valid to perform the offset math. That way, in the future,
a new tool that follows (ANSI, IEEE, POSIX) rules doesn't barf.
I suggest a new pointer type like (BASE *) or (BAR *) that
hides the (unsigned char *) necessary to not barf, plus
minimize side-effects.


Cheers,
Dick Johnson
Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips).
            Note 96.31% of all statistics are fiction.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:07       ` Linus Torvalds
@ 2004-09-15 17:32         ` Jörn Engel
  2004-09-15 17:57           ` Linus Torvalds
  2004-09-15 17:45         ` Nikita Danilov
  2004-09-15 18:40         ` Chris Wedgwood
  2 siblings, 1 reply; 34+ messages in thread
From: Jörn Engel @ 2004-09-15 17:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Wed, 15 September 2004 10:07:29 -0700, Linus Torvalds wrote:
> On Wed, 15 Sep 2004, Jörn Engel wrote:
> > 
> > C now supports pointer arithmetic with void*?
> 
> C doesn't. gcc does. It's a documented extension, and it acts like if it 
> was done on a byte.
> 
> See gcc's user guide "Extension to the C Language Family".
> 
> It's a singularly good feature. 

Nice.

But it still leaves me confused.  Before I had this code:

	struct regs {
		uint32_t status;
		...
	}

	...

	struct regs *regs = ioremap(...);
	uint32_t status = regs->status;
	...

So now I should do it like this:

	#define REG_STATUS 0

	...

	void __iomem *regs = ioremap(...);
	uint32_t status = readl(regs + REG_STATUS);
	...

But wait, that only works when long is 32bit wide.  Plus I could be
stupid enough and "#define REG_STATUS 64" while the register space is
just 64 bytes long.  It solves the confusion about address spaces,
agreed, but overall I'm more confused now.  Hope it's just temporary.

Jörn

-- 
There is no worse hell than that provided by the regrets
for wasted opportunities.
-- Andre-Louis Moreau in Scarabouche

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:54     ` Jörn Engel
                         ` (2 preceding siblings ...)
  2004-09-15 17:16       ` Roland Dreier
@ 2004-09-15 17:36       ` Horst von Brand
  2004-09-15 17:40       ` Brian Gerst
  4 siblings, 0 replies; 34+ messages in thread
From: Horst von Brand @ 2004-09-15 17:36 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List

=?iso-8859-1?Q?J=F6rn?= Engel <joern@wohnheim.fh-wedel.de> said:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:

[...]

> > For example, if you don't know (or, more importantly - don't care) what 
> > kind of IO interface you use, you can now do something like
> > 
> > 	void __iomem * map = pci_iomap(dev, bar, maxbytes);
> > 	...
> > 	status = ioread32(map + DRIVER_STATUS_OFFSET);

> C now supports pointer arithmetic with void*?

It doesn't. It's a gcc-ism.

>                                               I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.

gcc takes it as a char pointer for such uses.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:16       ` Roland Dreier
@ 2004-09-15 17:39         ` Linus Torvalds
  2004-09-15 20:07           ` Russell King
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 17:39 UTC (permalink / raw)
  To: Roland Dreier; +Cc: Jörn Engel, Kernel Mailing List



On Wed, 15 Sep 2004, Roland Dreier wrote:
> 
> However, I somewhat agree -- it's ugly for drivers rely on this and do
> arithmetic on void *.  It should be OK for a driver to use 
> char __iomem * for its IO base if it needs to add in offsets, right?

"char __iomem *" will certainly work - all the normal pointer conversions 
are ok. Some people in fact use pointers to structures in MMIO space, and 
this is quite reasonable when working with a chip that uses "mailboxes" 
for commands.

However, I disagree with "void *" arithmetic being ugly. It's a very nice
feature to have a pointer that can be validly cast to any other type, and
that is the whole _point_ of "void *". The fact that C++ got that wrong is
arguably the worst failing of the language, causing tons of unnecessary
casts that can silently hide real bugs (maybe the thing you cast wasn't a
"void *" in the first place, but you'll never know - the compiler will do
the cast for you).

For example, to go back to the mailbox example, let's say that your 
hardware has an IO area that is 8kB in size, with the last 4kB being 
mailboxes.

The _sane_ way to do that is to do

	void __iomem *base_io = ioremap(...);
	struct mailbox __iomem *mbox = base_io + MAILBOX_OFFSET;

and then just work on that.

In contrast, havign to cast to a "char *" in order to do arithmetic, and 
then casting back to the resultant structure type pointer is not only 
ugly and unreadable, it's a lot more prone to errors as a result.

In other words, think of "void *" as a pointer to storage. Not "char"  
(which is the C name for a signed byte), but really, it's the pointer to 
whatever underlying memory there is. And a _fundamental_ part of such 
memory is the fact that it is addressable. Thus "pointer to storage 
arithmetic" really does make sense on a very fundamental level. It has 
nothing to do with C types, and that also explains why "void *" silently 
converts to anything else. It's a very internally consistent world-view.

Now, I disagree with gcc when it comes to actually taking the "size" of 
void. Gcc will silently accept

	void *x;
	x = malloc(sizeof(*x));

which I consider to be an abomination (and the above _can_ happen, quite
easily, as part of macros for doing allocation etc - nobody would write 
it in that form, but if you have an "MEMALLOC(x)" macro that does the 
sizeof, you could end up trying to feed the compiler bogus code).

The fact that you can do arithmetic on typeless storage does _not_ imply
that typeless storage would have a "size" in my book.

So sparse will say:

	warning: cannot size expression

and refuse to look at broken code like the above. But hey, the fact that I 
have better taste than anybody else in the universe is just something I 
have to live with. It's not easy being me.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:54     ` Jörn Engel
                         ` (3 preceding siblings ...)
  2004-09-15 17:36       ` Horst von Brand
@ 2004-09-15 17:40       ` Brian Gerst
  4 siblings, 0 replies; 34+ messages in thread
From: Brian Gerst @ 2004-09-15 17:40 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List

Jörn Engel wrote:
> On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote:
> 
>>For example, if you don't know (or, more importantly - don't care) what 
>>kind of IO interface you use, you can now do something like
>>
>>	void __iomem * map = pci_iomap(dev, bar, maxbytes);
>>	...
>>	status = ioread32(map + DRIVER_STATUS_OFFSET);
> 
> 
> C now supports pointer arithmetic with void*?  I hope the width of a
> void is not architecture dependent, that would introduce more subtle
> bugs.
> 
> Jörn
> 

http://gcc.gnu.org/onlinedocs/gcc-3.4.2/gcc/Pointer-Arith.html#Pointer-Arith

5.18 Arithmetic on void- and Function-Pointers

In GNU C, addition and subtraction operations are supported on pointers 
to void and on pointers to functions. This is done by treating the size 
of a void or of a function as 1.

A consequence of this is that sizeof is also allowed on void and on 
function types, and returns 1.

The option -Wpointer-arith requests a warning if these extensions are used.

--
				Brian Gerst

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:07       ` Linus Torvalds
  2004-09-15 17:32         ` Jörn Engel
@ 2004-09-15 17:45         ` Nikita Danilov
  2004-09-15 18:09           ` Linus Torvalds
  2004-09-15 18:40         ` Chris Wedgwood
  2 siblings, 1 reply; 34+ messages in thread
From: Nikita Danilov @ 2004-09-15 17:45 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List

Linus Torvalds writes:
 > 
 > 
 > On Wed, 15 Sep 2004, JЖrn Engel wrote:
 > > 
 > > C now supports pointer arithmetic with void*?
 > 
 > C doesn't. gcc does. It's a documented extension, and it acts like if it 
 > was done on a byte.
 > 
 > See gcc's user guide "Extension to the C Language Family".
 > 
 > It's a singularly good feature. 

Unfortunately it breaks even better identity

  foo *p;

  p + nr == (foo *)((char *)p + nr * sizeof *p)

unless one thinks that is --together with gcc-- that nothing occupies
exactly one byte.

 > 
 > 		Linus

Nikita.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:32         ` Jörn Engel
@ 2004-09-15 17:57           ` Linus Torvalds
  2004-09-15 18:06             ` Linus Torvalds
                               ` (2 more replies)
  0 siblings, 3 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 17:57 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Kernel Mailing List



On Wed, 15 Sep 2004, Jörn Engel wrote:
> 
> But it still leaves me confused.  Before I had this code:
> 
> 	struct regs {
> 		uint32_t status;
> 		...
> 	}
> 
> 	...
> 
> 	struct regs *regs = ioremap(...);
> 	uint32_t status = regs->status;
> 	...
> 
> So now I should do it like this:
> 
> 	#define REG_STATUS 0
> 
> 	...
> 
> 	void __iomem *regs = ioremap(...);
> 	uint32_t status = readl(regs + REG_STATUS);

No, you can certainly continue to use non-void pointers. The "void __iomem
*" case is just the typeless one, exactly equivalent to regular void
pointer usage.

So let me clarify my original post with two points:

 - if your device only supports MMIO, you might as well just use the old 
   interfaces. The new interface will _also_ work, but there is no real 
   advantage, unless you count the "pci_iomap()" as a simpler interface.

   The new interface is really only meaningful for things that want to 
   support _both_ PIO and MMIO. It's also, perhaps, a bit syntactically 
   easier to work with, so some people might prefer that for that 
   reason. See my comments further down on the auto-sizing. BUT it doesn't 
   make the old interfaces go away by any means, and I'm not even
   suggesting that people should re-write drivers just for the hell of it.

   In short: if you don't go "ooh, that will simplify XXX", then you 
   should just ignore the new interfaces.

 - you can _absolutely_ use other pointers than "void *". You should 
   annotate them with "__iomem" if you want to be sparse-clean (and it 
   often helps visually to clarify the issue), but gcc won't care, the 
   "__iomem" annotation is purely a extended check.

So you can absolutely still continue with

	struct mydev_iolayout {
		__u32 status;
		__u32 irqmask;
		...

	struct mydev_iolayout __iomem *regs = pci_map(...);
	status = ioread32(&regs.status);

which is often a lot more readable, and thus in fact _preferred_. It also
adds another level of type-checking, so I applaud drivers that do this.

Now, I'm _contemplating_ also allowing the "get_user()" kind of "unsized" 
access function for the new interface. Right now all the old (and the new) 
access functions are all explicitly sized. But for the "struct layout" 
case, it's actually often nice to just say

	status = ioread(&regs.status);

and the compiler can certainly figure out the size of the register on its
own. This was impossible with the old interface, because the old 
interfaces didn't even take a _pointer_, much less one that could be sized 
up automatically.

(The auto-sizing is something that "get_user()" and "put_user()" have
always done, and it makes them very easy to use. It involved a few pretty
ugly macros, but hey, that's all hidden away, and is actually pretty
simple in the end).

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:57           ` Linus Torvalds
@ 2004-09-15 18:06             ` Linus Torvalds
  2004-09-15 19:34             ` Greg KH
  2004-09-16 14:58             ` Horst von Brand
  2 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 18:06 UTC (permalink / raw)
  To: Jörn Engel; +Cc: Kernel Mailing List



On Wed, 15 Sep 2004, Linus Torvalds wrote:
> 
>    In short: if you don't go "ooh, that will simplify XXX", then you 
>    should just ignore the new interfaces.

Btw, to get an example of what _is_ simplified, look at 
drivers/scsi/libata-core.c:

	void ata_tf_load(struct ata_port *ap, struct ata_taskfile *tf)
	{
	        if (ap->flags & ATA_FLAG_MMIO)
	                ata_tf_load_mmio(ap, tf);
	        else
	                ata_tf_load_pio(ap, tf);
	}

and realize that "ata_tf_load_mmio()" and "ata_tf_load_pio()" are exactly 
the SAME FUNCTION. Except one uses MMIO, the other uses PIO. With the new 
setup, it literally collapses into one function, and code size goes down 
pretty dramatically. Not to mention making the code more readable.

For another example of this (of the static kind), look at something like 
drivers/net/8139too.c:

	#ifdef USE_IO_OPS

	#define RTL_R8(reg)             inb (((unsigned long)ioaddr) + (reg))
	#define RTL_R16(reg)            inw (((unsigned long)ioaddr) + (reg))
	#define RTL_R32(reg)            ((unsigned long) inl (((unsigned long)ioaddr) + (reg)))
	...

	#else

	...
	/* read MMIO register */
	#define RTL_R8(reg)             readb (ioaddr + (reg))
	#define RTL_R16(reg)            readw (ioaddr + (reg))
	#define RTL_R32(reg)            ((unsigned long) readl (ioaddr + (reg)))

see? In this case, USE_IO_OPS depends on a static configuration variable, 
namely CONFIG_8139TOO_PIO. So the user at _compile_ time has to decide 
whether he wants to use MMIO or PIO. See the Kconfig help file:

          This instructs the driver to use programmed I/O ports (PIO) instead 
          of PCI shared memory (MMIO).  This can possibly solve some problems
          in case your mainboard has memory consistency issues.  If unsure,
          say N.

In other words, the new interface is not designed to replace the old ones,
it's designed to help drivers like these, that either go to a lot of extra
pain in order to support both methods, or then have a _static_ config
option that makes it really hard for system vendors to just release one
driver that knows when it needs to use PIO and when it needs MMIO.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:45         ` Nikita Danilov
@ 2004-09-15 18:09           ` Linus Torvalds
  0 siblings, 0 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 18:09 UTC (permalink / raw)
  To: Nikita Danilov; +Cc: Jörn Engel, Kernel Mailing List



On Wed, 15 Sep 2004, Nikita Danilov wrote:
> 
> Unfortunately it breaks even better identity
> 
>   foo *p;
> 
>   p + nr == (foo *)((char *)p + nr * sizeof *p)

No, gcc allows the above, by making sizeof(void) be 1.

And sane compilers would just inform the user at compile-time with a nice
readable message that he's doing something stupid.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:07       ` Linus Torvalds
  2004-09-15 17:32         ` Jörn Engel
  2004-09-15 17:45         ` Nikita Danilov
@ 2004-09-15 18:40         ` Chris Wedgwood
  2 siblings, 0 replies; 34+ messages in thread
From: Chris Wedgwood @ 2004-09-15 18:40 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List

On Wed, Sep 15, 2004 at 10:07:29AM -0700, Linus Torvalds wrote:

> C doesn't. gcc does. It's a documented extension, and it acts like
> if it was done on a byte.

[...]

> It's a singularly good feature.

I dunno about that.  Maybe it is, but it has some gotchas.

Recently when doing a sparsification of code I noticed there are
places which essentially do things like:

    void *foo;

    [...]

    foo += bar * n;

Part of the fix (cleanup) was to change the 'void *foo' to
'gratuitous_typedef_t __user *foo' --- which silently breaks the math
if you don't explicitly check for this.


  --cw

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:57           ` Linus Torvalds
  2004-09-15 18:06             ` Linus Torvalds
@ 2004-09-15 19:34             ` Greg KH
  2004-09-15 19:53               ` Linus Torvalds
  2004-09-16 14:58             ` Horst von Brand
  2 siblings, 1 reply; 34+ messages in thread
From: Greg KH @ 2004-09-15 19:34 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J?rn Engel, Kernel Mailing List

On Wed, Sep 15, 2004 at 10:57:25AM -0700, Linus Torvalds wrote:
> 
> So you can absolutely still continue with
> 
> 	struct mydev_iolayout {
> 		__u32 status;
> 		__u32 irqmask;
> 		...
> 
> 	struct mydev_iolayout __iomem *regs = pci_map(...);
> 	status = ioread32(&regs.status);
> 
> which is often a lot more readable, and thus in fact _preferred_. It also
> adds another level of type-checking, so I applaud drivers that do this.

Currently a few drivers do:
	status = readl(&regs.status);
which causes sparse warnings.

How should that code be changed to prevent this?  Convert them all to
ioread32()?  Or figure out a way to supress the warning for readl()?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 19:34             ` Greg KH
@ 2004-09-15 19:53               ` Linus Torvalds
  2004-09-15 20:06                 ` Greg KH
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 19:53 UTC (permalink / raw)
  To: Greg KH; +Cc: J?rn Engel, Kernel Mailing List



On Wed, 15 Sep 2004, Greg KH wrote:
> 
> Currently a few drivers do:
> 	status = readl(&regs.status);

I assume that's "&regs->status", since regs had better be a pointer..

> which causes sparse warnings.
> 
> How should that code be changed to prevent this?  Convert them all to
> ioread32()?  Or figure out a way to supress the warning for readl()?

Just make sure that you annotate "regs" as a pointer to IO space.

Ie if you have

	struct xxxx __iomem *regs;

then everything will be fine - sparse knows that "regs" is a IO pointer,
and that obviously makes "&regs->member" _also_ an IO pointer.

You'll need that __iomem annotation anyway, since otherwise sparse would
complain when you do the assignment from the "ioremap()" call (and the
thing had better come from ioremap() some way, or it's not valid in the
first place to do "readl()" on it).

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 19:53               ` Linus Torvalds
@ 2004-09-15 20:06                 ` Greg KH
  0 siblings, 0 replies; 34+ messages in thread
From: Greg KH @ 2004-09-15 20:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: J?rn Engel, Kernel Mailing List

On Wed, Sep 15, 2004 at 12:53:51PM -0700, Linus Torvalds wrote:
> 
> 
> On Wed, 15 Sep 2004, Greg KH wrote:
> > 
> > Currently a few drivers do:
> > 	status = readl(&regs.status);
> 
> I assume that's "&regs->status", since regs had better be a pointer..

Yes, sorry.

> > which causes sparse warnings.
> > 
> > How should that code be changed to prevent this?  Convert them all to
> > ioread32()?  Or figure out a way to supress the warning for readl()?
> 
> Just make sure that you annotate "regs" as a pointer to IO space.

Ah, ok, that works.  Thanks for the clarification, I now realize that
__iomem can be a marker for any type of pointer, not just a void.
That's where I was confused.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:39         ` Linus Torvalds
@ 2004-09-15 20:07           ` Russell King
  0 siblings, 0 replies; 34+ messages in thread
From: Russell King @ 2004-09-15 20:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roland Dreier, Jörn Engel, Kernel Mailing List

On Wed, Sep 15, 2004 at 10:39:02AM -0700, Linus Torvalds wrote:
> In other words, think of "void *" as a pointer to storage. Not "char"  
> (which is the C name for a signed byte),

Common Programming Error #99: "char" is implementation whether it is
signed or may be unsigned.  Only a "char" type qualified by "signed"
or "unsigned" can be relied upon to have the requested property.

-- 
Russell King
 Linux kernel    2.6 ARM Linux   - http://www.arm.linux.org.uk/
 maintainer of:  2.6 PCMCIA      - http://pcmcia.arm.linux.org.uk/
                 2.6 Serial core

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
                       ` (3 preceding siblings ...)
  2004-09-15 17:23     ` Richard B. Johnson
@ 2004-09-15 22:21     ` Deepak Saxena
  2004-09-15 22:46       ` Linus Torvalds
  2004-09-15 22:29     ` Roland Dreier
  5 siblings, 1 reply; 34+ messages in thread
From: Deepak Saxena @ 2004-09-15 22:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

On Sep 15 2004, at 09:30, Linus Torvalds was caught saying:
> At the same time, we've had the proper "accessor" functions (read[bwl](), 
> write[bwl]() and friends) that on purpose dropped all type information 
> from the MMIO pointer, mostly just because of historical reasons, and as a 
> result some drivers didn't use a pointer at all, but some kind of integer. 
> Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit 
> machine.

Linus,

Since we are on the subject of io-access, I would like a
clarification/opinion on the read*/write* & in*/out* accessors 
(and now the ioread/write equivalents). Are these functions only meant 
to be used for PCI memory-mapped devices or _any_ memory mapped devices? 
Same with ioremap(). I ask because there are bits of code in the
kernel that use these on non-PCI devices and this sometimes causes 
some complication in platform-level code. For example, b/c of
the way PCI access work on the IXP4xx (indirect access via register
read/writes), we have to do the following to differentiate b/w
PCI and non-PCI devices (include/asm-arm/arch-ixp4xx/io.h):

static inline void *
__ixp4xx_ioremap(unsigned long addr, size_t size, unsigned long flags, unsigned long align)
{
	extern void * __ioremap(unsigned long, size_t, unsigned long, unsigned long);
	if((addr < 0x48000000) || (addr > 0x4fffffff))
		return __ioremap(addr, size, flags, align);

	return (void *)addr;
}

static inline void 
__ixp4xx_writeb(u8 value, u32 addr)
{
	u32 n, byte_enables, data;

	if (addr >= VMALLOC_START) {
		__raw_writeb(value, addr);
		return;
	}

	n = addr % 4;
	byte_enables = (0xf & ~BIT(n)) << IXP4XX_PCI_NP_CBE_BESL;
	data = value << (8*n);
	ixp4xx_pci_write(addr, byte_enables | NP_CMD_MEMWRITE, data);
}

#define	writeb(p, v)			__ixp4xx_writeb(p, v)

(0x48000000 -> 0x4fffffff is the PCI memory window on this CPU).

While this is not a huge level of uglyness, I have systems where 
this is going to get much uglier b/c we have overlapping addresses
on different buses and we need to be able to differentiate accesses
It raises the question of whether we need a different interface 
for non-PCI devices, if we should be passing a struct device into all 
the I/O accessors functions to make it easier for platform code to 
determine what to do, or if we should make I/O access functions a 
property of devices. So instead of doing read*/write/in*/out*, we 
would do either:

a) pass device into io-access routines:

	cookie = iomap(dev, foo, len);
	bar = read32(dev, cookie + offset);

or

b) make access routines a function of the devices themselves

	cookie = dev->iomap(foo, len);
	bar = dev->read32(cookie + REG_OFFSET);

The former is nicer b/c it allows the dev to be ignored on systems where
we do not care about PCI vs non-PCI devices.

Comments?

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
                       ` (4 preceding siblings ...)
  2004-09-15 22:21     ` Deepak Saxena
@ 2004-09-15 22:29     ` Roland Dreier
  2004-09-15 23:26       ` Being more careful " Linus Torvalds
  5 siblings, 1 reply; 34+ messages in thread
From: Roland Dreier @ 2004-09-15 22:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Kernel Mailing List

Linus, while we're on the subject of new sparse checks, could you give
a quick recap of the semantics of the new __leXX types (and what
__bitwise means to sparse)?  I don't think I've ever seen this stuff
described on LKML.

Thanks,
  Roland

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 22:21     ` Deepak Saxena
@ 2004-09-15 22:46       ` Linus Torvalds
  2004-09-15 23:09         ` Deepak Saxena
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 22:46 UTC (permalink / raw)
  To: Deepak Saxena; +Cc: Kernel Mailing List



On Wed, 15 Sep 2004, Deepak Saxena wrote:
> 
> Since we are on the subject of io-access, I would like a
> clarification/opinion on the read*/write* & in*/out* accessors 
> (and now the ioread/write equivalents). Are these functions only meant 
> to be used for PCI memory-mapped devices or _any_ memory mapped devices? 

It really depends on the bus architecture.

At some point, if the bus is different enough from a "normal" setup, you 
should just use your own accessor functions. Trying to overload 
"readl/writel" is just too painful.

However, at that point you should also realize that you can't re-use _any_ 
of the existing chip drivers, and you'll have to write your own. If the 
bus is exotic enough, that's not a problem, and you'd have to do that 
anyway. But there really aren't all that many "exotic" buses around any 
more.

Quite frankly, of your two suggested interfaces, I would select neither. 
I'd just say that if your bus is special enough, just write your own 
drivers, and use

	cookie = ixp4xx_iomap(dev, xx);
	...
	ixp4xx_iowrite(val, cookie + offset);

which is perfectly valid. You don't have to make these devices even _look_ 
like a PCI device. Why should you?

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 22:46       ` Linus Torvalds
@ 2004-09-15 23:09         ` Deepak Saxena
  2004-09-16 12:51           ` Geert Uytterhoeven
  0 siblings, 1 reply; 34+ messages in thread
From: Deepak Saxena @ 2004-09-15 23:09 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

On Sep 15 2004, at 15:46, Linus Torvalds was caught saying:
> Quite frankly, of your two suggested interfaces, I would select neither. 
> I'd just say that if your bus is special enough, just write your own 
> drivers, and use
> 
> 	cookie = ixp4xx_iomap(dev, xx);
> 	...
> 	ixp4xx_iowrite(val, cookie + offset);
> 
> which is perfectly valid. You don't have to make these devices even _look_ 
> like a PCI device. Why should you?

Problem is that some of those devices are not that special. For example,
the on-board 16550 is accessed using readb/writeb in the 8250.c driver.
I don't think we want to add that level of low-level detail to that
driver and instead should just hide it in the platform code. I look
at it from the point of view that the driver should not care about how
the access actually occurs on the bus. It just says, write data foo at
location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO,
etc and that writing of the data is hidden in the implementation of
the accessor API.

~Deepak


-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-15 22:29     ` Roland Dreier
@ 2004-09-15 23:26       ` Linus Torvalds
  2004-09-16  0:10         ` viro
  2004-09-16 12:25         ` David Woodhouse
  0 siblings, 2 replies; 34+ messages in thread
From: Linus Torvalds @ 2004-09-15 23:26 UTC (permalink / raw)
  To: Roland Dreier, Al Viro; +Cc: Kernel Mailing List


[ Subject line changed to avoid getting caught as spam ;]

On Wed, 15 Sep 2004, Roland Dreier wrote:
>
> Linus, while we're on the subject of new sparse checks, could you give
> a quick recap of the semantics of the new __leXX types (and what
> __bitwise means to sparse)?  I don't think I've ever seen this stuff
> described on LKML.

[ The bitwise checks are actually by Al Viro, but I'll explain the basic
  idea. Al is Cc'd so that he can add any corrections or extensions. ]

Sparse allows a number of extra type qualifiers, including address spaces 
and various random extra restrictions on what you can do with them. There 
are "context" bits that allow you to use a symbol or type only in certain 
contexts, for example, and there are type qualifiers like "noderef" that 
just say that a pointer cannot be dereferenced (it looks _exactly_ like a 
pointer in all other respects, but trying to actually access anything 
through it will cause a sparse warning).

The "bitwise" attribute is very much like the "noderef" one, in that it
restricts how you can use an expression of that type. Unlike "noderef",
it's designed for integer types, though. In fact, sparse will refuse to
apply the bitwise attribute to non-integer types.

As the name suggests, a "bitwise" expression is one that is restricted to
only a certain "bitwise" operations that make sense within that class. In
particular, you can't mix a "bitwise" class with a normal integer
expression (the constant zero happens to be special, since it's "safe"  
for all bitwise ops), and in fact you can't even mix it with _another_
bitwise expression of a different type.

And when I say "different", I mean even _slightly_ different. Each typedef 
creates a type of it's own, and will thus create a bitwise type that is 
not compatible with anything else. So if you declare

	int __bitwise i;
	int __bitwise j;

the two variables "i" and "j" are _not_ compatible, simply because they
were declared separately, while in the case of

	int __bitwise i, j;

they _are_ compatible. The above is a horribly contrieved example, as it
shows an extreme case that doesn't make much sense, but it shows how
"bitwise" always creates its own new "class".

Normally you'd always use "__bitwise"  in a typedef, which effectively
makes that particular typedef one single "bitwise class". After that, you 
can obviously declare any number of variables in that class.

Now apart from the classes having to match, "bitwise" as it's name
suggests, also restricts all operations within that class to a subset of
"bit-safe" operations. For example, addition isn't "bit-safe", since
clearly the carry-chain moves bits around. But you can do normal bit-wise
operations, and you can compare the values against other values in the
same class, since those are all "bit-safe".

Oh, as an example of something that isn't obviously bit-safe: look out for
things like bit negation: doing a ~ is ok on an bitwise "int" type, but it
is _not_ ok on a bitwise "short" or "char". Why?  Because on a bitwise
"int" you actually stay within the type. But doing the same thing on a
short or char will move "outside" the type by virtue of setting the high
bits (normal C semantics: a short gets promoted to an "int", so doign a
bitwise negation on a short will actually set the high bits).

So as far as sparse is concerned, a "bitwise" type is not really so much 
about endianness as it is about making sure bits are never lost or moved 
around.

For example, you can use the bitwise operation to verify the __GFP_XXX 
mask bits. Right now they are just regular integers, which means that you 
can write

	kmalloc(GFP_KERNEL, size);

and the compiler will not notice anything wrong. But something is
_seriously_ wrong: the GFP_KERNEL should be the _second_ argument. If we
mark it to be a "bitwise" type (which it is), that bug would have been
noticed immediately, and you could still do all the operations that are
valid of GFP_xxx values.

See the usage?

In the byte-order case, what we have is:

	typedef __u16 __bitwise __le16;
	typedef __u16 __bitwise __be16;
	typedef __u32 __bitwise __le32;
	typedef __u32 __bitwise __be32;
	typedef __u64 __bitwise __le64;
	typedef __u64 __bitwise __be64;

and if you think about the above rules about what is acceptable for 
bitwise types, you'll likely immediately notivce that it automatically 
means

 - you can never assign a __le16 type to any other integer type or any 
   other bitwise type. You'd get a warnign about incompatible types. Makes 
   sense, no?
 - you can only do operations that are safe within that byte order. For 
   example, it is safe to do a bitwise "&" on two __le16 values. Clearly 
   the result is meaningful.
 - if you want to go outside that bitwise type, you have to convert it 
   properly first. For example, if you want to add a constant to a __le16 
   type, you can do so, but you have to use the proper sequence:

	__le16 sum, a, b;

	sum = a + b;	/* INVALID! "warning: incompatible types for operation (+)" */
	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */

See? 

In short, "bitwise" is about more than just byte-order, but the semantics 
of bitwise-restricted ops happen to be the semantics that are valid for 
byte-order operations too.

Oh, btw, right now you only get the warnings from sparse if you use
"-Wbitwise" on the command line. Without that, sparse will ignore the
bitwise attribute.

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-15 23:26       ` Being more careful " Linus Torvalds
@ 2004-09-16  0:10         ` viro
  2004-09-16 11:40           ` David Woodhouse
  2004-09-16 12:25         ` David Woodhouse
  1 sibling, 1 reply; 34+ messages in thread
From: viro @ 2004-09-16  0:10 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roland Dreier, Kernel Mailing List

On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote:
 
>    other bitwise type. You'd get a warnign about incompatible types. Makes 
>    sense, no?
>  - you can only do operations that are safe within that byte order. For 
>    example, it is safe to do a bitwise "&" on two __le16 values. Clearly 
>    the result is meaningful.

BTW, so far the most frequent class of endianness bugs had been along the
lines of
	foo->le16_field = cpu_to_le32(12);
and vice versa.  On big-endian it's a guaranteed FUBAR - think carefully about
the value that will end up there.

> Oh, btw, right now you only get the warnings from sparse if you use
> "-Wbitwise" on the command line. Without that, sparse will ignore the
> bitwise attribute.

We probably want __attribute__((opaque)) in addition to bitwise - e.g. for
the handles of all sorts passed in network filesystem protocols.  I'll look
into that when we get the endianness warnings somewhat under control.

For now I'm going to #define __opaque __bitwise and use it for stuff like
	typedef __u32 __opaque cifs_fid;
etc.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-16  0:10         ` viro
@ 2004-09-16 11:40           ` David Woodhouse
  0 siblings, 0 replies; 34+ messages in thread
From: David Woodhouse @ 2004-09-16 11:40 UTC (permalink / raw)
  To: viro; +Cc: Linus Torvalds, Roland Dreier, Kernel Mailing List

On Thu, 2004-09-16 at 01:10 +0100,
viro@parcelfarce.linux.theplanet.co.uk wrote:
> On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote:
>  
> >    other bitwise type. You'd get a warnign about incompatible types. Makes 
> >    sense, no?
> >  - you can only do operations that are safe within that byte order. For 
> >    example, it is safe to do a bitwise "&" on two __le16 values. Clearly 
> >    the result is meaningful.
> 
> BTW, so far the most frequent class of endianness bugs had been along the
> lines of
> 	foo->le16_field = cpu_to_le32(12);
> and vice versa.  On big-endian it's a guaranteed FUBAR - think carefully about
> the value that will end up there.

Is that really more frequent than just 'foo->le16_field = 12' ? 
I'm surprised. 

Certainly, it was the frequency of pure assignment without _any_ attempt
at byte-swapping which caused me to introduce the 'jint32_t' et al
structures in jffs2, which even gcc then bitches about if you use them
wrongly.

I suppose I can ditch those now -- I always intended to after a while
anyway.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-15 23:26       ` Being more careful " Linus Torvalds
  2004-09-16  0:10         ` viro
@ 2004-09-16 12:25         ` David Woodhouse
  2004-09-16 14:01           ` Linus Torvalds
  1 sibling, 1 reply; 34+ messages in thread
From: David Woodhouse @ 2004-09-16 12:25 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Roland Dreier, Al Viro, Kernel Mailing List

On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
>  - if you want to go outside that bitwise type, you have to convert it 
>    properly first. For example, if you want to add a constant to a __le16 
>    type, you can do so, but you have to use the proper sequence:
> 
> 	__le16 sum, a, b;
> 
> 	sum = a + b;	/* INVALID! "warning: incompatible types for operation (+)" */
> 	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */
> 
> See? 

Yeah right, that latter case is _so_ much more readable, and makes it
_so_ easy for the compiler to optimise precisely when it wants to do the
byte-swapping, especially if the back end has load-and-swap or
store-and-swap instructions. :)

It's even nicer when it ends up as:

	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */
	sum |= c;
	sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d));

I'd really quite like to see the real compiler know about endianness,
too. I dare say I _could_ optimise the above (admittedly contrived but
not _so_ unlikely) case, but I don't _want_ to hand-optimise my code --
that's what I keep a compiler _for_.

-- 
dwmw2



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 23:09         ` Deepak Saxena
@ 2004-09-16 12:51           ` Geert Uytterhoeven
  2004-09-16 22:10             ` Deepak Saxena
  0 siblings, 1 reply; 34+ messages in thread
From: Geert Uytterhoeven @ 2004-09-16 12:51 UTC (permalink / raw)
  To: Deepak Saxena; +Cc: Linus Torvalds, Linux Kernel Development

On Wed, 15 Sep 2004, Deepak Saxena wrote:
> On Sep 15 2004, at 15:46, Linus Torvalds was caught saying:
> > Quite frankly, of your two suggested interfaces, I would select neither.
> > I'd just say that if your bus is special enough, just write your own
> > drivers, and use
> >
> > 	cookie = ixp4xx_iomap(dev, xx);
> > 	...
> > 	ixp4xx_iowrite(val, cookie + offset);
> >
> > which is perfectly valid. You don't have to make these devices even _look_
> > like a PCI device. Why should you?
>
> Problem is that some of those devices are not that special. For example,
> the on-board 16550 is accessed using readb/writeb in the 8250.c driver.

Just what I was thinking...

> I don't think we want to add that level of low-level detail to that
> driver and instead should just hide it in the platform code. I look
> at it from the point of view that the driver should not care about how
> the access actually occurs on the bus. It just says, write data foo at
> location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO,
> etc and that writing of the data is hidden in the implementation of
> the accessor API.

While 16550 serial is a bad example since it does byte accesses only (and thus
doesn't suffer from endianness), I have no problem to imagine there exist
platforms where you have multiple instances of a `standard' (cfr. 16550 serial)
device block, while each of them must be accessed differently:
  - one of them is in PCI I/O space (little endian)
  - one of them is in PCI MMIO space (little endian)
  - one of them is on-chip MMIO (big endian)
  - one of them is somewhere else, but sparsely addressed (some bytes of
    padding between each register)
and we can for sure come up with a few more examples of weird addressing.

How to solve this, without cluttering each ioread*() with many if clauses?

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-16 12:25         ` David Woodhouse
@ 2004-09-16 14:01           ` Linus Torvalds
  2004-09-18  9:46             ` Kai Henningsen
  0 siblings, 1 reply; 34+ messages in thread
From: Linus Torvalds @ 2004-09-16 14:01 UTC (permalink / raw)
  To: David Woodhouse; +Cc: Roland Dreier, Al Viro, Kernel Mailing List



On Thu, 16 Sep 2004, David Woodhouse wrote:

> On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
> >  - if you want to go outside that bitwise type, you have to convert it 
> >    properly first. For example, if you want to add a constant to a __le16 
> >    type, you can do so, but you have to use the proper sequence:
> > 
> > 	__le16 sum, a, b;
> > 
> > 	sum = a + b;	/* INVALID! "warning: incompatible types for operation (+)" */
> > 	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */
> > 
> > See? 
> 
> Yeah right, that latter case is _so_ much more readable

It's not about readability.

It's about the first case being WRONG!

You can't add two values in the wrong byte-order. It's not an operation 
that makes sense. You _have_ to convert them to CPU byte order first.

I certainly agree that the first version "looks nicer". 

> It's even nicer when it ends up as:
> 
> 	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */
> 	sum |= c;
> 	sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d));

This is actually the strongest argument _against_ hiding endianness in the 
compiler, or hiding it behind macros. Make it very explicit, and just make 
sure there are tools (ie 'sparse') that can tell you when you do something 
wrong.

Any programmer who sees the above will go "well that's stupid", and 
rewrite it as something saner instead. You can certainly rewrite it as

	cpu_sum = le16_to_cpu(a) + le16_to_cpu(b);
	cpu_sum |= le16_to_cpu(c);
	cpu_sum += le16_to_cpu(d);
	sum = cpu_to_le16(d);

which gets rid of the double conversions. 

But if you hide the endianness in macro's, you'll never see the mess at 
all, and won't be able to fix it.

> I'd really quite like to see the real compiler know about endianness,
> too.

I would have agreed with you some time ago. Having been bitten by too damn 
many bompiler bugs I'e become convinced that the compiler doing things 
behind your back to "help" you just isn't worth it. Not in a kernel, at 
least. It's much better to build up good typechecking and the 
infrastructure to help you get the job done.

Expressions like the above might happen once or twice in a project with
several million lines of code. It's just not worth compiler infrastructure
for - that just makes people use it as if it is free, and impossible to
find the bugs when they _do_ happen. Much better to have a type system 
that can warn about the bad uses, but that doesn't actually change any of 
the code generated..

		Linus

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-15 17:57           ` Linus Torvalds
  2004-09-15 18:06             ` Linus Torvalds
  2004-09-15 19:34             ` Greg KH
@ 2004-09-16 14:58             ` Horst von Brand
  2 siblings, 0 replies; 34+ messages in thread
From: Horst von Brand @ 2004-09-16 14:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

Linus Torvalds <torvalds@osdl.org> said:
> On Wed, 15 Sep 2004, Jörn Engel wrote:
> > 
> > But it still leaves me confused.  Before I had this code:

[...]

> No, you can certainly continue to use non-void pointers. The "void __iomem
> *" case is just the typeless one, exactly equivalent to regular void
> pointer usage.
> 
> So let me clarify my original post with two points:

[...]

Could you please put the clarified message into Documentation? It would be
a shame if it got lost.
-- 
Dr. Horst H. von Brand                   User #22616 counter.li.org
Departamento de Informatica                     Fono: +56 32 654431
Universidad Tecnica Federico Santa Maria              +56 32 654239
Casilla 110-V, Valparaiso, Chile                Fax:  +56 32 797513

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more anal about iospace accesses..
  2004-09-16 12:51           ` Geert Uytterhoeven
@ 2004-09-16 22:10             ` Deepak Saxena
  0 siblings, 0 replies; 34+ messages in thread
From: Deepak Saxena @ 2004-09-16 22:10 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Linus Torvalds, Linux Kernel Development

On Sep 16 2004, at 14:51, Geert Uytterhoeven was caught saying:
> While 16550 serial is a bad example since it does byte accesses only (and thus
> doesn't suffer from endianness), I have no problem to imagine there exist
> platforms where you have multiple instances of a `standard' (cfr. 16550 serial)

No longer true. We now have UPIO_IOMEM32 for some of the fscked up HW 
that large silicon manufacturer has released with 32-bit wide registers
that must be accessed as full 32-bits.

> device block, while each of them must be accessed differently:
>   - one of them is in PCI I/O space (little endian)
>   - one of them is in PCI MMIO space (little endian)
>   - one of them is on-chip MMIO (big endian)
>   - one of them is somewhere else, but sparsely addressed (some bytes of
>     padding between each register)
> and we can for sure come up with a few more examples of weird addressing.
> 
> How to solve this, without cluttering each ioread*() with many if clauses?

If clauses will still be needed, the only difference is that instead
of basing them on hardcoded addresses we now base them on
the devices coming in. 

~Deepak

-- 
Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/

"Unlike me, many of you have accepted the situation of your imprisonment
and will die here like rotten cabbages." - Number 6

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: Being more careful about iospace accesses..
  2004-09-16 14:01           ` Linus Torvalds
@ 2004-09-18  9:46             ` Kai Henningsen
  0 siblings, 0 replies; 34+ messages in thread
From: Kai Henningsen @ 2004-09-18  9:46 UTC (permalink / raw)
  To: linux-kernel

torvalds@osdl.org (Linus Torvalds)  wrote on 16.09.04 in <Pine.LNX.4.58.0409160652460.2333@ppc970.osdl.org>:

> On Thu, 16 Sep 2004, David Woodhouse wrote:
>
> > On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote:
> > >  - if you want to go outside that bitwise type, you have to convert it
> > >    properly first. For example, if you want to add a constant to a
> > >    __le16 type, you can do so, but you have to use the proper sequence:
> > >
> > > 	__le16 sum, a, b;
> > >
> > > 	sum = a + b;	/* INVALID! "warning: incompatible types for operation
> > > (+)" */ 	sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b));	/* Ok */
> > >
> > > See?
> >
> > Yeah right, that latter case is _so_ much more readable
>
> It's not about readability.
>
> It's about the first case being WRONG!

... in general.

And on the machines where it works (le machines), I'd certainly expect  
those conversion functions to be trivially eliminated by the compiler (ie,  
they're either trivial macros or trivial inline functions).

> > I'd really quite like to see the real compiler know about endianness,
> > too.

Well, these day, optimizers often can recognize the usual endianness  
conversion idioms, so the compiler still gets a chance at inserting your  
load-with-swap or whatever.

MfG Kai

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2004-09-18 13:06 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Pine.LNX.4.58.0409081543320.5912@ppc970.osdl.org>
     [not found] ` <Pine.LNX.4.58.0409150737260.2333@ppc970.osdl.org>
2004-09-15 16:30   ` Being more anal about iospace accesses Linus Torvalds
2004-09-15 16:54     ` Jörn Engel
2004-09-15 17:07       ` Linus Torvalds
2004-09-15 17:32         ` Jörn Engel
2004-09-15 17:57           ` Linus Torvalds
2004-09-15 18:06             ` Linus Torvalds
2004-09-15 19:34             ` Greg KH
2004-09-15 19:53               ` Linus Torvalds
2004-09-15 20:06                 ` Greg KH
2004-09-16 14:58             ` Horst von Brand
2004-09-15 17:45         ` Nikita Danilov
2004-09-15 18:09           ` Linus Torvalds
2004-09-15 18:40         ` Chris Wedgwood
2004-09-15 17:07       ` Jeff Garzik
2004-09-15 17:16       ` Roland Dreier
2004-09-15 17:39         ` Linus Torvalds
2004-09-15 20:07           ` Russell King
2004-09-15 17:36       ` Horst von Brand
2004-09-15 17:40       ` Brian Gerst
2004-09-15 16:56     ` Dave Jones
2004-09-15 17:19     ` Roger Luethi
2004-09-15 17:23     ` Richard B. Johnson
2004-09-15 22:21     ` Deepak Saxena
2004-09-15 22:46       ` Linus Torvalds
2004-09-15 23:09         ` Deepak Saxena
2004-09-16 12:51           ` Geert Uytterhoeven
2004-09-16 22:10             ` Deepak Saxena
2004-09-15 22:29     ` Roland Dreier
2004-09-15 23:26       ` Being more careful " Linus Torvalds
2004-09-16  0:10         ` viro
2004-09-16 11:40           ` David Woodhouse
2004-09-16 12:25         ` David Woodhouse
2004-09-16 14:01           ` Linus Torvalds
2004-09-18  9:46             ` Kai Henningsen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).