* Being more anal about iospace accesses.. [not found] ` <Pine.LNX.4.58.0409150737260.2333@ppc970.osdl.org> @ 2004-09-15 16:30 ` Linus Torvalds 2004-09-15 16:54 ` Jörn Engel ` (5 more replies) 0 siblings, 6 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 16:30 UTC (permalink / raw) To: Kernel Mailing List This is a background mail mainly for driver writers and/or architecture people. Or others that are just interested in really low-level hw access details. Others - please feel free to ignore. [ This has been discussed to some degree already on the architecture mailing lists and obviously among the people who actually worked on it, but I thought I'd bounce it off linux-kernel too, in order to make people more aware of what the new type-checking does. Most people may have seen it as only generating a ton of new warnings for some crufty device drivers. ] The background for this iospace type-checking change is that we've long had some serious confusion about how to access PCI memory mapped IO (MMIO), mainly because on a PC (and some non-PC's too) that IO really does look like regular memory, so you can have a driver that just accesses a pointer directly, and it will actually work on most machines. At the same time, we've had the proper "accessor" functions (read[bwl](), write[bwl]() and friends) that on purpose dropped all type information from the MMIO pointer, mostly just because of historical reasons, and as a result some drivers didn't use a pointer at all, but some kind of integer. Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit machine. In short, the PCI MMIO access case was largely the same as the user pointer case, except the access functions were different (readb vs get_user) and they were even less lax about checking for sanity. At least the user access code required a pointer with the right size. We've been very successful in annotating user pointers, and that found a couple of bugs, and more importantly it made the kernel code much more "aware" of what kind of pointer was passed around. In general, a big success, I think. And an obvious example for what MMIO pointers should do. So lately, the kernel infrastructure for MMIO accesses has become a _lot_ more strict about what it accepts. Not only do the MMIO access functions want a real pointer (which is already more type-checking than we did before, and causes gcc to spew out lots of warnings for some drivers), but as with user pointers, sparse annotations mark them as being in a different address space, and building the kernel with checking on will warn about mixing up address spaces. So far so good. So right now the current snapshots (and 2.6.9-rc2) have this enabled, and some drivers will be _very_ noisy when compiled. Most of the regular ones are fine, so maybe people haven't even noticed it that much, but some of them were using things like "u32" to store MMIO pointers, and are generally extremely broken on anything but an x86. We'll hopefully get around to fixing them up eventually, but in the meantime this should at least explain the background for some of the new noise people may see. Perhaps even more interesting is _another_ case of driver, though: one that started warning not because it was ugly and broken, but because it did something fairly rare but something that does happen occasionally: it mixed PIO and MMIO accesses on purpose, because it drove hardware that literally uses one or the other. Sometimes such a "mixed interface" driver does it based on a compile option that just #defines 'writel()' to 'inl()', sometimes it's a runtime decision depending on the hardware or configuration. The anal typechecking obviously ended up being very unhappy about this, since it wants "void __iomem *" for MMIO pointers, and a normal "unsigned long" for PIO accesses. The compile-time option could have been easily fixed up by adding the proper cast when re-defining the IO accessor, but that doesn't work for the dynamic case. Also, the compile-time switchers often really _wanted_ to be dynamic, but it was just too painful with the regular Linux IO interfaces to duplicate the code and do things conditionally one way or the other. To make a long story even longer: rather than scrapping the typechecking, or requiring drivers to do strange and nasty casts all over the place, there's now a new interface in town. It's called "iomap", because it extends the old "ioremap()" interface to work on the PIO accesses too. That way, the drivers that really want to mix both PIO and MMIO accesses can very naturally do it: they just need to remap the PIO space too, the same way that we've required people to remap the MMIO space for a long long time. For example, if you don't know (or, more importantly - don't care) what kind of IO interface you use, you can now do something like void __iomem * map = pci_iomap(dev, bar, maxbytes); ... status = ioread32(map + DRIVER_STATUS_OFFSET); and it will do the proper IO mapping for the named PCI BAR for that device. Regardless of whether the BAR was an IO or MEM mapping. Very convenient for cases where the hardware migt expose its IO window in either (or sometimes both). Nothing in the current tree actually uses this new interface, although Jeff has patches for SATA for testing (and they clean up the code quite noticeably, never mind getting rid of the warnings). The interface has been implemented by yours truly for x86 and ppc64, and David did a first-pass version for sparc64 too (missing the "xxxx_rep()" functions that were added a bit later, I believe). So far experience seems to show that it's a very natural interface for most non-x86 hardware - they all tend to map in both PIO and MMIO into one address space _anyway_, so the two aren't really any different. It's mainly just x86 and it's ilk that actually have two different interfaces for the two kinds of PCI accesses, and at least in that case it's trivial to encode the difference in the virtual ioremap pointer. The best way to explain the interface is to just point you guys at the <asm-generic/iomap.h> file, which isn't very big, has about as much comments than code, and contains nothing but the necessary function declarations. The actual meaning of the functions should be pretty obvious even without the comments. Feel free to flame or discuss rationally, Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds @ 2004-09-15 16:54 ` Jörn Engel 2004-09-15 17:07 ` Linus Torvalds ` (4 more replies) 2004-09-15 16:56 ` Dave Jones ` (4 subsequent siblings) 5 siblings, 5 replies; 34+ messages in thread From: Jörn Engel @ 2004-09-15 16:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote: > > For example, if you don't know (or, more importantly - don't care) what > kind of IO interface you use, you can now do something like > > void __iomem * map = pci_iomap(dev, bar, maxbytes); > ... > status = ioread32(map + DRIVER_STATUS_OFFSET); C now supports pointer arithmetic with void*? I hope the width of a void is not architecture dependent, that would introduce more subtle bugs. Jörn -- Sometimes, asking the right question is already the answer. -- Unknown ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:54 ` Jörn Engel @ 2004-09-15 17:07 ` Linus Torvalds 2004-09-15 17:32 ` Jörn Engel ` (2 more replies) 2004-09-15 17:07 ` Jeff Garzik ` (3 subsequent siblings) 4 siblings, 3 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 17:07 UTC (permalink / raw) To: Jörn Engel; +Cc: Kernel Mailing List On Wed, 15 Sep 2004, Jörn Engel wrote: > > C now supports pointer arithmetic with void*? C doesn't. gcc does. It's a documented extension, and it acts like if it was done on a byte. See gcc's user guide "Extension to the C Language Family". It's a singularly good feature. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:07 ` Linus Torvalds @ 2004-09-15 17:32 ` Jörn Engel 2004-09-15 17:57 ` Linus Torvalds 2004-09-15 17:45 ` Nikita Danilov 2004-09-15 18:40 ` Chris Wedgwood 2 siblings, 1 reply; 34+ messages in thread From: Jörn Engel @ 2004-09-15 17:32 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, 15 September 2004 10:07:29 -0700, Linus Torvalds wrote: > On Wed, 15 Sep 2004, Jörn Engel wrote: > > > > C now supports pointer arithmetic with void*? > > C doesn't. gcc does. It's a documented extension, and it acts like if it > was done on a byte. > > See gcc's user guide "Extension to the C Language Family". > > It's a singularly good feature. Nice. But it still leaves me confused. Before I had this code: struct regs { uint32_t status; ... } ... struct regs *regs = ioremap(...); uint32_t status = regs->status; ... So now I should do it like this: #define REG_STATUS 0 ... void __iomem *regs = ioremap(...); uint32_t status = readl(regs + REG_STATUS); ... But wait, that only works when long is 32bit wide. Plus I could be stupid enough and "#define REG_STATUS 64" while the register space is just 64 bytes long. It solves the confusion about address spaces, agreed, but overall I'm more confused now. Hope it's just temporary. Jörn -- There is no worse hell than that provided by the regrets for wasted opportunities. -- Andre-Louis Moreau in Scarabouche ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:32 ` Jörn Engel @ 2004-09-15 17:57 ` Linus Torvalds 2004-09-15 18:06 ` Linus Torvalds ` (2 more replies) 0 siblings, 3 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 17:57 UTC (permalink / raw) To: Jörn Engel; +Cc: Kernel Mailing List On Wed, 15 Sep 2004, Jörn Engel wrote: > > But it still leaves me confused. Before I had this code: > > struct regs { > uint32_t status; > ... > } > > ... > > struct regs *regs = ioremap(...); > uint32_t status = regs->status; > ... > > So now I should do it like this: > > #define REG_STATUS 0 > > ... > > void __iomem *regs = ioremap(...); > uint32_t status = readl(regs + REG_STATUS); No, you can certainly continue to use non-void pointers. The "void __iomem *" case is just the typeless one, exactly equivalent to regular void pointer usage. So let me clarify my original post with two points: - if your device only supports MMIO, you might as well just use the old interfaces. The new interface will _also_ work, but there is no real advantage, unless you count the "pci_iomap()" as a simpler interface. The new interface is really only meaningful for things that want to support _both_ PIO and MMIO. It's also, perhaps, a bit syntactically easier to work with, so some people might prefer that for that reason. See my comments further down on the auto-sizing. BUT it doesn't make the old interfaces go away by any means, and I'm not even suggesting that people should re-write drivers just for the hell of it. In short: if you don't go "ooh, that will simplify XXX", then you should just ignore the new interfaces. - you can _absolutely_ use other pointers than "void *". You should annotate them with "__iomem" if you want to be sparse-clean (and it often helps visually to clarify the issue), but gcc won't care, the "__iomem" annotation is purely a extended check. So you can absolutely still continue with struct mydev_iolayout { __u32 status; __u32 irqmask; ... struct mydev_iolayout __iomem *regs = pci_map(...); status = ioread32(®s.status); which is often a lot more readable, and thus in fact _preferred_. It also adds another level of type-checking, so I applaud drivers that do this. Now, I'm _contemplating_ also allowing the "get_user()" kind of "unsized" access function for the new interface. Right now all the old (and the new) access functions are all explicitly sized. But for the "struct layout" case, it's actually often nice to just say status = ioread(®s.status); and the compiler can certainly figure out the size of the register on its own. This was impossible with the old interface, because the old interfaces didn't even take a _pointer_, much less one that could be sized up automatically. (The auto-sizing is something that "get_user()" and "put_user()" have always done, and it makes them very easy to use. It involved a few pretty ugly macros, but hey, that's all hidden away, and is actually pretty simple in the end). Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:57 ` Linus Torvalds @ 2004-09-15 18:06 ` Linus Torvalds 2004-09-15 19:34 ` Greg KH 2004-09-16 14:58 ` Horst von Brand 2 siblings, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 18:06 UTC (permalink / raw) To: Jörn Engel; +Cc: Kernel Mailing List On Wed, 15 Sep 2004, Linus Torvalds wrote: > > In short: if you don't go "ooh, that will simplify XXX", then you > should just ignore the new interfaces. Btw, to get an example of what _is_ simplified, look at drivers/scsi/libata-core.c: void ata_tf_load(struct ata_port *ap, struct ata_taskfile *tf) { if (ap->flags & ATA_FLAG_MMIO) ata_tf_load_mmio(ap, tf); else ata_tf_load_pio(ap, tf); } and realize that "ata_tf_load_mmio()" and "ata_tf_load_pio()" are exactly the SAME FUNCTION. Except one uses MMIO, the other uses PIO. With the new setup, it literally collapses into one function, and code size goes down pretty dramatically. Not to mention making the code more readable. For another example of this (of the static kind), look at something like drivers/net/8139too.c: #ifdef USE_IO_OPS #define RTL_R8(reg) inb (((unsigned long)ioaddr) + (reg)) #define RTL_R16(reg) inw (((unsigned long)ioaddr) + (reg)) #define RTL_R32(reg) ((unsigned long) inl (((unsigned long)ioaddr) + (reg))) ... #else ... /* read MMIO register */ #define RTL_R8(reg) readb (ioaddr + (reg)) #define RTL_R16(reg) readw (ioaddr + (reg)) #define RTL_R32(reg) ((unsigned long) readl (ioaddr + (reg))) see? In this case, USE_IO_OPS depends on a static configuration variable, namely CONFIG_8139TOO_PIO. So the user at _compile_ time has to decide whether he wants to use MMIO or PIO. See the Kconfig help file: This instructs the driver to use programmed I/O ports (PIO) instead of PCI shared memory (MMIO). This can possibly solve some problems in case your mainboard has memory consistency issues. If unsure, say N. In other words, the new interface is not designed to replace the old ones, it's designed to help drivers like these, that either go to a lot of extra pain in order to support both methods, or then have a _static_ config option that makes it really hard for system vendors to just release one driver that knows when it needs to use PIO and when it needs MMIO. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:57 ` Linus Torvalds 2004-09-15 18:06 ` Linus Torvalds @ 2004-09-15 19:34 ` Greg KH 2004-09-15 19:53 ` Linus Torvalds 2004-09-16 14:58 ` Horst von Brand 2 siblings, 1 reply; 34+ messages in thread From: Greg KH @ 2004-09-15 19:34 UTC (permalink / raw) To: Linus Torvalds; +Cc: J?rn Engel, Kernel Mailing List On Wed, Sep 15, 2004 at 10:57:25AM -0700, Linus Torvalds wrote: > > So you can absolutely still continue with > > struct mydev_iolayout { > __u32 status; > __u32 irqmask; > ... > > struct mydev_iolayout __iomem *regs = pci_map(...); > status = ioread32(®s.status); > > which is often a lot more readable, and thus in fact _preferred_. It also > adds another level of type-checking, so I applaud drivers that do this. Currently a few drivers do: status = readl(®s.status); which causes sparse warnings. How should that code be changed to prevent this? Convert them all to ioread32()? Or figure out a way to supress the warning for readl()? thanks, greg k-h ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 19:34 ` Greg KH @ 2004-09-15 19:53 ` Linus Torvalds 2004-09-15 20:06 ` Greg KH 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 19:53 UTC (permalink / raw) To: Greg KH; +Cc: J?rn Engel, Kernel Mailing List On Wed, 15 Sep 2004, Greg KH wrote: > > Currently a few drivers do: > status = readl(®s.status); I assume that's "®s->status", since regs had better be a pointer.. > which causes sparse warnings. > > How should that code be changed to prevent this? Convert them all to > ioread32()? Or figure out a way to supress the warning for readl()? Just make sure that you annotate "regs" as a pointer to IO space. Ie if you have struct xxxx __iomem *regs; then everything will be fine - sparse knows that "regs" is a IO pointer, and that obviously makes "®s->member" _also_ an IO pointer. You'll need that __iomem annotation anyway, since otherwise sparse would complain when you do the assignment from the "ioremap()" call (and the thing had better come from ioremap() some way, or it's not valid in the first place to do "readl()" on it). Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 19:53 ` Linus Torvalds @ 2004-09-15 20:06 ` Greg KH 0 siblings, 0 replies; 34+ messages in thread From: Greg KH @ 2004-09-15 20:06 UTC (permalink / raw) To: Linus Torvalds; +Cc: J?rn Engel, Kernel Mailing List On Wed, Sep 15, 2004 at 12:53:51PM -0700, Linus Torvalds wrote: > > > On Wed, 15 Sep 2004, Greg KH wrote: > > > > Currently a few drivers do: > > status = readl(®s.status); > > I assume that's "®s->status", since regs had better be a pointer.. Yes, sorry. > > which causes sparse warnings. > > > > How should that code be changed to prevent this? Convert them all to > > ioread32()? Or figure out a way to supress the warning for readl()? > > Just make sure that you annotate "regs" as a pointer to IO space. Ah, ok, that works. Thanks for the clarification, I now realize that __iomem can be a marker for any type of pointer, not just a void. That's where I was confused. thanks, greg k-h ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:57 ` Linus Torvalds 2004-09-15 18:06 ` Linus Torvalds 2004-09-15 19:34 ` Greg KH @ 2004-09-16 14:58 ` Horst von Brand 2 siblings, 0 replies; 34+ messages in thread From: Horst von Brand @ 2004-09-16 14:58 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1: Type: text/plain, Size: 762 bytes --] Linus Torvalds <torvalds@osdl.org> said: > On Wed, 15 Sep 2004, Jörn Engel wrote: > > > > But it still leaves me confused. Before I had this code: [...] > No, you can certainly continue to use non-void pointers. The "void __iomem > *" case is just the typeless one, exactly equivalent to regular void > pointer usage. > > So let me clarify my original post with two points: [...] Could you please put the clarified message into Documentation? It would be a shame if it got lost. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:07 ` Linus Torvalds 2004-09-15 17:32 ` Jörn Engel @ 2004-09-15 17:45 ` Nikita Danilov 2004-09-15 18:09 ` Linus Torvalds 2004-09-15 18:40 ` Chris Wedgwood 2 siblings, 1 reply; 34+ messages in thread From: Nikita Danilov @ 2004-09-15 17:45 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List Linus Torvalds writes: > > > On Wed, 15 Sep 2004, JЖrn Engel wrote: > > > > C now supports pointer arithmetic with void*? > > C doesn't. gcc does. It's a documented extension, and it acts like if it > was done on a byte. > > See gcc's user guide "Extension to the C Language Family". > > It's a singularly good feature. Unfortunately it breaks even better identity foo *p; p + nr == (foo *)((char *)p + nr * sizeof *p) unless one thinks that is --together with gcc-- that nothing occupies exactly one byte. > > Linus Nikita. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:45 ` Nikita Danilov @ 2004-09-15 18:09 ` Linus Torvalds 0 siblings, 0 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 18:09 UTC (permalink / raw) To: Nikita Danilov; +Cc: Jörn Engel, Kernel Mailing List On Wed, 15 Sep 2004, Nikita Danilov wrote: > > Unfortunately it breaks even better identity > > foo *p; > > p + nr == (foo *)((char *)p + nr * sizeof *p) No, gcc allows the above, by making sizeof(void) be 1. And sane compilers would just inform the user at compile-time with a nice readable message that he's doing something stupid. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:07 ` Linus Torvalds 2004-09-15 17:32 ` Jörn Engel 2004-09-15 17:45 ` Nikita Danilov @ 2004-09-15 18:40 ` Chris Wedgwood 2 siblings, 0 replies; 34+ messages in thread From: Chris Wedgwood @ 2004-09-15 18:40 UTC (permalink / raw) To: Linus Torvalds; +Cc: Jörn Engel, Kernel Mailing List On Wed, Sep 15, 2004 at 10:07:29AM -0700, Linus Torvalds wrote: > C doesn't. gcc does. It's a documented extension, and it acts like > if it was done on a byte. [...] > It's a singularly good feature. I dunno about that. Maybe it is, but it has some gotchas. Recently when doing a sparsification of code I noticed there are places which essentially do things like: void *foo; [...] foo += bar * n; Part of the fix (cleanup) was to change the 'void *foo' to 'gratuitous_typedef_t __user *foo' --- which silently breaks the math if you don't explicitly check for this. --cw ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:54 ` Jörn Engel 2004-09-15 17:07 ` Linus Torvalds @ 2004-09-15 17:07 ` Jeff Garzik 2004-09-15 17:16 ` Roland Dreier ` (2 subsequent siblings) 4 siblings, 0 replies; 34+ messages in thread From: Jeff Garzik @ 2004-09-15 17:07 UTC (permalink / raw) To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List On Wed, Sep 15, 2004 at 06:54:50PM +0200, Jörn Engel wrote: > On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote: > > > > For example, if you don't know (or, more importantly - don't care) what > > kind of IO interface you use, you can now do something like > > > > void __iomem * map = pci_iomap(dev, bar, maxbytes); > > ... > > status = ioread32(map + DRIVER_STATUS_OFFSET); > > C now supports pointer arithmetic with void*? I hope the width of a > void is not architecture dependent, that would introduce more subtle > bugs. gcc extension, which has been used in the kernel for ages. Jeff ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:54 ` Jörn Engel 2004-09-15 17:07 ` Linus Torvalds 2004-09-15 17:07 ` Jeff Garzik @ 2004-09-15 17:16 ` Roland Dreier 2004-09-15 17:39 ` Linus Torvalds 2004-09-15 17:36 ` Horst von Brand 2004-09-15 17:40 ` Brian Gerst 4 siblings, 1 reply; 34+ messages in thread From: Roland Dreier @ 2004-09-15 17:16 UTC (permalink / raw) To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List Jörn> C now supports pointer arithmetic with void*? I hope the Jörn> width of a void is not architecture dependent, that would Jörn> introduce more subtle bugs. Pointer arithmetic on void * has been a gcc extension since forever (gcc acts as though sizeof (void) == 1). However, I somewhat agree -- it's ugly for drivers rely on this and do arithmetic on void *. It should be OK for a driver to use char __iomem * for its IO base if it needs to add in offsets, right? - R. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:16 ` Roland Dreier @ 2004-09-15 17:39 ` Linus Torvalds 2004-09-15 20:07 ` Russell King 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 17:39 UTC (permalink / raw) To: Roland Dreier; +Cc: Jörn Engel, Kernel Mailing List On Wed, 15 Sep 2004, Roland Dreier wrote: > > However, I somewhat agree -- it's ugly for drivers rely on this and do > arithmetic on void *. It should be OK for a driver to use > char __iomem * for its IO base if it needs to add in offsets, right? "char __iomem *" will certainly work - all the normal pointer conversions are ok. Some people in fact use pointers to structures in MMIO space, and this is quite reasonable when working with a chip that uses "mailboxes" for commands. However, I disagree with "void *" arithmetic being ugly. It's a very nice feature to have a pointer that can be validly cast to any other type, and that is the whole _point_ of "void *". The fact that C++ got that wrong is arguably the worst failing of the language, causing tons of unnecessary casts that can silently hide real bugs (maybe the thing you cast wasn't a "void *" in the first place, but you'll never know - the compiler will do the cast for you). For example, to go back to the mailbox example, let's say that your hardware has an IO area that is 8kB in size, with the last 4kB being mailboxes. The _sane_ way to do that is to do void __iomem *base_io = ioremap(...); struct mailbox __iomem *mbox = base_io + MAILBOX_OFFSET; and then just work on that. In contrast, havign to cast to a "char *" in order to do arithmetic, and then casting back to the resultant structure type pointer is not only ugly and unreadable, it's a lot more prone to errors as a result. In other words, think of "void *" as a pointer to storage. Not "char" (which is the C name for a signed byte), but really, it's the pointer to whatever underlying memory there is. And a _fundamental_ part of such memory is the fact that it is addressable. Thus "pointer to storage arithmetic" really does make sense on a very fundamental level. It has nothing to do with C types, and that also explains why "void *" silently converts to anything else. It's a very internally consistent world-view. Now, I disagree with gcc when it comes to actually taking the "size" of void. Gcc will silently accept void *x; x = malloc(sizeof(*x)); which I consider to be an abomination (and the above _can_ happen, quite easily, as part of macros for doing allocation etc - nobody would write it in that form, but if you have an "MEMALLOC(x)" macro that does the sizeof, you could end up trying to feed the compiler bogus code). The fact that you can do arithmetic on typeless storage does _not_ imply that typeless storage would have a "size" in my book. So sparse will say: warning: cannot size expression and refuse to look at broken code like the above. But hey, the fact that I have better taste than anybody else in the universe is just something I have to live with. It's not easy being me. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 17:39 ` Linus Torvalds @ 2004-09-15 20:07 ` Russell King 0 siblings, 0 replies; 34+ messages in thread From: Russell King @ 2004-09-15 20:07 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roland Dreier, Jörn Engel, Kernel Mailing List On Wed, Sep 15, 2004 at 10:39:02AM -0700, Linus Torvalds wrote: > In other words, think of "void *" as a pointer to storage. Not "char" > (which is the C name for a signed byte), Common Programming Error #99: "char" is implementation whether it is signed or may be unsigned. Only a "char" type qualified by "signed" or "unsigned" can be relied upon to have the requested property. -- Russell King Linux kernel 2.6 ARM Linux - http://www.arm.linux.org.uk/ maintainer of: 2.6 PCMCIA - http://pcmcia.arm.linux.org.uk/ 2.6 Serial core ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:54 ` Jörn Engel ` (2 preceding siblings ...) 2004-09-15 17:16 ` Roland Dreier @ 2004-09-15 17:36 ` Horst von Brand 2004-09-15 17:40 ` Brian Gerst 4 siblings, 0 replies; 34+ messages in thread From: Horst von Brand @ 2004-09-15 17:36 UTC (permalink / raw) To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List =?iso-8859-1?Q?J=F6rn?= Engel <joern@wohnheim.fh-wedel.de> said: > On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote: [...] > > For example, if you don't know (or, more importantly - don't care) what > > kind of IO interface you use, you can now do something like > > > > void __iomem * map = pci_iomap(dev, bar, maxbytes); > > ... > > status = ioread32(map + DRIVER_STATUS_OFFSET); > C now supports pointer arithmetic with void*? It doesn't. It's a gcc-ism. > I hope the width of a > void is not architecture dependent, that would introduce more subtle > bugs. gcc takes it as a char pointer for such uses. -- Dr. Horst H. von Brand User #22616 counter.li.org Departamento de Informatica Fono: +56 32 654431 Universidad Tecnica Federico Santa Maria +56 32 654239 Casilla 110-V, Valparaiso, Chile Fax: +56 32 797513 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:54 ` Jörn Engel ` (3 preceding siblings ...) 2004-09-15 17:36 ` Horst von Brand @ 2004-09-15 17:40 ` Brian Gerst 4 siblings, 0 replies; 34+ messages in thread From: Brian Gerst @ 2004-09-15 17:40 UTC (permalink / raw) To: Jörn Engel; +Cc: Linus Torvalds, Kernel Mailing List Jörn Engel wrote: > On Wed, 15 September 2004 09:30:42 -0700, Linus Torvalds wrote: > >>For example, if you don't know (or, more importantly - don't care) what >>kind of IO interface you use, you can now do something like >> >> void __iomem * map = pci_iomap(dev, bar, maxbytes); >> ... >> status = ioread32(map + DRIVER_STATUS_OFFSET); > > > C now supports pointer arithmetic with void*? I hope the width of a > void is not architecture dependent, that would introduce more subtle > bugs. > > Jörn > http://gcc.gnu.org/onlinedocs/gcc-3.4.2/gcc/Pointer-Arith.html#Pointer-Arith 5.18 Arithmetic on void- and Function-Pointers In GNU C, addition and subtraction operations are supported on pointers to void and on pointers to functions. This is done by treating the size of a void or of a function as 1. A consequence of this is that sizeof is also allowed on void and on function types, and returns 1. The option -Wpointer-arith requests a warning if these extensions are used. -- Brian Gerst ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds 2004-09-15 16:54 ` Jörn Engel @ 2004-09-15 16:56 ` Dave Jones 2004-09-15 17:19 ` Roger Luethi ` (3 subsequent siblings) 5 siblings, 0 replies; 34+ messages in thread From: Dave Jones @ 2004-09-15 16:56 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, Sep 15, 2004 at 09:30:42AM -0700, Linus Torvalds wrote: > So right now the current snapshots (and 2.6.9-rc2) have this enabled, and > some drivers will be _very_ noisy when compiled. Most of the regular ones > are fine, so maybe people haven't even noticed it that much, but some of > them were using things like "u32" to store MMIO pointers, and are > generally extremely broken on anything but an x86. We'll hopefully get > around to fixing them up eventually, but in the meantime this should at > least explain the background for some of the new noise people may see. For the curious, 6MB of sparse output is generated from a make allmodconfig right now. (http://www.codemonkey.org.uk/junk/2.6.9-rc2-warnings.txt) You can filter out just the __iomem warnings by grepping for asn:2 Dave ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds 2004-09-15 16:54 ` Jörn Engel 2004-09-15 16:56 ` Dave Jones @ 2004-09-15 17:19 ` Roger Luethi 2004-09-15 17:23 ` Richard B. Johnson ` (2 subsequent siblings) 5 siblings, 0 replies; 34+ messages in thread From: Roger Luethi @ 2004-09-15 17:19 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, 15 Sep 2004 09:30:42 -0700, Linus Torvalds wrote: > This is a background mail mainly for driver writers and/or architecture Thanks, I appreciate the effort (speaking as the maintainer of an "even more interesting" driver). Roger ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds ` (2 preceding siblings ...) 2004-09-15 17:19 ` Roger Luethi @ 2004-09-15 17:23 ` Richard B. Johnson 2004-09-15 22:21 ` Deepak Saxena 2004-09-15 22:29 ` Roland Dreier 5 siblings, 0 replies; 34+ messages in thread From: Richard B. Johnson @ 2004-09-15 17:23 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Wed, 15 Sep 2004, Linus Torvalds wrote: > > This is a background mail mainly for driver writers and/or architecture > people. Or others that are just interested in really low-level hw access > details. Others - please feel free to ignore. > [SNIPPED mostly....] > For example, if you don't know (or, more importantly - don't care) what > kind of IO interface you use, you can now do something like > > void __iomem * map = pci_iomap(dev, bar, maxbytes); > ... > status = ioread32(map + DRIVER_STATUS_OFFSET); ^^^^^^^^^^^^^^^^ Doesn't this rely on the non-standard GNUism that you can perform pointer-arithmetic on a void-pointer? Which is illegal, immoral, and fattening. I'd much rather see char-pointers so it's valid to perform the offset math. That way, in the future, a new tool that follows (ANSI, IEEE, POSIX) rules doesn't barf. I suggest a new pointer type like (BASE *) or (BAR *) that hides the (unsigned char *) necessary to not barf, plus minimize side-effects. Cheers, Dick Johnson Penguin : Linux version 2.4.26 on an i686 machine (5570.56 BogoMips). Note 96.31% of all statistics are fiction. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds ` (3 preceding siblings ...) 2004-09-15 17:23 ` Richard B. Johnson @ 2004-09-15 22:21 ` Deepak Saxena 2004-09-15 22:46 ` Linus Torvalds 2004-09-15 22:29 ` Roland Dreier 5 siblings, 1 reply; 34+ messages in thread From: Deepak Saxena @ 2004-09-15 22:21 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List On Sep 15 2004, at 09:30, Linus Torvalds was caught saying: > At the same time, we've had the proper "accessor" functions (read[bwl](), > write[bwl]() and friends) that on purpose dropped all type information > from the MMIO pointer, mostly just because of historical reasons, and as a > result some drivers didn't use a pointer at all, but some kind of integer. > Sometimes even one that couldn't _fit_ a MMIO address in it on a 64-bit > machine. Linus, Since we are on the subject of io-access, I would like a clarification/opinion on the read*/write* & in*/out* accessors (and now the ioread/write equivalents). Are these functions only meant to be used for PCI memory-mapped devices or _any_ memory mapped devices? Same with ioremap(). I ask because there are bits of code in the kernel that use these on non-PCI devices and this sometimes causes some complication in platform-level code. For example, b/c of the way PCI access work on the IXP4xx (indirect access via register read/writes), we have to do the following to differentiate b/w PCI and non-PCI devices (include/asm-arm/arch-ixp4xx/io.h): static inline void * __ixp4xx_ioremap(unsigned long addr, size_t size, unsigned long flags, unsigned long align) { extern void * __ioremap(unsigned long, size_t, unsigned long, unsigned long); if((addr < 0x48000000) || (addr > 0x4fffffff)) return __ioremap(addr, size, flags, align); return (void *)addr; } static inline void __ixp4xx_writeb(u8 value, u32 addr) { u32 n, byte_enables, data; if (addr >= VMALLOC_START) { __raw_writeb(value, addr); return; } n = addr % 4; byte_enables = (0xf & ~BIT(n)) << IXP4XX_PCI_NP_CBE_BESL; data = value << (8*n); ixp4xx_pci_write(addr, byte_enables | NP_CMD_MEMWRITE, data); } #define writeb(p, v) __ixp4xx_writeb(p, v) (0x48000000 -> 0x4fffffff is the PCI memory window on this CPU). While this is not a huge level of uglyness, I have systems where this is going to get much uglier b/c we have overlapping addresses on different buses and we need to be able to differentiate accesses It raises the question of whether we need a different interface for non-PCI devices, if we should be passing a struct device into all the I/O accessors functions to make it easier for platform code to determine what to do, or if we should make I/O access functions a property of devices. So instead of doing read*/write/in*/out*, we would do either: a) pass device into io-access routines: cookie = iomap(dev, foo, len); bar = read32(dev, cookie + offset); or b) make access routines a function of the devices themselves cookie = dev->iomap(foo, len); bar = dev->read32(cookie + REG_OFFSET); The former is nicer b/c it allows the dev to be ignored on systems where we do not care about PCI vs non-PCI devices. Comments? ~Deepak -- Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/ "Unlike me, many of you have accepted the situation of your imprisonment and will die here like rotten cabbages." - Number 6 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 22:21 ` Deepak Saxena @ 2004-09-15 22:46 ` Linus Torvalds 2004-09-15 23:09 ` Deepak Saxena 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 22:46 UTC (permalink / raw) To: Deepak Saxena; +Cc: Kernel Mailing List On Wed, 15 Sep 2004, Deepak Saxena wrote: > > Since we are on the subject of io-access, I would like a > clarification/opinion on the read*/write* & in*/out* accessors > (and now the ioread/write equivalents). Are these functions only meant > to be used for PCI memory-mapped devices or _any_ memory mapped devices? It really depends on the bus architecture. At some point, if the bus is different enough from a "normal" setup, you should just use your own accessor functions. Trying to overload "readl/writel" is just too painful. However, at that point you should also realize that you can't re-use _any_ of the existing chip drivers, and you'll have to write your own. If the bus is exotic enough, that's not a problem, and you'd have to do that anyway. But there really aren't all that many "exotic" buses around any more. Quite frankly, of your two suggested interfaces, I would select neither. I'd just say that if your bus is special enough, just write your own drivers, and use cookie = ixp4xx_iomap(dev, xx); ... ixp4xx_iowrite(val, cookie + offset); which is perfectly valid. You don't have to make these devices even _look_ like a PCI device. Why should you? Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 22:46 ` Linus Torvalds @ 2004-09-15 23:09 ` Deepak Saxena 2004-09-16 12:51 ` Geert Uytterhoeven 0 siblings, 1 reply; 34+ messages in thread From: Deepak Saxena @ 2004-09-15 23:09 UTC (permalink / raw) To: Linus Torvalds; +Cc: linux-kernel On Sep 15 2004, at 15:46, Linus Torvalds was caught saying: > Quite frankly, of your two suggested interfaces, I would select neither. > I'd just say that if your bus is special enough, just write your own > drivers, and use > > cookie = ixp4xx_iomap(dev, xx); > ... > ixp4xx_iowrite(val, cookie + offset); > > which is perfectly valid. You don't have to make these devices even _look_ > like a PCI device. Why should you? Problem is that some of those devices are not that special. For example, the on-board 16550 is accessed using readb/writeb in the 8250.c driver. I don't think we want to add that level of low-level detail to that driver and instead should just hide it in the platform code. I look at it from the point of view that the driver should not care about how the access actually occurs on the bus. It just says, write data foo at location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO, etc and that writing of the data is hidden in the implementation of the accessor API. ~Deepak -- Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/ "Unlike me, many of you have accepted the situation of your imprisonment and will die here like rotten cabbages." - Number 6 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 23:09 ` Deepak Saxena @ 2004-09-16 12:51 ` Geert Uytterhoeven 2004-09-16 22:10 ` Deepak Saxena 0 siblings, 1 reply; 34+ messages in thread From: Geert Uytterhoeven @ 2004-09-16 12:51 UTC (permalink / raw) To: Deepak Saxena; +Cc: Linus Torvalds, Linux Kernel Development On Wed, 15 Sep 2004, Deepak Saxena wrote: > On Sep 15 2004, at 15:46, Linus Torvalds was caught saying: > > Quite frankly, of your two suggested interfaces, I would select neither. > > I'd just say that if your bus is special enough, just write your own > > drivers, and use > > > > cookie = ixp4xx_iomap(dev, xx); > > ... > > ixp4xx_iowrite(val, cookie + offset); > > > > which is perfectly valid. You don't have to make these devices even _look_ > > like a PCI device. Why should you? > > Problem is that some of those devices are not that special. For example, > the on-board 16550 is accessed using readb/writeb in the 8250.c driver. Just what I was thinking... > I don't think we want to add that level of low-level detail to that > driver and instead should just hide it in the platform code. I look > at it from the point of view that the driver should not care about how > the access actually occurs on the bus. It just says, write data foo at > location bar regardless of whether bar is ISA, PCI, on-chip, RapidIO, > etc and that writing of the data is hidden in the implementation of > the accessor API. While 16550 serial is a bad example since it does byte accesses only (and thus doesn't suffer from endianness), I have no problem to imagine there exist platforms where you have multiple instances of a `standard' (cfr. 16550 serial) device block, while each of them must be accessed differently: - one of them is in PCI I/O space (little endian) - one of them is in PCI MMIO space (little endian) - one of them is on-chip MMIO (big endian) - one of them is somewhere else, but sparsely addressed (some bytes of padding between each register) and we can for sure come up with a few more examples of weird addressing. How to solve this, without cluttering each ioread*() with many if clauses? Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-16 12:51 ` Geert Uytterhoeven @ 2004-09-16 22:10 ` Deepak Saxena 0 siblings, 0 replies; 34+ messages in thread From: Deepak Saxena @ 2004-09-16 22:10 UTC (permalink / raw) To: Geert Uytterhoeven; +Cc: Linus Torvalds, Linux Kernel Development On Sep 16 2004, at 14:51, Geert Uytterhoeven was caught saying: > While 16550 serial is a bad example since it does byte accesses only (and thus > doesn't suffer from endianness), I have no problem to imagine there exist > platforms where you have multiple instances of a `standard' (cfr. 16550 serial) No longer true. We now have UPIO_IOMEM32 for some of the fscked up HW that large silicon manufacturer has released with 32-bit wide registers that must be accessed as full 32-bits. > device block, while each of them must be accessed differently: > - one of them is in PCI I/O space (little endian) > - one of them is in PCI MMIO space (little endian) > - one of them is on-chip MMIO (big endian) > - one of them is somewhere else, but sparsely addressed (some bytes of > padding between each register) > and we can for sure come up with a few more examples of weird addressing. > > How to solve this, without cluttering each ioread*() with many if clauses? If clauses will still be needed, the only difference is that instead of basing them on hardcoded addresses we now base them on the devices coming in. ~Deepak -- Deepak Saxena - dsaxena at plexity dot net - http://www.plexity.net/ "Unlike me, many of you have accepted the situation of your imprisonment and will die here like rotten cabbages." - Number 6 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more anal about iospace accesses.. 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds ` (4 preceding siblings ...) 2004-09-15 22:21 ` Deepak Saxena @ 2004-09-15 22:29 ` Roland Dreier 2004-09-15 23:26 ` Being more careful " Linus Torvalds 5 siblings, 1 reply; 34+ messages in thread From: Roland Dreier @ 2004-09-15 22:29 UTC (permalink / raw) To: Linus Torvalds; +Cc: Kernel Mailing List Linus, while we're on the subject of new sparse checks, could you give a quick recap of the semantics of the new __leXX types (and what __bitwise means to sparse)? I don't think I've ever seen this stuff described on LKML. Thanks, Roland ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-15 22:29 ` Roland Dreier @ 2004-09-15 23:26 ` Linus Torvalds 2004-09-16 0:10 ` viro 2004-09-16 12:25 ` David Woodhouse 0 siblings, 2 replies; 34+ messages in thread From: Linus Torvalds @ 2004-09-15 23:26 UTC (permalink / raw) To: Roland Dreier, Al Viro; +Cc: Kernel Mailing List [ Subject line changed to avoid getting caught as spam ;] On Wed, 15 Sep 2004, Roland Dreier wrote: > > Linus, while we're on the subject of new sparse checks, could you give > a quick recap of the semantics of the new __leXX types (and what > __bitwise means to sparse)? I don't think I've ever seen this stuff > described on LKML. [ The bitwise checks are actually by Al Viro, but I'll explain the basic idea. Al is Cc'd so that he can add any corrections or extensions. ] Sparse allows a number of extra type qualifiers, including address spaces and various random extra restrictions on what you can do with them. There are "context" bits that allow you to use a symbol or type only in certain contexts, for example, and there are type qualifiers like "noderef" that just say that a pointer cannot be dereferenced (it looks _exactly_ like a pointer in all other respects, but trying to actually access anything through it will cause a sparse warning). The "bitwise" attribute is very much like the "noderef" one, in that it restricts how you can use an expression of that type. Unlike "noderef", it's designed for integer types, though. In fact, sparse will refuse to apply the bitwise attribute to non-integer types. As the name suggests, a "bitwise" expression is one that is restricted to only a certain "bitwise" operations that make sense within that class. In particular, you can't mix a "bitwise" class with a normal integer expression (the constant zero happens to be special, since it's "safe" for all bitwise ops), and in fact you can't even mix it with _another_ bitwise expression of a different type. And when I say "different", I mean even _slightly_ different. Each typedef creates a type of it's own, and will thus create a bitwise type that is not compatible with anything else. So if you declare int __bitwise i; int __bitwise j; the two variables "i" and "j" are _not_ compatible, simply because they were declared separately, while in the case of int __bitwise i, j; they _are_ compatible. The above is a horribly contrieved example, as it shows an extreme case that doesn't make much sense, but it shows how "bitwise" always creates its own new "class". Normally you'd always use "__bitwise" in a typedef, which effectively makes that particular typedef one single "bitwise class". After that, you can obviously declare any number of variables in that class. Now apart from the classes having to match, "bitwise" as it's name suggests, also restricts all operations within that class to a subset of "bit-safe" operations. For example, addition isn't "bit-safe", since clearly the carry-chain moves bits around. But you can do normal bit-wise operations, and you can compare the values against other values in the same class, since those are all "bit-safe". Oh, as an example of something that isn't obviously bit-safe: look out for things like bit negation: doing a ~ is ok on an bitwise "int" type, but it is _not_ ok on a bitwise "short" or "char". Why? Because on a bitwise "int" you actually stay within the type. But doing the same thing on a short or char will move "outside" the type by virtue of setting the high bits (normal C semantics: a short gets promoted to an "int", so doign a bitwise negation on a short will actually set the high bits). So as far as sparse is concerned, a "bitwise" type is not really so much about endianness as it is about making sure bits are never lost or moved around. For example, you can use the bitwise operation to verify the __GFP_XXX mask bits. Right now they are just regular integers, which means that you can write kmalloc(GFP_KERNEL, size); and the compiler will not notice anything wrong. But something is _seriously_ wrong: the GFP_KERNEL should be the _second_ argument. If we mark it to be a "bitwise" type (which it is), that bug would have been noticed immediately, and you could still do all the operations that are valid of GFP_xxx values. See the usage? In the byte-order case, what we have is: typedef __u16 __bitwise __le16; typedef __u16 __bitwise __be16; typedef __u32 __bitwise __le32; typedef __u32 __bitwise __be32; typedef __u64 __bitwise __le64; typedef __u64 __bitwise __be64; and if you think about the above rules about what is acceptable for bitwise types, you'll likely immediately notivce that it automatically means - you can never assign a __le16 type to any other integer type or any other bitwise type. You'd get a warnign about incompatible types. Makes sense, no? - you can only do operations that are safe within that byte order. For example, it is safe to do a bitwise "&" on two __le16 values. Clearly the result is meaningful. - if you want to go outside that bitwise type, you have to convert it properly first. For example, if you want to add a constant to a __le16 type, you can do so, but you have to use the proper sequence: __le16 sum, a, b; sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */ sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ See? In short, "bitwise" is about more than just byte-order, but the semantics of bitwise-restricted ops happen to be the semantics that are valid for byte-order operations too. Oh, btw, right now you only get the warnings from sparse if you use "-Wbitwise" on the command line. Without that, sparse will ignore the bitwise attribute. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-15 23:26 ` Being more careful " Linus Torvalds @ 2004-09-16 0:10 ` viro 2004-09-16 11:40 ` David Woodhouse 2004-09-16 12:25 ` David Woodhouse 1 sibling, 1 reply; 34+ messages in thread From: viro @ 2004-09-16 0:10 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roland Dreier, Kernel Mailing List On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote: > other bitwise type. You'd get a warnign about incompatible types. Makes > sense, no? > - you can only do operations that are safe within that byte order. For > example, it is safe to do a bitwise "&" on two __le16 values. Clearly > the result is meaningful. BTW, so far the most frequent class of endianness bugs had been along the lines of foo->le16_field = cpu_to_le32(12); and vice versa. On big-endian it's a guaranteed FUBAR - think carefully about the value that will end up there. > Oh, btw, right now you only get the warnings from sparse if you use > "-Wbitwise" on the command line. Without that, sparse will ignore the > bitwise attribute. We probably want __attribute__((opaque)) in addition to bitwise - e.g. for the handles of all sorts passed in network filesystem protocols. I'll look into that when we get the endianness warnings somewhat under control. For now I'm going to #define __opaque __bitwise and use it for stuff like typedef __u32 __opaque cifs_fid; etc. ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-16 0:10 ` viro @ 2004-09-16 11:40 ` David Woodhouse 0 siblings, 0 replies; 34+ messages in thread From: David Woodhouse @ 2004-09-16 11:40 UTC (permalink / raw) To: viro; +Cc: Linus Torvalds, Roland Dreier, Kernel Mailing List On Thu, 2004-09-16 at 01:10 +0100, viro@parcelfarce.linux.theplanet.co.uk wrote: > On Wed, Sep 15, 2004 at 04:26:12PM -0700, Linus Torvalds wrote: > > > other bitwise type. You'd get a warnign about incompatible types. Makes > > sense, no? > > - you can only do operations that are safe within that byte order. For > > example, it is safe to do a bitwise "&" on two __le16 values. Clearly > > the result is meaningful. > > BTW, so far the most frequent class of endianness bugs had been along the > lines of > foo->le16_field = cpu_to_le32(12); > and vice versa. On big-endian it's a guaranteed FUBAR - think carefully about > the value that will end up there. Is that really more frequent than just 'foo->le16_field = 12' ? I'm surprised. Certainly, it was the frequency of pure assignment without _any_ attempt at byte-swapping which caused me to introduce the 'jint32_t' et al structures in jffs2, which even gcc then bitches about if you use them wrongly. I suppose I can ditch those now -- I always intended to after a while anyway. -- dwmw2 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-15 23:26 ` Being more careful " Linus Torvalds 2004-09-16 0:10 ` viro @ 2004-09-16 12:25 ` David Woodhouse 2004-09-16 14:01 ` Linus Torvalds 1 sibling, 1 reply; 34+ messages in thread From: David Woodhouse @ 2004-09-16 12:25 UTC (permalink / raw) To: Linus Torvalds; +Cc: Roland Dreier, Al Viro, Kernel Mailing List On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote: > - if you want to go outside that bitwise type, you have to convert it > properly first. For example, if you want to add a constant to a __le16 > type, you can do so, but you have to use the proper sequence: > > __le16 sum, a, b; > > sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */ > sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ > > See? Yeah right, that latter case is _so_ much more readable, and makes it _so_ easy for the compiler to optimise precisely when it wants to do the byte-swapping, especially if the back end has load-and-swap or store-and-swap instructions. :) It's even nicer when it ends up as: sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ sum |= c; sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d)); I'd really quite like to see the real compiler know about endianness, too. I dare say I _could_ optimise the above (admittedly contrived but not _so_ unlikely) case, but I don't _want_ to hand-optimise my code -- that's what I keep a compiler _for_. -- dwmw2 ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-16 12:25 ` David Woodhouse @ 2004-09-16 14:01 ` Linus Torvalds 2004-09-18 9:46 ` Kai Henningsen 0 siblings, 1 reply; 34+ messages in thread From: Linus Torvalds @ 2004-09-16 14:01 UTC (permalink / raw) To: David Woodhouse; +Cc: Roland Dreier, Al Viro, Kernel Mailing List On Thu, 16 Sep 2004, David Woodhouse wrote: > On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote: > > - if you want to go outside that bitwise type, you have to convert it > > properly first. For example, if you want to add a constant to a __le16 > > type, you can do so, but you have to use the proper sequence: > > > > __le16 sum, a, b; > > > > sum = a + b; /* INVALID! "warning: incompatible types for operation (+)" */ > > sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ > > > > See? > > Yeah right, that latter case is _so_ much more readable It's not about readability. It's about the first case being WRONG! You can't add two values in the wrong byte-order. It's not an operation that makes sense. You _have_ to convert them to CPU byte order first. I certainly agree that the first version "looks nicer". > It's even nicer when it ends up as: > > sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ > sum |= c; > sum = cpu_to_le16(le16_to_cpu(sum) + le16_to_cpu(d)); This is actually the strongest argument _against_ hiding endianness in the compiler, or hiding it behind macros. Make it very explicit, and just make sure there are tools (ie 'sparse') that can tell you when you do something wrong. Any programmer who sees the above will go "well that's stupid", and rewrite it as something saner instead. You can certainly rewrite it as cpu_sum = le16_to_cpu(a) + le16_to_cpu(b); cpu_sum |= le16_to_cpu(c); cpu_sum += le16_to_cpu(d); sum = cpu_to_le16(d); which gets rid of the double conversions. But if you hide the endianness in macro's, you'll never see the mess at all, and won't be able to fix it. > I'd really quite like to see the real compiler know about endianness, > too. I would have agreed with you some time ago. Having been bitten by too damn many bompiler bugs I'e become convinced that the compiler doing things behind your back to "help" you just isn't worth it. Not in a kernel, at least. It's much better to build up good typechecking and the infrastructure to help you get the job done. Expressions like the above might happen once or twice in a project with several million lines of code. It's just not worth compiler infrastructure for - that just makes people use it as if it is free, and impossible to find the bugs when they _do_ happen. Much better to have a type system that can warn about the bad uses, but that doesn't actually change any of the code generated.. Linus ^ permalink raw reply [flat|nested] 34+ messages in thread
* Re: Being more careful about iospace accesses.. 2004-09-16 14:01 ` Linus Torvalds @ 2004-09-18 9:46 ` Kai Henningsen 0 siblings, 0 replies; 34+ messages in thread From: Kai Henningsen @ 2004-09-18 9:46 UTC (permalink / raw) To: linux-kernel torvalds@osdl.org (Linus Torvalds) wrote on 16.09.04 in <Pine.LNX.4.58.0409160652460.2333@ppc970.osdl.org>: > On Thu, 16 Sep 2004, David Woodhouse wrote: > > > On Wed, 2004-09-15 at 16:26 -0700, Linus Torvalds wrote: > > > - if you want to go outside that bitwise type, you have to convert it > > > properly first. For example, if you want to add a constant to a > > > __le16 type, you can do so, but you have to use the proper sequence: > > > > > > __le16 sum, a, b; > > > > > > sum = a + b; /* INVALID! "warning: incompatible types for operation > > > (+)" */ sum = cpu_to_le16(le16_to_cpu(a) + le16_to_cpu(b)); /* Ok */ > > > > > > See? > > > > Yeah right, that latter case is _so_ much more readable > > It's not about readability. > > It's about the first case being WRONG! ... in general. And on the machines where it works (le machines), I'd certainly expect those conversion functions to be trivially eliminated by the compiler (ie, they're either trivial macros or trivial inline functions). > > I'd really quite like to see the real compiler know about endianness, > > too. Well, these day, optimizers often can recognize the usual endianness conversion idioms, so the compiler still gets a chance at inserting your load-with-swap or whatever. MfG Kai ^ permalink raw reply [flat|nested] 34+ messages in thread
end of thread, other threads:[~2004-09-18 13:06 UTC | newest] Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <Pine.LNX.4.58.0409081543320.5912@ppc970.osdl.org> [not found] ` <Pine.LNX.4.58.0409150737260.2333@ppc970.osdl.org> 2004-09-15 16:30 ` Being more anal about iospace accesses Linus Torvalds 2004-09-15 16:54 ` Jörn Engel 2004-09-15 17:07 ` Linus Torvalds 2004-09-15 17:32 ` Jörn Engel 2004-09-15 17:57 ` Linus Torvalds 2004-09-15 18:06 ` Linus Torvalds 2004-09-15 19:34 ` Greg KH 2004-09-15 19:53 ` Linus Torvalds 2004-09-15 20:06 ` Greg KH 2004-09-16 14:58 ` Horst von Brand 2004-09-15 17:45 ` Nikita Danilov 2004-09-15 18:09 ` Linus Torvalds 2004-09-15 18:40 ` Chris Wedgwood 2004-09-15 17:07 ` Jeff Garzik 2004-09-15 17:16 ` Roland Dreier 2004-09-15 17:39 ` Linus Torvalds 2004-09-15 20:07 ` Russell King 2004-09-15 17:36 ` Horst von Brand 2004-09-15 17:40 ` Brian Gerst 2004-09-15 16:56 ` Dave Jones 2004-09-15 17:19 ` Roger Luethi 2004-09-15 17:23 ` Richard B. Johnson 2004-09-15 22:21 ` Deepak Saxena 2004-09-15 22:46 ` Linus Torvalds 2004-09-15 23:09 ` Deepak Saxena 2004-09-16 12:51 ` Geert Uytterhoeven 2004-09-16 22:10 ` Deepak Saxena 2004-09-15 22:29 ` Roland Dreier 2004-09-15 23:26 ` Being more careful " Linus Torvalds 2004-09-16 0:10 ` viro 2004-09-16 11:40 ` David Woodhouse 2004-09-16 12:25 ` David Woodhouse 2004-09-16 14:01 ` Linus Torvalds 2004-09-18 9:46 ` Kai Henningsen
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).