linux-kernel.vger.kernel.org archive mirror
* Going beyond 256 PCI buses
@ 2001-06-13 10:02 Tom Gall
  2001-06-13 17:17 ` Albert D. Cahalan
                   ` (3 more replies)
  0 siblings, 4 replies; 44+ messages in thread
From: Tom Gall @ 2001-06-13 10:02 UTC (permalink / raw)
  To: linux-kernel

<Forgive me if this is a dupe... but the message I composed yesterday didn't
appear to be posted>

  Anyway, Hi All,

  I was wondering if there are any other folks out there like me who
have the 256 PCI bus limit looking at them straight in the face? If so,
it'd be nice to collaborate and come up with a more general solution
that would hopefully work towards the greater good.

  I live in ppc64 land which is a new arch that the linux kernel has
been ported to. The boxes we run on tend to be big.

  The box that I'm wrestling with has a setup where each PHB has an
additional id, then each PHB can have up to 256 buses.  So when you are
talking to a device, the scheme is phbid, bus, dev etc etc. Pretty easy
really.

  If I am going to put something like this into the kernel at large,
it would probably be best to have a CONFIG_GREATER_THAN_256_BUSES or
some such.

  Anyways, thoughts? opinions?

--
Hakuna Matata,

Tom

-----------------------------------------------------------
ppc64 Maintainer     IBM Linux Technology Center
"My heart is human, my blood is boiling, my brain IBM" -- Mr Roboto
tgall@rochcivictheatre.org
tom_gall@vnet.ibm.com



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-13 10:02 Going beyond 256 PCI buses Tom Gall
@ 2001-06-13 17:17 ` Albert D. Cahalan
  2001-06-13 18:29   ` Tom Gall
  2001-06-14 14:14 ` Jeff Garzik
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 44+ messages in thread
From: Albert D. Cahalan @ 2001-06-13 17:17 UTC (permalink / raw)
  To: Tom Gall; +Cc: linux-kernel

Tom Gall writes:

>   I was wondering if there are any other folks out there like me who
> have the 256 PCI bus limit looking at them straight in the face?

I might. The need to reserve bus numbers for hot-plug looks like
a quick way to waste all 256 bus numbers.

> each PHB has an
> additional id, then each PHB can have up to 256 buses.

Try not to think of him as a PHB with an extra id. Lots of people
have weird collections. If your boss wants to collect buses, well,
that's his business. Mine likes boats. It's not a big deal, really.

(Did you not mean your pointy-haired boss has mental problems?)



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-13 17:17 ` Albert D. Cahalan
@ 2001-06-13 18:29   ` Tom Gall
  0 siblings, 0 replies; 44+ messages in thread
From: Tom Gall @ 2001-06-13 18:29 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: linux-kernel

"Albert D. Cahalan" wrote:
> 
> Tom Gall writes:
> 
> >   I was wondering if there are any other folks out there like me who
> > have the 256 PCI bus limit looking at them straight in the face?
> 
> I might. The need to reserve bus numbers for hot-plug looks like
> a quick way to waste all 256 bus numbers.

Hi Albert,

  Yeah, I'll be worrying about this one too in time. Two birds with one stone
wouldn't be a bad thing. Hopefully it doesn't translate into needing a
significantly larger stone.

> > each PHB has an
> > additional id, then each PHB can have up to 256 buses.
> 
> Try not to think of him as a PHB with an extra id. Lots of people
> have weird collections. If your boss wants to collect buses, well,
> that's his business. Mine likes boats. It's not a big deal, really.

Heh.... PHB==Primary Host Bridge  ... but I'll be sure to pass the word on to my
PHB that there's a used Greyhound sale... <bah-bum-bum-tshhh>

Anyway, it really is a new id, at least for the implementation on this box. So
PHB0 could have 256 buses, and PHB1 could have 10 buses, PHB2 could have ....
you get the idea.

Hot plug would still have the problem in that it'd have 256 bus numbers in the
namespace of a PHB to manage. Hot plug under a different PHB would have another
256 to play with.

Regards,

Tom

-- 
Tom Gall - PPC64 Maintainer      "Where's the ka-boom? There was
Linux Technology Center           supposed to be an earth
(w) tom_gall@vnet.ibm.com         shattering ka-boom!"
(w) 507-253-4558                 -- Marvin Martian
(h) tgall@rochcivictheatre.org
http://www.ibm.com/linux/ltc/projects/ppc

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-13 10:02 Going beyond 256 PCI buses Tom Gall
  2001-06-13 17:17 ` Albert D. Cahalan
@ 2001-06-14 14:14 ` Jeff Garzik
  2001-06-14 15:15   ` David S. Miller
  2001-06-14 17:59   ` Jonathan Lundell
  2001-06-14 14:24 ` David S. Miller
  2001-06-14 15:13 ` Jonathan Lundell
  3 siblings, 2 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 14:14 UTC (permalink / raw)
  To: Tom Gall; +Cc: linux-kernel

Tom Gall wrote:
>   The box that I'm wrestling with has a setup where each PHB has an
> additional id, then each PHB can have up to 256 buses.  So when you are
> talking to a device, the scheme is phbid, bus, dev etc etc. Pretty easy
> really.
> 
>   If I am going to put something like this into the kernel at large,
> it would probably be best to have a CONFIG_GREATER_THAN_256_BUSES or
> some such.

We don't need such a CONFIG_xxx at all.  The current PCI core code
should scale up just fine.

According to the PCI spec it is -impossible- to have more than 256 buses
on a single "hose", so you simply have to implement multiple hoses, just
like Alpha (and Sparc64?) already do.  That's how the hardware is forced
to implement it...

	Jeff


-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-13 10:02 Going beyond 256 PCI buses Tom Gall
  2001-06-13 17:17 ` Albert D. Cahalan
  2001-06-14 14:14 ` Jeff Garzik
@ 2001-06-14 14:24 ` David S. Miller
  2001-06-14 14:32   ` Jeff Garzik
                     ` (4 more replies)
  2001-06-14 15:13 ` Jonathan Lundell
  3 siblings, 5 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 14:24 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Tom Gall, linux-kernel


Jeff Garzik writes:
 > According to the PCI spec it is -impossible- to have more than 256 buses
 > on a single "hose", so you simply have to implement multiple hoses, just
 > like Alpha (and Sparc64?) already do.  That's how the hardware is forced
 > to implement it...

Right, what userspace had to become aware of are "PCI domains" which
is just another fancy term for a "hose" or "controller".

All you have to do is (right now, the kernel supports this fully)
open up a /proc/bus/pci/${BUS}/${DEVICE} node and then go:

	domain = ioctl(fd, PCIIOC_CONTROLLER, 0);

Voila.
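
For example, a minimal userspace sketch (the device path below is just an
arbitrary example, and PCIIOC_CONTROLLER is assumed to be visible via
<linux/pci.h>):

	#include <stdio.h>
	#include <fcntl.h>
	#include <sys/ioctl.h>
	#include <linux/pci.h>		/* PCIIOC_CONTROLLER */

	int main(void)
	{
		/* any node under /proc/bus/pci will do; 01/00.0 is made up */
		int fd = open("/proc/bus/pci/01/00.0", O_RDONLY);
		int domain;

		if (fd < 0)
			return 1;
		domain = ioctl(fd, PCIIOC_CONTROLLER, 0);  /* which hose/domain */
		printf("domain %d\n", domain);
		return 0;
	}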

There are only two real issues:

1) Extending the type that bus numbers use inside the kernel.

   Basically how most multi-controller platforms work now
   is they allocate bus numbers in the 256 bus space as
   controllers are probed.  If we change the internal type
   used by the kernel to "u32" or whatever, we expand that
   available space accordingly.

   For the lazy, basically go into include/linux/pci.h
   and change the "unsigned char"s in struct pci_bus into
   some larger type.  This is mindless work.
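
   For concreteness, a sketch of the kind of edit meant here (the
   surrounding fields are elided; whether u32 or something smaller is
   right is the "or whatever" above):

	struct pci_bus {
		...
		u32	number;		/* was: unsigned char, bus number */
		u32	primary;	/* was: unsigned char */
		u32	secondary;	/* was: unsigned char */
		u32	subordinate;	/* was: unsigned char */
		...
	};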

2) Figure out what to do wrt. sys_pciconfig_{read,write}()

   They ought to really be deprecated and the /proc/bus/pci
   way is the expected way to go about doing these things.
   The procfs interface is more intelligent, less clumsy, and
   even allows you to mmap() PCI cards portably (instead of
   doing crap like mmap()s on /dev/mem, yuck).

   Actually, the only real issue here is what happens when
   userspace does PCI config space reads of things which talk
   about "bus numbers", since those will be 8-bit as this is a
   hardware-imposed type.  These syscalls take long args already,
   so they could use >256 bus numbers just fine.
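
   For reference, the current prototypes (quoted roughly from memory);
   the point is that the bus argument is already an unsigned long, so
   values above 255 fit without changing the syscall ABI:

	asmlinkage long sys_pciconfig_read(unsigned long bus, unsigned long dfn,
					   unsigned long off, unsigned long len,
					   void *buf);
	asmlinkage long sys_pciconfig_write(unsigned long bus, unsigned long dfn,
					    unsigned long off, unsigned long len,
					    void *buf);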

Basically, this 256 bus limit in Linux is a complete fallacy.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:24 ` David S. Miller
@ 2001-06-14 14:32   ` Jeff Garzik
  2001-06-14 14:42   ` David S. Miller
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 14:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Gall, linux-kernel

"David S. Miller" wrote:
> 1) Extending the type that bus numbers use inside the kernel.
> 
>    Basically how most multi-controller platforms work now
>    is they allocate bus numbers in the 256 bus space as
>    controllers are probed.  If we change the internal type
>    used by the kernel to "u32" or whatever, we expand that
>    available space accordingly.
> 
>    For the lazy, basically go into include/linux/pci.h
>    and change the "unsigned char"s in struct pci_bus into
>    some larger type.  This is mindless work.

Why do you want to make the bus number larger than the PCI bus number
register?

It seems like adding 'unsigned int domain_num' makes more sense, and is
more correct.  Maybe that implies fixing up other code to use a
(domain,bus) pair, but that's IMHO a much better change than totally
changing the interpretation of pci_bus::bus_number...
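
Something like this, purely as a sketch (field name as suggested above,
placement still up for debate):

	struct pci_bus {
		...
		unsigned int	domain_num;	/* which hose/domain/controller */
		unsigned char	number;		/* bus number, still 0-255 within it */
		...
	};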


> 2) Figure out what to do wrt. sys_pciconfig_{read,write}()

3) (tiny issue) Change pci_dev::slot_name such that it includes the
domain number.  This is passed to userspace by SCSI and net drivers as a
way to allow userspace to associate a kernel interface with a bus
device.


> Basically, this 256 bus limit in Linux is a complete fallacy.

yep

Regards,

	Jeff


-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:24 ` David S. Miller
  2001-06-14 14:32   ` Jeff Garzik
@ 2001-06-14 14:42   ` David S. Miller
  2001-06-14 15:29     ` Jeff Garzik
  2001-06-14 18:01   ` Albert D. Cahalan
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 44+ messages in thread
From: David S. Miller @ 2001-06-14 14:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Tom Gall, linux-kernel


Jeff Garzik writes:
 > Why do you want to make the bus number larger than the PCI bus number
 > register?

This isn't it.  What I'm trying to provoke thought on is
"is there a way to make mindless apps using these syscalls
work transparently"

I think the answer is no.  Apps should really fetch info out
of /proc/bus/pci and use the controller ioctl.

But someone could surprise me :-)

 > It seems like adding 'unsigned int domain_num' makes more sense, and is
 > more correct.  Maybe that implies fixing up other code to use a
 > (domain,bus) pair, but that's IMHO a much better change than totally
 > changing the interpretation of pci_bus::bus_number...

Correct, I agree.  But I don't even believe we should be sticking
the domain thing into struct pci_bus.

It's a platform thing.  Most platforms have a single domain, so why
clutter up struct pci_bus with this value?  By this reasoning we could
say that since it's arch-specific, this stuff belongs in sysdata or
wherever.

And this is what is happening right now.  So in essence, the work is
done :-)  The only "limiting factor" is that x86 doesn't support
multiple domains as some other platforms do.  So all these hot-plug
patches just need to use domains properly, and perhaps add domain
support to X86 when one of these hot-plug capable controllers are
being used.

 > > 2) Figure out what to do wrt. sys_pciconfig_{read,write}()
 > 
 > 3) (tiny issue) Change pci_dev::slot_name such that it includes the
 > domain number.  This is passed to userspace by SCSI and net drivers as a
 > way to allow userspace to associate a kernel interface with a bus
 > device.

Sure.  It's an address and the domain is part of the address.

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-13 10:02 Going beyond 256 PCI buses Tom Gall
                   ` (2 preceding siblings ...)
  2001-06-14 14:24 ` David S. Miller
@ 2001-06-14 15:13 ` Jonathan Lundell
  2001-06-14 15:17   ` Jeff Garzik
  3 siblings, 1 reply; 44+ messages in thread
From: Jonathan Lundell @ 2001-06-14 15:13 UTC (permalink / raw)
  To: Jeff Garzik, linux-kernel

At 10:14 AM -0400 2001-06-14, Jeff Garzik wrote:
>According to the PCI spec it is -impossible- to have more than 256 buses
>on a single "hose", so you simply have to implement multiple hoses, just
>like Alpha (and Sparc64?) already do.  That's how the hardware is forced
>to implement it...

That's right, of course. A small problem is that dev->slot_name 
becomes ambiguous, since it doesn't have any hose identification. Nor 
does it have any room for the hose id; it's fixed at 8 chars, and 
fully used (bb:dd.f\0).
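
Purely as an illustration (the extended size and format here are
assumptions, not an agreed interface), prefixing a domain would need
something like:

	char slot_name[16];	/* today: char slot_name[8], "bb:dd.f" */

	sprintf(dev->slot_name, "%04x:%02x:%02x.%d",
		domain, dev->bus->number,
		PCI_SLOT(dev->devfn), PCI_FUNC(dev->devfn));
	/* "dddd:bb:dd.f" is 12 chars plus NUL, so 8 no longer fits */
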
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:14 ` Jeff Garzik
@ 2001-06-14 15:15   ` David S. Miller
  2001-06-14 17:59   ` Jonathan Lundell
  1 sibling, 0 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 15:15 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Jeff Garzik, linux-kernel


Jonathan Lundell writes:
 > That's right, of course. A small problem is that dev->slot_name 
 > becomes ambiguous, since it doesn't have any hose identification. Nor 
 > does it have any room for the hose id; it's fixed at 8 chars, and 
 > fully used (bb:dd.f\0).

Sure, Jeff and I already said that slot name strings need to change.

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 15:13 ` Jonathan Lundell
@ 2001-06-14 15:17   ` Jeff Garzik
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 15:17 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: linux-kernel

Jonathan Lundell wrote:
> 
> At 10:14 AM -0400 2001-06-14, Jeff Garzik wrote:
> >According to the PCI spec it is -impossible- to have more than 256 buses
> >on a single "hose", so you simply have to implement multiple hoses, just
> >like Alpha (and Sparc64?) already do.  That's how the hardware is forced
> >to implement it...
> 
> That's right, of course. A small problem is that dev->slot_name
> becomes ambiguous, since it doesn't have any hose identification. Nor
> does it have any room for the hose id; it's fixed at 8 chars, and
> fully used (bb:dd.f\0).

Ouch.  Good point.  Well, extending that field's size shouldn't break
anything except binary modules (which IMHO means, it doesn't break
anything).

	Jeff


-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:42   ` David S. Miller
@ 2001-06-14 15:29     ` Jeff Garzik
  2001-06-14 15:33       ` Jeff Garzik
  0 siblings, 1 reply; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 15:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Gall, linux-kernel

"David S. Miller" wrote:
> Jeff Garzik writes:
>  > Why do you want to make the bus number larger than the PCI bus number
>  > register?
> 
> This isn't it.  What I'm trying to provoke thought on is
> "is there a way to make mindless apps using these syscalls
> work transparently"
> 
> I think the answer is no.  Apps should really fetch info out
> of /proc/bus/pci and use the controller ioctl.
> 
> But someone could surprise me :-)

Yeah, those syscalls weren't built with much of an eye towards the future.
And I don't think they are present in other OSes either...


>  > It seems like adding 'unsigned int domain_num' makes more sense, and is
>  > more correct.  Maybe that implies fixing up other code to use a
>  > (domain,bus) pair, but that's IMHO a much better change than totally
>  > changing the interpretation of pci_bus::bus_number...
> 
> Correct, I agree.  But I don't even believe we should be sticking
> the domain thing into struct pci_bus.
> 
> It's a platform thing.  Most platforms have a single domain, so why
> clutter up struct pci_bus with this value?  By this reasoning we could
> say that since it's arch-specific, this stuff belongs in sysdata or
> wherever.

Pretty much any arch with a PCI slot can have multiple domains, now that
hotplug controllers are out and about.  So it seems a generic enough
concept to me...


> And this is what is happening right now.  So in essence, the work is
> done :-)  The only "limiting factor" is that x86 doesn't support
> multiple domains as some other platforms do.  So all these hot-plug
> patches just need to use domains properly, and perhaps add domain
> support to X86 when one of these hot-plug capable controllers are
> being used.

point.

Regards,

	Jeff


-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 15:29     ` Jeff Garzik
@ 2001-06-14 15:33       ` Jeff Garzik
  0 siblings, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 15:33 UTC (permalink / raw)
  To: David S. Miller; +Cc: Tom Gall, linux-kernel

Jeff Garzik wrote:
> Pretty much any arch with a PCI slot can have multiple domains, now that
> hotplug controllers are out and about.  So it seems a generic enough
> concept to me...

Um, correction:  that is assuming a certain implementation...  you can
certainly implement a hotplug controller as another PCI-PCI bridge,
AFAICS, too.

-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:14 ` Jeff Garzik
  2001-06-14 15:15   ` David S. Miller
@ 2001-06-14 17:59   ` Jonathan Lundell
  2001-06-14 20:50     ` Jonathan Lundell
  1 sibling, 1 reply; 44+ messages in thread
From: Jonathan Lundell @ 2001-06-14 17:59 UTC (permalink / raw)
  To: David S. Miller, Jeff Garzik; +Cc: Tom Gall, linux-kernel

At 7:24 AM -0700 2001-06-14, David S. Miller wrote:
>2) Figure out what to do wrt. sys_pciconfig_{read,write}()
>
>    They ought to really be deprecated and the /proc/bus/pci
>    way is the expected way to go about doing these things.
>    The procfs interface is more intelligent, less clumsy, and
>    even allows you to mmap() PCI cards portably (instead of
>    doing crap like mmap()s on /dev/mem, yuck).
>
>    Actually, the only real issue here is what happens when
>    userspace does PCI config space reads of things which talk
>    about "bus numbers", since those will be 8-bit as this is a
>    hardware-imposed type.  These syscalls take long args already,
>    so they could use >256 bus numbers just fine.
>
>Basically, this 256 bus limit in Linux is a complete fallacy.

There's also the question of pci_ops() (PCI config space access 
routines). Alpha (titan in this example) uses:

	struct pci_controler *hose = dev->sysdata;

but the general idea is that you need the hose info to do 
config-space access on PCI.
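
Very roughly, such a config-space routine has this shape (everything
below except pci_controler and sysdata is a made-up name, just to show
where the hose comes in):

	static int titan_read_config_byte(struct pci_dev *dev, int where, u8 *value)
	{
		struct pci_controler *hose = dev->sysdata;	/* per-domain info */

		/* hose_conf_readb() is a hypothetical helper standing in for
		 * the real register poking; the point is that bus/devfn/where
		 * alone aren't enough -- you need the hose too */
		return hose_conf_readb(hose, dev->bus->number, dev->devfn,
				       where, value);
	}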

The 256-bus fallacy is caused in part by an ambiguity in what we mean 
by "bus". In PCI spec talk, a bus is a very specify thing with an 
8-bit bus number, and this is reflected in various registers and 
address formats. On the other hand we have the general concept of 
buses, which includes the possibility of multiple PCI controllers, 
and thus multiple domains of 256 buses each.

As I recall, even a midline chipset such as the ServerWorks LE 
supports the use of two north bridges, which implies two PCI bus 
domains.

I'd favor adopting something like the Alpha approach, but with a 
dedicated struct pci_device item (and with "controller" spelled 
correctly). And I suppose "domain" sounds a little more sober than 
"hose". But since the domain information is necessary to access 
configuration registers, it really needs to be included in struct 
pci_device.
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:24 ` David S. Miller
  2001-06-14 14:32   ` Jeff Garzik
  2001-06-14 14:42   ` David S. Miller
@ 2001-06-14 18:01   ` Albert D. Cahalan
  2001-06-14 18:47   ` David S. Miller
  2001-06-14 19:03   ` David S. Miller
  4 siblings, 0 replies; 44+ messages in thread
From: Albert D. Cahalan @ 2001-06-14 18:01 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, Tom Gall, linux-kernel

David S. Miller writes:
> Jeff Garzik writes:

>> According to the PCI spec it is -impossible- to have more than 256
>> buses on a single "hose", so you simply have to implement multiple
>> hoses, just like Alpha (and Sparc64?) already do.  That's how the
>> hardware is forced to implement it...
>
> Right, what userspace had to become aware of are "PCI domains" which
> is just another fancy term for a "hose" or "controller".
>
> All you have to do is (right now, the kernel supports this fully)
> open up a /proc/bus/pci/${BUS}/${DEVICE} node and then go:
> 
> 	domain = ioctl(fd, PCIIOC_CONTROLLER, 0);
>
> Voila.
>
> There are only two real issues:

No, three.

0) The API needs to be taken out and shot.

   You've added an ioctl. This isn't just any ioctl. It's a
   wicked nasty ioctl. It's an OH MY GOD YOU CAN'T BE SERIOUS
   ioctl by any standard.

   Consider the logical tree:
   hose -> bus -> slot -> function -> bar

   Well, the hose and bar are missing. You specify the middle
   three parts in the filename (with slot and function merged),
   then use an ioctl to specify the hose and bar.

   Doing the whole thing by filename would be better. Else
   why not just say "screw it", open /proc/pci, and do the
   whole thing by ioctl? Using ioctl for both the most and
   least significant parts of the path while using a path
   for the middle part is Wrong, Bad, Evil, and Broken.

   Fix:

   /proc/bus/PCI/0/0/3/0/config   config space
   /proc/bus/PCI/0/0/3/0/0        the first bar
   /proc/bus/PCI/0/0/3/0/1        the second bar
   /proc/bus/PCI/0/0/3/0/driver   info about the driver, if any
   /proc/bus/PCI/0/0/3/0/event    hot-plug, messages from driver...

   Then we have arch-specific MMU cruft. For example the PowerPC
   defines bits that affect caching, ordering, and merging policy.
   The chips from IBM also define an endianness bit. I don't think
   this ought to be an ioctl either. Maybe mmap() flags would be
   reasonable. This isn't just for PCI; one might do an anon mmap
   with pages locked and cache-incoherent for better performance.

> 1) Extending the type that bus numbers use inside the kernel.
...
> 2) Figure out what to do wrt. sys_pciconfig_{read,write}()
...

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:24 ` David S. Miller
                     ` (2 preceding siblings ...)
  2001-06-14 18:01   ` Albert D. Cahalan
@ 2001-06-14 18:47   ` David S. Miller
  2001-06-14 19:04     ` Albert D. Cahalan
  2001-06-14 19:12     ` David S. Miller
  2001-06-14 19:03   ` David S. Miller
  4 siblings, 2 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 18:47 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Jeff Garzik, Tom Gall, linux-kernel


Albert D. Cahalan writes:
 >    You've added an ioctl. This isn't just any ioctl. It's a
 >    wicked nasty ioctl. It's an OH MY GOD YOU CAN'T BE SERIOUS
 >    ioctl by any standard.

It's an ioctl which allows things to work properly in the framework we
currently have.

 >    Fix:
 > 
 >    /proc/bus/PCI/0/0/3/0/config   config space

Which breaks xfree86 instantly.  This fix is unacceptable.

In fact, the current ioctl/mmap mechanism was discussed with and
agreed to by the PPC, Alpha, and Sparc64 folks.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 14:24 ` David S. Miller
                     ` (3 preceding siblings ...)
  2001-06-14 18:47   ` David S. Miller
@ 2001-06-14 19:03   ` David S. Miller
  2001-06-14 20:56     ` David S. Miller
  4 siblings, 1 reply; 44+ messages in thread
From: David S. Miller @ 2001-06-14 19:03 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Jeff Garzik, Tom Gall, linux-kernel


Jonathan Lundell writes:
 > As I recall, even a midline chipset such as the ServerWorks LE 
 > supports the use of two north bridges, which implies two PCI bus 
 > domains.

It hides this fact by making config space accesses respond in such a
way that it appears that it is all behind one PCI controller.  The
BIOS even keeps any of the MEM and I/O resources from
overlapping.

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 18:47   ` David S. Miller
@ 2001-06-14 19:04     ` Albert D. Cahalan
  2001-06-14 19:12     ` David S. Miller
  1 sibling, 0 replies; 44+ messages in thread
From: Albert D. Cahalan @ 2001-06-14 19:04 UTC (permalink / raw)
  To: David S. Miller; +Cc: Albert D. Cahalan, Jeff Garzik, Tom Gall, linux-kernel

David S. Miller writes:
> Albert D. Cahalan writes:

>>    You've added an ioctl. This isn't just any ioctl. It's a
>>    wicked nasty ioctl. It's an OH MY GOD YOU CAN'T BE SERIOUS
>>    ioctl by any standard.
>
> It's an ioctl which allows things to work properly in the
> framework we currently have.

It's a hack that keeps us stuck with the existing mistakes.
We need a transition path to get us away from the old mess.

>>    Fix:
>>
>>    /proc/bus/PCI/0/0/3/0/config   config space
>
> Which breaks xfree86 instantly.  This fix is unacceptable.

Nope. Keep /proc/bus/pci until Linux 3.14 if you like.
The above is /proc/bus/PCI. That's "PCI", not "pci".
We still have /proc/pci after all.

> In fact, the current ioctl/mmap mechanism was discussed with and
> agreed to by the PPC, Alpha, and Sparc64 folks.

Did you somehow miss when Linus scolded you a few weeks ago?
How about asking somebody who helps maintain /proc, like Al Viro?

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 18:47   ` David S. Miller
  2001-06-14 19:04     ` Albert D. Cahalan
@ 2001-06-14 19:12     ` David S. Miller
  2001-06-14 19:41       ` Jeff Garzik
                         ` (3 more replies)
  1 sibling, 4 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 19:12 UTC (permalink / raw)
  To: Albert D. Cahalan; +Cc: Jeff Garzik, Tom Gall, linux-kernel


Albert D. Cahalan writes:
 > >>    /proc/bus/PCI/0/0/3/0/config   config space
 > >
 > > Which breaks xfree86 instantly.  This fix is unacceptable.
 > 
 > Nope. Keep /proc/bus/pci until Linux 3.14 if you like.
 > The above is /proc/bus/PCI. That's "PCI", not "pci".
 > We still have /proc/pci after all.

Oh I see.

Well, xfree86 and other programs aren't going to look there, so
something had to be done about the existing /proc/bus/pci/* hierarchy.

To be honest, xfree86 needs the controller information not for the
sake of device probing; it needs it to detect resource conflicts.

Anyways, I agree with your change, sure.

 > Did you somehow miss when Linus scolded you a few weeks ago?

You mean specifically what?

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:12     ` David S. Miller
@ 2001-06-14 19:41       ` Jeff Garzik
  2001-06-14 19:57       ` David S. Miller
                         ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 19:41 UTC (permalink / raw)
  To: linux-kernel; +Cc: David S. Miller, Albert D. Cahalan, Tom Gall

Thinking a bit more independently of bus type, and with an eye toward
2.5's s/pci_dev/device/ and s/pci_driver/driver/, would it be useful to
go ahead and codify the concept of PCI domains into a more generic
concept of bus tree numbers?  (or something along those lines)  That
would allow for a more general picture of the entire system's device
tree, across buses.

First sbus bus is tree-0, first PCI bus tree is tree-1, second PCI bus
tree is tree-2, ...

-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:12     ` David S. Miller
  2001-06-14 19:41       ` Jeff Garzik
@ 2001-06-14 19:57       ` David S. Miller
  2001-06-14 20:08         ` Jeff Garzik
  2001-06-14 20:14         ` David S. Miller
  2001-06-15  8:42       ` Geert Uytterhoeven
  2001-06-15 15:38       ` David S. Miller
  3 siblings, 2 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 19:57 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Albert D. Cahalan, Tom Gall


Jeff Garzik writes:
 > Thinking a bit more independently of bus type, and with an eye toward
 > 2.5's s/pci_dev/device/ and s/pci_driver/driver/, would it be useful to
 > go ahead and codify the concept of PCI domains into a more generic
 > concept of bus tree numbers?  (or something along those lines)  That
 > would allow for a more general picture of the entire system's device
 > tree, across buses.
 > 
 > First sbus bus is tree-0, first PCI bus tree is tree-1, second PCI bus
 > tree is tree-2, ...

If you're going to do something like this, ie. true hierarchy, why not
make one tree which is "system", right? Use /proc/bus/${controllernum}
ala:

/proc/bus/0/type	--> "sbus", "pci", "zorro", etc.
/proc/bus/0/*		--> for type == "pci" ${bus}/${dev}.${fn}
			    for type == "sbus" ${slot}
			    ...

How about this?

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:57       ` David S. Miller
@ 2001-06-14 20:08         ` Jeff Garzik
  2001-06-14 20:14         ` David S. Miller
  1 sibling, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 20:08 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Albert D. Cahalan, Tom Gall

"David S. Miller" wrote:
> 
> Jeff Garzik writes:
>  > Thinking a bit more independently of bus type, and with an eye toward
>  > 2.5's s/pci_dev/device/ and s/pci_driver/driver/, would it be useful to
>  > go ahead and codify the concept of PCI domains into a more generic
>  > concept of bus tree numbers?  (or something along those lines)  That
>  > would allow for a more general picture of the entire system's device
>  > tree, across buses.
>  >
>  > First sbus bus is tree-0, first PCI bus tree is tree-1, second PCI bus
>  > tree is tree-2, ...
> 
> If you're going to do something like this, ie. true hierarchy, why not
> make one tree which is "system", right? Use /proc/bus/${controllernum}
> ala:
> 
> /proc/bus/0/type        --> "sbus", "pci", "zorro", etc.
> /proc/bus/0/*           --> for type == "pci" ${bus}/${dev}.${fn}
>                             for type == "sbus" ${slot}
>                             ...
> 
> How about this?

ok with me.  would bus #0 be the system or root bus?  that would be my
preference, in a tiered system like this.

-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:57       ` David S. Miller
  2001-06-14 20:08         ` Jeff Garzik
@ 2001-06-14 20:14         ` David S. Miller
  2001-06-14 21:30           ` Benjamin Herrenschmidt
                             ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 20:14 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, Albert D. Cahalan, Tom Gall


Jeff Garzik writes:
 > ok with me.  would bus #0 be the system or root bus?  that would be my
 > preference, in a tiered system like this.

Bus 0 is controller 0, of whatever bus type that happens to be.
If we want to do something special we could create something
like /proc/bus/root or whatever, but I feel this is unnecessary.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 17:59   ` Jonathan Lundell
@ 2001-06-14 20:50     ` Jonathan Lundell
  0 siblings, 0 replies; 44+ messages in thread
From: Jonathan Lundell @ 2001-06-14 20:50 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, Tom Gall, linux-kernel

At 12:03 PM -0700 2001-06-14, David S. Miller wrote:
>Jonathan Lundell writes:
>  > As I recall, even a midline chipset such as the ServerWorks LE
>  > supports the use of two north bridges, which implies two PCI bus
>  > domains.
>
>It hides this fact by making config space accesses respond in such a
>way that it appears that it is all behind one PCI controller.  The
>BIOS even avoids allowing any of the MEM and I/O resources from
>overlapping.

So we end up with a single domain and max 256 buses. Still, it's not 
behavior one can count on. Sun's U2P PCI controller certainly creates 
a new PCI domain for each controller. It's easier in architectures 
other than IA32, in a way, since they typically don't have the 64KB 
IO-space addressing limitation that makes heavily bridged systems 
problematical on IA32 (one tends to run out of IO space).
-- 
/Jonathan Lundell.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:03   ` David S. Miller
@ 2001-06-14 20:56     ` David S. Miller
  0 siblings, 0 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 20:56 UTC (permalink / raw)
  To: Jonathan Lundell; +Cc: Jeff Garzik, Tom Gall, linux-kernel


Jonathan Lundell writes:
 > It's easier in architectures other than IA32, in a way, since they
 > typically don't have the 64KB IO-space addressing limitation that
 > makes heavily bridged systems problematical on IA32 (one tends to
 > run out of IO space).

Right, I was even going to mention this.

But nothing stops an ia32 PCI controller vendor from doing what the
Sun PCI controller does, which is to make I/O space memory mapped.
Well, one thing stops them: no OS would support this from the
get-go.

Luckily, it would be quite easy to make Linux handle this kind of
thing on x86 since other platforms built up the infrastructure
needed to make it work.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 20:14         ` David S. Miller
@ 2001-06-14 21:30           ` Benjamin Herrenschmidt
  2001-06-14 21:46             ` Jeff Garzik
  2001-06-14 21:48             ` David S. Miller
  2001-06-14 21:35           ` Going beyond 256 PCI buses David S. Miller
  2001-06-16 21:32           ` Jeff Garzik
  2 siblings, 2 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-14 21:30 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, linux-kernel, Albert D. Cahalan, Tom Gall

>
>Bus 0 is controller 0, of whatever bus type that happens to be.
>If we want to do something special we could create something
>like /proc/bus/root or whatever, but I feel this is unnecessary.

<old rant>

Mostly, except for one thing: legacy devices expecting ISA-like
ops on a given domain, which currently need some way to know
which PCI bus holds the ISA bus.

While we are at it, I'd be really glad if we could agree on a
way to abstract the current PIO scheme to understand the fact
that any domain can actually have "legacy ISA-like" devices.

One example is that any domain can have a VGA controller that
requires a bit of legacy PIO & ISA-mem stuff. In the same vein,
any domain can have an ISA bridge used to wire up 16-bit devices.

Another example is an embedded device which could use the
domain abstraction to represent different IO busses on which
old-style 16-bit chips are wired.

I believe there will always be a need for some platform-specific
hacking at probe time to handle those, but we can at least make
the inx/outx functions/macros compatible with such a scheme,
possibly by requesting an ioremap equivalent to be done so that
we stop passing them real PIO addresses, but a cookie obtained
in various platform-specific ways.

For the case of PCI (which would handle both the VGA case and
the multiple PCI<->ISA bridge case), one possibility is to
provide a function returning resources for the "legacy" PIO
and MMIO regions if any on a given domain. This is especially
true for ISA-memory (used mostly for VGA) as host controllers
for non-x86 platforms usually have a special window somewhere
in the bus space for generating <64k mem cycles on the PCI bus.

</old rant>

Ben.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 20:14         ` David S. Miller
  2001-06-14 21:30           ` Benjamin Herrenschmidt
@ 2001-06-14 21:35           ` David S. Miller
  2001-06-14 21:46             ` Benjamin Herrenschmidt
  2001-06-16 21:32           ` Jeff Garzik
  2 siblings, 1 reply; 44+ messages in thread
From: David S. Miller @ 2001-06-14 21:35 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: Jeff Garzik, linux-kernel, Albert D. Cahalan, Tom Gall


Benjamin Herrenschmidt writes:
 > I believe there will always be a need for some platform-specific
 > hacking at probe time to handle those, but we can at least make
 > the inx/outx functions/macros compatible with such a scheme,
 > possibly by requesting an ioremap equivalent to be done so that
 > we stop passing them real PIO addresses, but a cookie obtained
 > in various platform-specific ways.

The cookie can be encoded into the address itself.

This is why readl() etc. take one arg, the address, not a billion
other arguments like some systems do.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 21:30           ` Benjamin Herrenschmidt
@ 2001-06-14 21:46             ` Jeff Garzik
  2001-06-14 21:48             ` David S. Miller
  1 sibling, 0 replies; 44+ messages in thread
From: Jeff Garzik @ 2001-06-14 21:46 UTC (permalink / raw)
  To: Benjamin Herrenschmidt
  Cc: David S. Miller, linux-kernel, Albert D. Cahalan, Tom Gall

Benjamin Herrenschmidt wrote:
> While we are at it, I'd be really glad if we could agree on a
> way to abstract the current PIO scheme to understand the fact
> that any domain can actually have "legacy ISA-like" devices.

ioremap for outb/outw/outl.

IMHO of course.

I think rth requested pci_ioremap also...

-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 21:35           ` Going beyond 256 PCI buses David S. Miller
@ 2001-06-14 21:46             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-14 21:46 UTC (permalink / raw)
  To: David S. Miller; +Cc: Jeff Garzik, linux-kernel, Albert D. Cahalan, Tom Gall

> > I believe there will always be a need for some platform-specific
> > hacking at probe time to handle those, but we can at least make
> > the inx/outx functions/macros compatible with such a scheme,
> > possibly by requesting an ioremap equivalent to be done so that
> > we stop passing them real PIO addresses, but a cookie obtained
> > in various platform-specific ways.
>
>The cookie can be encoded into the address itself.
>
>This is why readl() etc. take one arg, the address, not a billion
>other arguments like some systems do.

Right, I don't want an additional parameter. Just a clear definition
that, like read/writex(), the in/outx() functions "address" parameter
is not a magic IO address, but a cookie obtained from an ioremap-like
function.

Now, the parameters passed to that ioremap-like function are a different
story. I believe it could be something like

isa_pioremap(domain, address, size)  for ISA-like PIO
isa_ioremap(domain, address, size)   for ISA-like mem

domain can be 0 for the "default" case (or maybe -1, as 0 is a valid
domain number and we may want the default domain to be another one).

For PCI PIOs, we need a pioremap(pci_dev, address, size) counterpart
to ioremap, as mapping of IO space is usually different on each domain.

A nice side effect of enforcing those rules is that we no longer need
to have the entire IO space of all domains mapped in kernel virtual
space all the time, as is the case now (at least on PPC), thus saving
kernel virtual address space.

Now, we can argue on the "pioremap" name itself ;)

A typical use is a fbdev driver for a VGA-compatible PCI card. Some of
these still require some "legacy" access before being fully usable with
PCI MMIO only. Such a driver can use isa_pioremap & isa_ioremap with the
domain number extracted from the card's pci_dev in order to generate
those necessary cycles.
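
So a driver's probe path might look something like this (everything here
is a sketch of the proposed interface, not existing code; pci_domain_of()
is a hypothetical helper for digging the domain out of the pci_dev):

	unsigned long domain = pci_domain_of(pdev);
	unsigned long vga_io;
	char *vga_mem;

	vga_io  = (unsigned long) isa_pioremap(domain, 0x3c0, 0x20);  /* VGA ports */
	vga_mem = isa_ioremap(domain, 0xa0000, 0x10000);  /* legacy VGA memory window */
	if (!vga_io || !vga_mem)
		return -ENODEV;			/* this domain has no ISA-like space */

	outb(0x12, vga_io + 0x04);		/* PIO goes through the cookie */
	writeb(0x00, vga_mem);			/* ISA-mem through the usual accessors */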

Ben.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 21:30           ` Benjamin Herrenschmidt
  2001-06-14 21:46             ` Jeff Garzik
@ 2001-06-14 21:48             ` David S. Miller
  2001-06-14 21:57               ` Benjamin Herrenschmidt
  2001-06-14 22:12               ` David S. Miller
  1 sibling, 2 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 21:48 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Benjamin Herrenschmidt, linux-kernel, Albert D. Cahalan, Tom Gall


Jeff Garzik writes:
 > I think rth requested pci_ioremap also...

It really isn't needed, and I understand why Linus didn't like the
idea either.  Because you can encode the bus etc. info into the
resource addresses themselves.

On sparc64 we just so happen to stick raw physical addresses into the
resources, but that is just one way of implementing it.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 21:48             ` David S. Miller
@ 2001-06-14 21:57               ` Benjamin Herrenschmidt
  2001-06-14 22:12               ` David S. Miller
  1 sibling, 0 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-14 21:57 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

>It really isn't needed, and I understand why Linus didn't like the
>idea either.  Because you can encode the bus etc. info into the
>resource addresses themselves.
>
>On sparc64 we just so happen to stick raw physical addresses into the
>resources, but that is just one way of implementing it.

That would be fine for PIO on PCI, but it is still an issue for
VGA-like devices that need to issue some "legacy" cycles on
a given domain. Currently, on PPC, inx/outx will only go to
one bus (arbitrarily chosen during boot) because of that,
meaning that we can't have 2 VGA cards on 2 different domains.

That's why I'd love to see a review of the "legacy" (ISA) stuff
in general. I understand that can require a bit of updating of
a lot of legacy drivers to do the proper ioremaps, but that would
help a lot, including on some weird embedded archs which love using
those cheap 16-bit devices on all sorts of custom busses. In
those cases, only the probe part will have to be hacked, since the
drivers will all cleanly use a "base" obtained from that probe-time
ioremap before doing inx/outx.

I'd be happy to help bringing drivers up-to-date (however, I don't
have an x86 box to test with) once we agree on the way do go.

Ben.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 21:48             ` David S. Miller
  2001-06-14 21:57               ` Benjamin Herrenschmidt
@ 2001-06-14 22:12               ` David S. Miller
  2001-06-14 22:29                 ` Benjamin Herrenschmidt
                                   ` (3 more replies)
  1 sibling, 4 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 22:12 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel


Benjamin Herrenschmidt writes:
 > That would be fine for PIO on PCI, but it is still an issue for
 > VGA-like devices that need to issue some "legacy" cycles on
 > a given domain. Currently, on PPC, inx/outx will only go to
 > one bus (arbitrarily chosen during boot) because of that,
 > meaning that we can't have 2 VGA cards on 2 different domains.

It's funny you mention this because I have been working on something
similar recently.  Basically making xfree86 int10 and VGA poking happy
on sparc64.

But this has no real use in the kernel.  (actually I take this back,
read below)

You have a primary VGA device, that is the one the bios (boot
firmware, whatever you want to call it) enables to respond to I/O and
MEM accesses, the rest are configured to VGA pallette snoop and that's
it.  The primary VGA device is the kernel console (unless using some
fbcon driver of course), and that's that.

The secondary VGA devices are only interesting to things like the X
server, and xfree86 does all the enable/disable/bridge-forward-vga
magic when doing multi-head.

Perhaps you might need to program the VGA resources of some device to
use it in a fbcon driver (i.e. to init it or set screen CRT parameters;
I believe the tdfx requires the latter, which is why I'm having a devil
of a time getting it to work on my sparc64 box).  This would be a
separate issue, and I would not mind at all seeing an abstraction for
this sort of thing, let us call it:

	struct pci_vga_resource {
		struct resource io, mem;
	};

	int pci_route_vga(struct pci_dev *pdev, struct pci_vga_resource *res);
	void pci_restore_vga(void);

So you'd go:

	struct pci_vga_resource vga_res;
	int err;

	err = pci_route_vga(tdfx_pdev, &vga_res);

	if (err)
		barf();
	vga_ports = ioremap(vga_res.io.start, vga_res.io.end-vga_res.io.start+1);
	program_video_crtc_params(vga_ports);
	iounmap(vga_ports);
	vga_fb = ioremap(vga_res.mem.start, vga_res.mem.end-vga_res.mem.start+1);
	clear_vga_fb(vga_fb);
	iounmap(vga_fb);

	pci_restore_vga();
	
pci_route_vga does several things:

1) It saves the current VGA routing information.
2) It configures busses and VGA devices such that PDEV responds to
   VGA accesses, and other VGA devices just VGA palette snoop.
3) Fills in the pci_vga_resources with
   io: 0x320-->0x340 in domain PDEV lives, vga I/O regs
   mem: 0xa0000-->0xc0000 in domain PDEV lives, video ram

pci_restore_vga, as the name suggests, restores things back to how
they were before the pci_route_vga() call.  Maybe also some semaphore
so only one driver can do this at once and you can't drop the
semaphore without calling pci_restore_vga().  VC switching into the X
server would need to grab this thing too.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 22:12               ` David S. Miller
@ 2001-06-14 22:29                 ` Benjamin Herrenschmidt
  2001-06-14 22:49                 ` David S. Miller
                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-14 22:29 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

>It's funny you mention this because I have been working on something
>similar recently.  Basically making xfree86 int10 and VGA poking happy
>on sparc64.

Heh, world is small ;)

>But this has no real use in the kernel.  (actually I take this back,
>read below)

yup, fbcon at least... 

>You have a primary VGA device, that is the one the bios (boot
>firmware, whatever you want to call it) enables to respond to I/O and
> MEM accesses, the rest are configured to VGA palette snoop and that's
>it.  The primary VGA device is the kernel console (unless using some
>fbcon driver of course), and that's that.

Yup, fbcon is what I have in mind here

>The secondary VGA devices are only interesting to things like the X
>server, and xfree86 does all the enable/disable/bridge-forward-vga
>magic when doing multi-head.

and multihead fbcon. 

>Perhaps you might need to program the VGA resources of some device to
>use it in a fbcon driver (i.e. to init it or set screen CRT parameters;
>I believe the tdfx requires the latter, which is why I'm having a devil
>of a time getting it to work on my sparc64 box).  This would be a
>separate issue, and I would not mind at all seeing an abstraction for
>this sort of thing, let us call it:
>
>	struct pci_vga_resource {
>		struct resource io, mem;
>	};
>
>	int pci_route_vga(struct pci_dev *pdev, struct pci_vga_resource *res);
>	void pci_restore_vga(void);
>
> [.../...]

Well... that would work for VGA itself (note that this semaphore
you are talking about should be shared some way with the /proc
interface so XFree can be properly sync'ed as well).

But I still think it may be useful to generalize the idea to 
all kinds of legacy IO & PIOs. I definitely agree that VGA is a kind
of special case, mostly because of the necessary exclusion on
the VGA IO response.

But what about all those legacy drivers that will issue inx/outx
calls without an ioremap? Should they call ioremap with hard-coded
legacy addresses? There are chipsets containing things like legacy
timers, legacy keyboard controllers, etc... and in some (rare, I admit)
cases, those may be scattered (or multiplied) on various domains.
If we decide we don't handle those, then well, I won't argue more
(it's mostly an aesthetic rant on my side ;), but the problem of
whether they should call ioremap or not is there, and since the
ISA bus can be "mapped" anywhere in the bus space by the host bridge,
there needs to be a way to retrieve the ISA resources in general for
a given domain.

That's why I'd suggest something like 

int pci_get_isa_mem(struct resource *isa_mem);
int pci_get_isa_io(struct resource *isa_io);

(I prefer 2 different functions, as some platforms like powermac just
don't provide the ISA mem space at all; there's no way to generate
a memory cycle in the low-address range on the PCI bus of those, and
they don't have a PCI<->ISA bridge), so I like having the ability for
one of the functions to return an error and not the other.

Also, having the same ioremap() call for both mem IO and PIO means
that things like 0xc0000 cannot be interpreted. It's a valid ISA-mem
address in the VGA space and a valid PIO address on a PCI bus that
supports >64k of PIO space.

I believe it would make things clearer (and probably make the implementation
simpler) to separate ioremap and pioremap.

Ben.

>So you'd go:
>
>	struct pci_vga_resource vga_res;
>	int err;
>
>	err = pci_route_vga(tdfx_pdev, &vga_res);
>
>	if (err)
>		barf();
>	vga_ports = ioremap(vga_res.io.start, vga_res.io.end-vga_res.io.start+1);
>	program_video_crtc_params(vga_ports);
>	iounmap(vga_ports);
>	vga_fb = ioremap(vga_res.mem.start, vga_res.mem.end-vga_res.mem.start+1);
>	clear_vga_fb(vga_fb);
>	iounmap(vga_fb);
>
>	pci_restore_vga();
>	
>pci_route_vga does several things:
>
>1) It saves the current VGA routing information.
>2) It configures busses and VGA devices such that PDEV responds to
>   VGA accesses, and other VGA devices just VGA palette snoop.
>3) Fills in the pci_vga_resources with
>   io: 0x320-->0x340 in domain PDEV lives, vga I/O regs
>   mem: 0xa0000-->0xc0000 in domain PDEV lives, video ram
>
>pci_restore_vga, as the name suggests, restores things back to how
>they were before the pci_route_vga() call.  Maybe also some semaphore
>so only one driver can do this at once and you can't drop the
>semaphore without calling pci_restore_vga().  VC switching into the X
>server would need to grab this thing too.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 22:12               ` David S. Miller
  2001-06-14 22:29                 ` Benjamin Herrenschmidt
@ 2001-06-14 22:49                 ` David S. Miller
  2001-06-14 23:35                   ` Benjamin Herrenschmidt
  2001-06-14 23:35                 ` VGA handling was [Re: Going beyond 256 PCI buses] James Simmons
  2001-06-14 23:42                 ` David S. Miller
  3 siblings, 1 reply; 44+ messages in thread
From: David S. Miller @ 2001-06-14 22:49 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: linux-kernel


Benjamin Herrenschmidt writes:
 > Well... that would work for VGA itself (note that this semaphore
 > you are talking about should be shared some way with the /proc
 > interface so XFree can be properly sync'ed as well).

Once XFree takes ownership of the VC it has control of all the
graphics cards in the system.  It does this before it does any poking
around.

 > But I still think it may be useful to generalize the idea to 
 > all kinds of legacy IO & PIOs. I definitely agree that VGA is a kind
 > of special case, mostly because of the necessary exclusion on
 > the VGA IO response.

I don't think this is the way to go.

First, what would you use it for?  Let's check some examples:

1) "legacy serial on Super I/O"

   This is best handled by an asm/serial.h mechanism, something akin
   to how asm/floppy.h works.  Let the platform say how to get at
   the ports.

   Wait, there already is a asm/serial.h that does just this :-)

2) "floppy"

   See asm/floppy.h :-)

3) "PS/2 keyboard and mouse"

   I believe there is an asm/keyboard.h for the keyboard side of this.
   Yes, and it has macros for the register accesses.

4) "legacy keyboard beeper"

   Make an asm/kbdbeep.h or something.

Add whatever else you might be interested that things tend to
inb/outb.

And if your concern is having multiple of these in your system, the
only ones that make sense are floppy and serial and those are handled
just fine by the asm/serial.h mechanism.

This way of doing this allows 16550's, floppies, etc. to be handled on
any bus whatsoever.
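
For the curious, this is roughly what the i386 asm/serial.h mechanism
looks like (quoted loosely from memory); a platform with legacy UARTs
hanging off some other bridge just supplies its own table:

	#define STD_COM_FLAGS	(ASYNC_BOOT_AUTOCONF | ASYNC_SKIP_TEST)

	#define SERIAL_PORT_DFNS				\
		/* UART  CLK       PORT  IRQ  FLAGS */		\
		{ 0, BASE_BAUD, 0x3F8, 4, STD_COM_FLAGS },	/* ttyS0 */ \
		{ 0, BASE_BAUD, 0x2F8, 3, STD_COM_FLAGS },	/* ttyS1 */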

I mean, besides this and VGA what is left and even matters?

Later,
David S. Miller
davem@redhat.com


^ permalink raw reply	[flat|nested] 44+ messages in thread

* VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 22:12               ` David S. Miller
  2001-06-14 22:29                 ` Benjamin Herrenschmidt
  2001-06-14 22:49                 ` David S. Miller
@ 2001-06-14 23:35                 ` James Simmons
  2001-06-14 23:42                 ` David S. Miller
  3 siblings, 0 replies; 44+ messages in thread
From: James Simmons @ 2001-06-14 23:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: Benjamin Herrenschmidt, linux-kernel


> The secondary VGA devices are only interesting to things like the X
> server, and xfree86 does all the enable/disable/bridge-forward-vga
> magic when doing multi-head.

   Actually this weekend I'm working on this for the console system. I
plan to get my TNT card and my 3Dfx card working at the same time, both in
VGA text mode. If you really want to get multiple cards in VGA text
mode, it is going to get far more complex in a multihead setup, beyond the
problem of multiple buses.

   First, the approach I'm taking is to be able to pass vgacon different
register and memory regions as well as different layouts for VGA text
mode. Different cards have different memory layouts for VGA text mode. I
discovered this the hard way :-( This way I can get these VGA regions from
the PCI bus and pass them to vgacon. This approach will solve a lot of
problems. We still have the problem of multiple buses, which I haven't yet
figured out.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 22:49                 ` David S. Miller
@ 2001-06-14 23:35                   ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-14 23:35 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel

>Add whatever else you might be interested that things tend to
>inb/outb.
>
>And if your concern is having multiple of these in your system, the
>only ones that make sense are floppy and serial and those are handled
>just fine by the asm/serial.h mechanism.
>
>This way of doing this allows 16550's, floppies, etc. to be handled on
>any bus whatsoever.
>
>I mean, besides this and VGA what is left and even matters?

Ok, I capitulate ;)

So basically, all that is needed is to make those drivers use
ioremap before doing their IOs.

I still think there's a potential difficulty with having the same
ioremap function for both MMIO and PIO, as the address spaces may overlap.

For once, the x86 enters the dance as it has really separate bus spaces for
them. Other archs can work around this by using the physical address
where the PIO is mapped in the IO resources.
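
To illustrate what "use ioremap before doing the IOs" means on the
driver side (a sketch only, not any particular driver):

#include <linux/pci.h>
#include <asm/io.h>

/* A driver that maps its register BAR and goes through readb()/writeb()
 * works no matter which bus, or which of many buses, the device sits
 * behind -- no hard-coded inb()/outb() on a fixed port number. */
static unsigned char read_chip_status(struct pci_dev *pdev, int bar,
				      unsigned long offset)
{
	unsigned long start = pci_resource_start(pdev, bar);
	unsigned long len   = pci_resource_len(pdev, bar);
	unsigned char *regs = ioremap(start, len);
	unsigned char status;

	if (!regs)
		return 0xff;
	status = readb(regs + offset);
	iounmap(regs);
	return status;
}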

Ben.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 22:12               ` David S. Miller
                                   ` (2 preceding siblings ...)
  2001-06-14 23:35                 ` VGA handling was [Re: Going beyond 256 PCI buses] James Simmons
@ 2001-06-14 23:42                 ` David S. Miller
  2001-06-14 23:55                   ` James Simmons
                                     ` (2 more replies)
  3 siblings, 3 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-14 23:42 UTC (permalink / raw)
  To: James Simmons; +Cc: Benjamin Herrenschmidt, linux-kernel


James Simmons writes:
 >    Actually, this weekend I'm working on this for the console system. I
 > plan to get my TNT card and my 3Dfx card working at the same time, both in
 > VGA text mode. If you really want to get multiple cards into VGA text
 > mode, it is going to get far more complex in a multihead setup, beyond the
 > problem of multiple buses.

You're going to have to enable/disable I/O, MEM access, and VGA palette
snooping in the PCI_COMMAND register of both boards every time you go
from rendering text on one to rendering text on the other.  If there
are bridges leading to either device, you may need to fiddle with VGA
forwarding during each switch as well.

You'll also need a semaphore or similar to control this "active VGA"
state.
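
Concretely, each switch ends up being something like this (sketch only;
error handling, the locking, and the walk up to the bridges are left
out):

#include <linux/pci.h>

static void vga_route_to(struct pci_dev *new_vga, struct pci_dev *old_vga)
{
	u16 cmd;

	/* turn the old card's legacy decoding off... */
	if (old_vga) {
		pci_read_config_word(old_vga, PCI_COMMAND, &cmd);
		cmd &= ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY |
			 PCI_COMMAND_VGA_PALETTE);
		pci_write_config_word(old_vga, PCI_COMMAND, cmd);
	}

	/* ...and the new card's on */
	pci_read_config_word(new_vga, PCI_COMMAND, &cmd);
	cmd |= PCI_COMMAND_IO | PCI_COMMAND_MEMORY | PCI_COMMAND_VGA_PALETTE;
	pci_write_config_word(new_vga, PCI_COMMAND, cmd);

	/* any bridges above either card additionally need
	 * PCI_BRIDGE_CTL_VGA flipped in PCI_BRIDGE_CONTROL,
	 * which is not shown here */
}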

Really, I don't think this is all that good of an idea.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 23:42                 ` David S. Miller
@ 2001-06-14 23:55                   ` James Simmons
  2001-06-15 15:14                     ` Pavel Machek
  2001-06-15  2:06                   ` Albert D. Cahalan
  2001-06-15  8:52                   ` Matan Ziv-Av
  2 siblings, 1 reply; 44+ messages in thread
From: James Simmons @ 2001-06-14 23:55 UTC (permalink / raw)
  To: David S. Miller; +Cc: Benjamin Herrenschmidt, linux-kernel


> You're going to have to enable/disable I/O, MEM access, and VGA palette
> snooping in the PCI_COMMAND register of both boards every time you go
> from rendering text on one to rendering text on the other.  If there
> are bridges leading to either device, you may need to fiddle with VGA
> forwarding during each switch as well.
> 
> You'll also need a semaphore or similar to control this "active VGA"
> state.
> 
> Really, I don't think this is all that good of an idea.

Yes, I know. Also, each card needs its own special functions to handle
programming the CRTC, SEQ registers, etc. Perhaps for real multihead
support the user will have to use fbdev, and vgacon can just exist for
single-head systems. I guess it is time to let VGA go. It is old technology.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 23:42                 ` David S. Miller
  2001-06-14 23:55                   ` James Simmons
@ 2001-06-15  2:06                   ` Albert D. Cahalan
  2001-06-15  8:52                   ` Matan Ziv-Av
  2 siblings, 0 replies; 44+ messages in thread
From: Albert D. Cahalan @ 2001-06-15  2:06 UTC (permalink / raw)
  To: David S. Miller; +Cc: James Simmons, Benjamin Herrenschmidt, linux-kernel

David S. Miller writes:

> You're going to have to enable/disable I/O, MEM access, and VGA palette
> snooping in the PCI_COMMAND register of both boards every time you go
> from rendering text on one to rendering text on the other.  If there
> are bridges leading to either device, you may need to fiddle with VGA
> forwarding during each switch as well.
> 
> You'll also need a semaphore or similar to control this "active VGA"
> state.
> 
> Really, I don't think this is all that good of an idea.

It might not be so bad if you assume that one doesn't blast away
with interleaved operations between the displays. Lazy switching
means you can do a whole user-interface action without needing
to muck with bridges. By "user-interface action" I mean something
like a "make menuconfig" screen refresh.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:12     ` David S. Miller
  2001-06-14 19:41       ` Jeff Garzik
  2001-06-14 19:57       ` David S. Miller
@ 2001-06-15  8:42       ` Geert Uytterhoeven
  2001-06-15 15:38       ` David S. Miller
  3 siblings, 0 replies; 44+ messages in thread
From: Geert Uytterhoeven @ 2001-06-15  8:42 UTC (permalink / raw)
  To: David S. Miller; +Cc: Albert D. Cahalan, Jeff Garzik, Tom Gall, linux-kernel

On Thu, 14 Jun 2001, David S. Miller wrote:
> Albert D. Cahalan writes:
>  > >>    /proc/bus/PCI/0/0/3/0/config   config space
>  > >
>  > > Which breaks xfree86 instantly.  This fix is unacceptable.
>  > 
>  > Nope. Keep /proc/bus/pci until Linux 3.14 if you like.
>  > The above is /proc/bus/PCI. That's "PCI", not "pci".
>  > We still have /proc/pci after all.
> 
> Oh I see.
> 
> Well, xfree86 and other programs aren't going to look there, so
> something had to be done about the existing /proc/bus/pci/* hierarchy.
> 
> To be honest, xfree86 needs the controller information not for the
> sake of device probing, it needs it to detect resource conflicts.

Well, those resource conflicts shouldn't be there in the first place. They
should be handled by the OS.

Gr{oetje,eeting}s,

						Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
							    -- Linus Torvalds


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 23:42                 ` David S. Miller
  2001-06-14 23:55                   ` James Simmons
  2001-06-15  2:06                   ` Albert D. Cahalan
@ 2001-06-15  8:52                   ` Matan Ziv-Av
  2 siblings, 0 replies; 44+ messages in thread
From: Matan Ziv-Av @ 2001-06-15  8:52 UTC (permalink / raw)
  To: David S. Miller; +Cc: James Simmons, Benjamin Herrenschmidt, linux-kernel

On Thu, 14 Jun 2001, David S. Miller wrote:

> James Simmons writes:
>  >  Actually, this weekend I'm working on this for the console system. I
>  > plan to get my TNT card and my 3Dfx card working at the same time, both in
>  > VGA text mode. If you really want to get multiple cards into VGA text
>  > mode, it is going to get far more complex in a multihead setup, beyond the
>  > problem of multiple buses.
> 
> You're going to have to enable/disable I/O, MEM access, and VGA palette
> snooping in the PCI_COMMAND register of both boards every time you go
> from rendering text on one to rendering text on the other.  If there
> are bridges leading to either device, you may need to fiddle with VGA
> forwarding during each switch as well.
> 
> You'll also need a semaphore or similar to control this "active VGA"
> state.

Why do that? You ignore the VGA regions of all the cards except for the
primary, and program all the other cards by accessing their PCI-mapped
regions, which are programmed not to overlap, so they are completely
independent.
This is what nvvgacon does to use text mode on a secondary nvidia card.
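
For example (a sketch only; the MMIO offset at which a card mirrors its
legacy VGA registers inside its BAR is entirely chip-specific, and the
value below is a placeholder):

#include <linux/pci.h>
#include <asm/io.h>

/* Chip-specific: where this particular card mirrors its legacy VGA
 * registers inside its MMIO BAR. */
#define LEGACY_VGA_MMIO_BASE	0x0	/* placeholder */

static void crtc_write(unsigned char *mmio, unsigned char index,
		       unsigned char value)
{
	/* same CRTC index/data pair as I/O ports 0x3D4/0x3D5, but
	 * reached through the card's own relocatable BAR, so two
	 * cards never collide */
	writeb(index, mmio + LEGACY_VGA_MMIO_BASE + 0x3d4);
	writeb(value, mmio + LEGACY_VGA_MMIO_BASE + 0x3d5);
}

static unsigned char *map_secondary_vga(struct pci_dev *pdev)
{
	return ioremap(pci_resource_start(pdev, 0),
		       pci_resource_len(pdev, 0));
}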

-- 
Matan Ziv-Av.                         matan@svgalib.org



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: VGA handling was [Re: Going beyond 256 PCI buses]
  2001-06-14 23:55                   ` James Simmons
@ 2001-06-15 15:14                     ` Pavel Machek
  0 siblings, 0 replies; 44+ messages in thread
From: Pavel Machek @ 2001-06-15 15:14 UTC (permalink / raw)
  To: James Simmons; +Cc: David S. Miller, Benjamin Herrenschmidt, linux-kernel

Hi!

> Yes, I know. Also, each card needs its own special functions to handle
> programming the CRTC, SEQ registers, etc. Perhaps for real multihead
> support the user will have to use fbdev, and vgacon can just exist for
> single-head systems. I guess it is time to let VGA go. It is old technology.

It is still faster than fbdev, and not all cards have fbdev drivers. This
should work with any VGA card, right?
								Pavel
-- 
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 19:12     ` David S. Miller
                         ` (2 preceding siblings ...)
  2001-06-15  8:42       ` Geert Uytterhoeven
@ 2001-06-15 15:38       ` David S. Miller
  3 siblings, 0 replies; 44+ messages in thread
From: David S. Miller @ 2001-06-15 15:38 UTC (permalink / raw)
  To: Geert Uytterhoeven; +Cc: Albert D. Cahalan, Jeff Garzik, Tom Gall, linux-kernel


Geert Uytterhoeven writes:
 > Well, those resource conflicts shouldn't be there in the first place. They
 > should be handled by the OS.

I agree, completely.

But this doesn't solve the issue of reserving resources inside the
X server, i.e. making sure one driver doesn't take the register area
another driver is using.

You still need to get the controller number into the resources to do
that properly.

Later,
David S. Miller
davem@redhat.com

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-14 20:14         ` David S. Miller
  2001-06-14 21:30           ` Benjamin Herrenschmidt
  2001-06-14 21:35           ` Going beyond 256 PCI buses David S. Miller
@ 2001-06-16 21:32           ` Jeff Garzik
  2001-06-16 23:29             ` Benjamin Herrenschmidt
  2 siblings, 1 reply; 44+ messages in thread
From: Jeff Garzik @ 2001-06-16 21:32 UTC (permalink / raw)
  To: David S. Miller; +Cc: linux-kernel, Albert D. Cahalan, Tom Gall

"David S. Miller" wrote:
> 
> Jeff Garzik writes:
>  > ok with me.  would bus #0 be the system or root bus?  that would be my
>  > preference, in a tiered system like this.
> 
> Bus 0 is controller 0, of whatever bus type that happens to be.
> If we want to do something special we could create something
> like /proc/bus/root or whatever, but I feel this unnecessary.

Basically I would prefer some sort of global tree so we can figure out a
sane ordering for PM.  Power down the pcmcia cards before the add-on
card containing a PCI-pcmcia bridge, that sort of thing.  Cross-bus-type
ordering.

-- 
Jeff Garzik      | Andre the Giant has a posse.
Building 1024    |
MandrakeSoft     |

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: Going beyond 256 PCI buses
  2001-06-16 21:32           ` Jeff Garzik
@ 2001-06-16 23:29             ` Benjamin Herrenschmidt
  0 siblings, 0 replies; 44+ messages in thread
From: Benjamin Herrenschmidt @ 2001-06-16 23:29 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel, David S. Miller, Albert D. Cahalan, Tom Gall

>"David S. Miller" wrote:
>> 
>> Jeff Garzik writes:
>>  > ok with me.  would bus #0 be the system or root bus?  that would be my
>>  > preference, in a tiered system like this.
>> 
>> Bus 0 is controller 0, of whatever bus type that happens to be.
>> If we want to do something special we could create something
>> like /proc/bus/root or whatever, but I feel this unnecessary.
>
>Basically I would prefer some sort of global tree so we can figure out a
>sane ordering for PM.  Power down the pcmcia cards before the add-on
>card containing a PCI-pcmcia bridge, that sort of thing.  Cross-bus-type
>ordering.

Welcome to the PM tree....

What I would have liked, but it looks like our current design is not
going that way, would have been a tree structure for the PM notifiers
from the beginning. Instead of having various kinds of callbacks
(like PCI suspend/resume/whatever, others for USB, FW, etc.), we
could just have a PM notifier node for each device and have notifiers
handle calling their children.

That would also allow bus "controllers" (in general) to broadcast specific
messages to their children for things that don't fit in the D0..D3
scheme.

For PCI, instead of having the PCI layer itself hold one node and
walk the tree, I'd rather see one node per pci_dev, laid out
according to the PCI tree by default. I can see (and
already know of) cases where the PM tree is _not_ the PCI tree
(because power/reset lines are wired to various ASICs on a motherboard),
and keeping this PM tree structure separate allows the arch to
influence it if necessary.

It's simple (a notifier node is a lightweight structure, only one
callback function is implemented, and only a few messages usually
need to be handled in a given node).
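
For concreteness, the kind of node I mean (names made up, sketch only):

#include <linux/list.h>

struct pm_node {
	struct pm_node		*parent;
	struct list_head	children;	/* list of child pm_node.sibling */
	struct list_head	sibling;
	int			(*event)(struct pm_node *node,
					 int message, void *data);
	void			*dev_data;	/* e.g. the pci_dev */
};

/* A parent handles a message itself, then passes it to its children.
 * A bus controller can use the same path for bus-specific messages
 * that don't fit the D0..D3 scheme. */
static int pm_send_down(struct pm_node *node, int message, void *data)
{
	struct list_head *p;
	int err;

	err = node->event(node, message, data);
	if (err)
		return err;

	list_for_each(p, &node->children) {
		struct pm_node *child = list_entry(p, struct pm_node, sibling);

		err = pm_send_down(child, message, data);
		if (err)
			return err;
	}
	return 0;
}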

Ben.



^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2001-06-16 23:30 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2001-06-13 10:02 Going beyond 256 PCI buses Tom Gall
2001-06-13 17:17 ` Albert D. Cahalan
2001-06-13 18:29   ` Tom Gall
2001-06-14 14:14 ` Jeff Garzik
2001-06-14 15:15   ` David S. Miller
2001-06-14 17:59   ` Jonathan Lundell
2001-06-14 20:50     ` Jonathan Lundell
2001-06-14 14:24 ` David S. Miller
2001-06-14 14:32   ` Jeff Garzik
2001-06-14 14:42   ` David S. Miller
2001-06-14 15:29     ` Jeff Garzik
2001-06-14 15:33       ` Jeff Garzik
2001-06-14 18:01   ` Albert D. Cahalan
2001-06-14 18:47   ` David S. Miller
2001-06-14 19:04     ` Albert D. Cahalan
2001-06-14 19:12     ` David S. Miller
2001-06-14 19:41       ` Jeff Garzik
2001-06-14 19:57       ` David S. Miller
2001-06-14 20:08         ` Jeff Garzik
2001-06-14 20:14         ` David S. Miller
2001-06-14 21:30           ` Benjamin Herrenschmidt
2001-06-14 21:46             ` Jeff Garzik
2001-06-14 21:48             ` David S. Miller
2001-06-14 21:57               ` Benjamin Herrenschmidt
2001-06-14 22:12               ` David S. Miller
2001-06-14 22:29                 ` Benjamin Herrenschmidt
2001-06-14 22:49                 ` David S. Miller
2001-06-14 23:35                   ` Benjamin Herrenschmidt
2001-06-14 23:35                 ` VGA handling was [Re: Going beyond 256 PCI buses] James Simmons
2001-06-14 23:42                 ` David S. Miller
2001-06-14 23:55                   ` James Simmons
2001-06-15 15:14                     ` Pavel Machek
2001-06-15  2:06                   ` Albert D. Cahalan
2001-06-15  8:52                   ` Matan Ziv-Av
2001-06-14 21:35           ` Going beyond 256 PCI buses David S. Miller
2001-06-14 21:46             ` Benjamin Herrenschmidt
2001-06-16 21:32           ` Jeff Garzik
2001-06-16 23:29             ` Benjamin Herrenschmidt
2001-06-15  8:42       ` Geert Uytterhoeven
2001-06-15 15:38       ` David S. Miller
2001-06-14 19:03   ` David S. Miller
2001-06-14 20:56     ` David S. Miller
2001-06-14 15:13 ` Jonathan Lundell
2001-06-14 15:17   ` Jeff Garzik
