linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* RE: MSI fix for buggy PCI/PCI-X hardware
@ 2003-09-09 22:14 Nakajima, Jun
  2003-09-09 22:52 ` Jeff Garzik
  0 siblings, 1 reply; 6+ messages in thread
From: Nakajima, Jun @ 2003-09-09 22:14 UTC (permalink / raw)
  To: Jeff Garzik, long; +Cc: linux-kernel, greg, Nguyen, Tom L, zwane

Jeff, Hi.

How about the default behavior? I'm not a fan of disable_msi(), because
we need to update the driver as we find problems, and we cannot predict
which PCI/PCI-X devices in the world have such a problem, although we
know some will. The workaround in drivers/pci/quirk.c is much better,
compared to modifying the driver, but we still need to update the file
(and rebuild the kernel) as we find problems.

In my opinion, we might want to use drivers/pci/quirk.c to blacklist PCI
Express devices if any (hope not). For PCI/PCI-X devices, we might want
to enable MSI once verified for it. To that end we can also use
drivers/pci/quirk.c to whitelist them (or it's abuse?). That way we can
avoid situations like "it hangs, it does not get interrupts", "disable
ACPI, oh no, MSI".

Thanks,
Jun

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Tuesday, September 09, 2003 11:41 AM
> To: long
> Cc: linux-kernel@vger.kernel.org; greg@kroah.com; Nakajima, Jun;
Nguyen,
> Tom L; zwane@linuxpower.ca
> Subject: Re: MSI fix for buggy PCI/PCI-X hardware
> 
> On Tue, Sep 09, 2003 at 08:39:37AM -0700, long wrote:
> > The proposed solution is to provide a new API, named "int
> > disable_msi(struct pci_dev *dev)", to allow IHV's who have
> > shipped PCI/PCI-X hardware that does not work in MSI mode to update
> > their software drivers to request the kernel to switch the
> > interrupt mode from MSI mode back to IRQ pin-assertion mode.
> 
> No need for a new API.  We have drivers/pci/quirk.c where we add PCI
> devices with known bugs.  If there is commonality of the bugs across
> hardware, we can easily add a common "quirk" which fixes the
situation.
> 
> Individual drivers shouldn't need to bother with this, really.
> 
> As an aside, if you need to blacklist certain _systems_ rather than
> certain PCI devices, you should either modify drivers/pci/quirks.c or
> arch/i386/kernel/dmi_scan.c.
> 
> 	Jeff
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: MSI fix for buggy PCI/PCI-X hardware
  2003-09-09 22:14 MSI fix for buggy PCI/PCI-X hardware Nakajima, Jun
@ 2003-09-09 22:52 ` Jeff Garzik
  2003-09-10 21:31   ` Zwane Mwaikambo
  0 siblings, 1 reply; 6+ messages in thread
From: Jeff Garzik @ 2003-09-09 22:52 UTC (permalink / raw)
  To: Nakajima, Jun; +Cc: long, linux-kernel, greg, Nguyen, Tom L, zwane

Nakajima, Jun wrote:
> How about the default behavior? I'm not a fan of disable_msi(), because
> we need to update the driver as we find problems, and we cannot predict
> which PCI/PCI-X devices in the world have such a problem, although we
> know some will. The workaround in drivers/pci/quirk.c is much better,
> compared to modifying the driver, but we still need to update the file
> (and rebuild the kernel) as we find problems.

Agreed.

That's the pain of buggy hardware.  The solution is to not produce buggy 
hardware ;-)  Failing that, it is unavoidable that the kernel would need 
to be updated to notice or work around buggy hardware.  That's precisely 
the reason for quirks/dmi_scan existence:  the special cases.  Special 
cases are never easy or enjoyable to maintain ;-)


> In my opinion, we might want to use drivers/pci/quirk.c to blacklist PCI
> Express devices if any (hope not). For PCI/PCI-X devices, we might want
> to enable MSI once verified for it. To that end we can also use
> drivers/pci/quirk.c to whitelist them (or it's abuse?). That way we can
> avoid situations like "it hangs, it does not get interrupts", "disable
> ACPI, oh no, MSI".


Five points here:

1) If we did that with ACPI, you guys would have only recieved a 
_fraction_ of the feedback you received.  IMO we want to turn on MSI 
(where supported), and see what breaks.  It _should_ work, otherwise the 
hardware guys wouldn't have put MSI on their PCI device :)

You'll never get feedback and testing if it's turned off by default.

2) MSI is more optimal than standard (should I start calling them 
legacy?) x86 interrupts.  And I think they're just plain cool.  So of 
course I will push to default MSI to on!  ;-)

3) I think this view is colored by "right now".  The current MSI errata 
may be worrying you, but...   MSI is the future.  If you choose to 
whitelist, then you're creating a maintenance nightmare for the future. 
  You would have to qualify _every_ MSI device!  Think how much it would 
suck if we have to do that with PCI devices today.

Furthermore, a whitelist unfairly punishes working MSI hardware and 
perhaps unfairly highlights a few key vendors at the start ;-)  This is 
why I like blacklists.

Broken hardware is a special case, and not something we should invest a 
whole lot of time worrying about.  _Assume_ the hardware is working, 
then deal with the cases where it isn't.  _That_ is the Linus Torvalds 
model of an optimal system (IMO :))

4) I have a real-life example:  tg3.  The BroadCom 57xx chips are 
MSI-brain-damaged.  So we unconditionally program the hardware in 
non-MSI mode.  No special APIs needed at all.

5) Another option is to enable MSI only for devices which call 
request_msi().  This idea follows the current model of 
pci_enable_device():  PCI resources and interrupts are guaranteed to be 
assigned and set up only after a successful call to pci_enable_device(). 
  Then, later on, the driver will call request_irq(), which will unmask 
the irq (if it's not already shared).  Continuing this model, a driver's 
call to request_msi() would signal that MSI is to be enabled for that 
device....  and ensure that the PCI core does not unconditionally enable 
MSI for any device outside of request_msi() call.

	Jeff




^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: MSI fix for buggy PCI/PCI-X hardware
  2003-09-09 22:52 ` Jeff Garzik
@ 2003-09-10 21:31   ` Zwane Mwaikambo
  0 siblings, 0 replies; 6+ messages in thread
From: Zwane Mwaikambo @ 2003-09-10 21:31 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Nakajima, Jun, long, Linux Kernel, Greg Kroah-Hartmann, Nguyen, Tom L

On Tue, 9 Sep 2003, Jeff Garzik wrote:

> 5) Another option is to enable MSI only for devices which call 
> request_msi().  This idea follows the current model of 
> pci_enable_device():  PCI resources and interrupts are guaranteed to be 
> assigned and set up only after a successful call to pci_enable_device(). 
>   Then, later on, the driver will call request_irq(), which will unmask 
> the irq (if it's not already shared).  Continuing this model, a driver's 
> call to request_msi() would signal that MSI is to be enabled for that 
> device....  and ensure that the PCI core does not unconditionally enable 
> MSI for any device outside of request_msi() call.

Jun, i'm of the opinion that we can't be doing this in the kernel. The 
bugs are device specific so putting blacklists in the core kernel 
code sounds a bit wrong. What i would prefer is if this is done on a driver 
level via a flag (e.g. say it's a buggy batch upto a given pci revision). 
I don't believe we'd be "violating" the PCI 3.0 mandate if we only check 
revisions when making a decision on driving a device via the irq line or 
msi. So i'm actually agreeing with the request_msi/request_irq pair and 
using that as the interface a driver writer uses to enable the respective 
modes. I know it's kind of cheating and normally the communication path 
when doing this query should be the kernel informing the driver of 
capability and the driver working with the specified mode. But driver 
informing the kernel of specific preferred mode for a device should save 
some headaches. Workable?

Btw have there been some updates? I want to have another look.

Thanks,
	Zwane

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: MSI fix for buggy PCI/PCI-X hardware
@ 2003-09-10  1:26 Nakajima, Jun
  0 siblings, 0 replies; 6+ messages in thread
From: Nakajima, Jun @ 2003-09-10  1:26 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: long, linux-kernel, greg, Nguyen, Tom L, zwane

Thanks for the good points.

> 5) Another option is to enable MSI only for devices which call
> request_msi().  This idea follows the current model of
> pci_enable_device():  PCI resources and interrupts are guaranteed to
be
> assigned and set up only after a successful call to
pci_enable_device().
>   Then, later on, the driver will call request_irq(), which will
unmask
> the irq (if it's not already shared).  Continuing this model, a
driver's
> call to request_msi() would signal that MSI is to be enabled for that
> device....  and ensure that the PCI core does not unconditionally
enable
> MSI for any device outside of request_msi() call.

In fact, this was something I had in my mind (mine was enable_msi()).
The drawback of this is, of course, (simple) modifications to the driver
are required to get MSI enabled. Since we definitely need more APIs for
MSI-X, I think it's a consistent and clean extension required for MSI.
The benefit of this is that each device driver can have more detailed
control with enabling/disabling MSI for particular devices when
supporting a family of devices in the same binary (compared to
drivers/pci/quirk.c). 

Thanks,
Jun

> -----Original Message-----
> From: Jeff Garzik [mailto:jgarzik@pobox.com]
> Sent: Tuesday, September 09, 2003 3:52 PM
> To: Nakajima, Jun
> Cc: long; linux-kernel@vger.kernel.org; greg@kroah.com; Nguyen, Tom L;
> zwane@linuxpower.ca
> Subject: Re: MSI fix for buggy PCI/PCI-X hardware
> 
> Nakajima, Jun wrote:
> > How about the default behavior? I'm not a fan of disable_msi(),
because
> > we need to update the driver as we find problems, and we cannot
predict
> > which PCI/PCI-X devices in the world have such a problem, although
we
> > know some will. The workaround in drivers/pci/quirk.c is much
better,
> > compared to modifying the driver, but we still need to update the
file
> > (and rebuild the kernel) as we find problems.
> 
> Agreed.
> 
> That's the pain of buggy hardware.  The solution is to not produce
buggy
> hardware ;-)  Failing that, it is unavoidable that the kernel would
need
> to be updated to notice or work around buggy hardware.  That's
precisely
> the reason for quirks/dmi_scan existence:  the special cases.  Special
> cases are never easy or enjoyable to maintain ;-)
> 
> 
> > In my opinion, we might want to use drivers/pci/quirk.c to blacklist
PCI
> > Express devices if any (hope not). For PCI/PCI-X devices, we might
want
> > to enable MSI once verified for it. To that end we can also use
> > drivers/pci/quirk.c to whitelist them (or it's abuse?). That way we
can
> > avoid situations like "it hangs, it does not get interrupts",
"disable
> > ACPI, oh no, MSI".
> 
> 
> Five points here:
> 
> 1) If we did that with ACPI, you guys would have only recieved a
> _fraction_ of the feedback you received.  IMO we want to turn on MSI
> (where supported), and see what breaks.  It _should_ work, otherwise
the
> hardware guys wouldn't have put MSI on their PCI device :)
> 
> You'll never get feedback and testing if it's turned off by default.
> 
> 2) MSI is more optimal than standard (should I start calling them
> legacy?) x86 interrupts.  And I think they're just plain cool.  So of
> course I will push to default MSI to on!  ;-)
> 
> 3) I think this view is colored by "right now".  The current MSI
errata
> may be worrying you, but...   MSI is the future.  If you choose to
> whitelist, then you're creating a maintenance nightmare for the
future.
>   You would have to qualify _every_ MSI device!  Think how much it
would
> suck if we have to do that with PCI devices today.
> 
> Furthermore, a whitelist unfairly punishes working MSI hardware and
> perhaps unfairly highlights a few key vendors at the start ;-)  This
is
> why I like blacklists.
> 
> Broken hardware is a special case, and not something we should invest
a
> whole lot of time worrying about.  _Assume_ the hardware is working,
> then deal with the cases where it isn't.  _That_ is the Linus Torvalds
> model of an optimal system (IMO :))
> 
> 4) I have a real-life example:  tg3.  The BroadCom 57xx chips are
> MSI-brain-damaged.  So we unconditionally program the hardware in
> non-MSI mode.  No special APIs needed at all.
> 
> 5) Another option is to enable MSI only for devices which call
> request_msi().  This idea follows the current model of
> pci_enable_device():  PCI resources and interrupts are guaranteed to
be
> assigned and set up only after a successful call to
pci_enable_device().
>   Then, later on, the driver will call request_irq(), which will
unmask
> the irq (if it's not already shared).  Continuing this model, a
driver's
> call to request_msi() would signal that MSI is to be enabled for that
> device....  and ensure that the PCI core does not unconditionally
enable
> MSI for any device outside of request_msi() call.
> 
> 	Jeff
> 
> 


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: MSI fix for buggy PCI/PCI-X hardware
  2003-09-09 15:39 long
@ 2003-09-09 18:41 ` Jeff Garzik
  0 siblings, 0 replies; 6+ messages in thread
From: Jeff Garzik @ 2003-09-09 18:41 UTC (permalink / raw)
  To: long; +Cc: linux-kernel, greg, jun.nakajima, tom.l.nguyen, zwane

On Tue, Sep 09, 2003 at 08:39:37AM -0700, long wrote:
> The proposed solution is to provide a new API, named "int 
> disable_msi(struct pci_dev *dev)", to allow IHV's who have 
> shipped PCI/PCI-X hardware that does not work in MSI mode to update 
> their software drivers to request the kernel to switch the 
> interrupt mode from MSI mode back to IRQ pin-assertion mode. 

No need for a new API.  We have drivers/pci/quirk.c where we add PCI
devices with known bugs.  If there is commonality of the bugs across
hardware, we can easily add a common "quirk" which fixes the situation.

Individual drivers shouldn't need to bother with this, really.

As an aside, if you need to blacklist certain _systems_ rather than
certain PCI devices, you should either modify drivers/pci/quirks.c or
arch/i386/kernel/dmi_scan.c.

	Jeff




^ permalink raw reply	[flat|nested] 6+ messages in thread

* MSI fix for buggy PCI/PCI-X hardware
@ 2003-09-09 15:39 long
  2003-09-09 18:41 ` Jeff Garzik
  0 siblings, 1 reply; 6+ messages in thread
From: long @ 2003-09-09 15:39 UTC (permalink / raw)
  To: linux-kernel; +Cc: greg, jgarzik, jun.nakajima, tom.l.nguyen, zwane

The latest patches that enable MSI in the Linux kernel were 
sent out a few weeks ago on lkml.  These patches will enable 
MSI support if users rebuild the kernel with CONFIG_PCI_MSI 
set to 'Y'. Option one (default option) enables MSI on 
specific individual PCI/PCI-X MSI-capable devices listed in 
the boot parameter "device_msi=". Users may choose option 
two, which enables MSI on all PCI/PCI-X MSI-capable devices, 
by setting CONFIG_PCI_MSI_ON_SPECIFIC_DEVICES to 'N'. 
Depending on if either option one or option two is selected, 
the kernel detects and enables MSI on requested PCI/PCI-X 
MSI-capable devices. 
 
During testing we have found many currently shipping 
PCI/PCI-X MSI-capable devices have silicon bugs specific to 
MSI support. Most of these will cause system failures when 
MSI is enabled. To filter out MSI buggy hardware requires the 
kernel to detect and disable the MSI support on specific hardware. 
 
The proposed solution is to provide a new API, named "int 
disable_msi(struct pci_dev *dev)", to allow IHV's who have 
shipped PCI/PCI-X hardware that does not work in MSI mode to update 
their software drivers to request the kernel to switch the 
interrupt mode from MSI mode back to IRQ pin-assertion mode. 
There are some reasons to justify this as below:

1) According to the PCI spec 3.0 or latest, the software 
driver is prohibited from directly interfacing with its 
device to switch MSI mode to IRQ pin-assertion mode. 
Consequently, a kernel interface is required since the kernel 
is the only entity permitted to change modes.
 
2) All PCI-Express endpoints require MSI support. These 
endpoints should have MSI enabled automatically by default. 
For PCI/PCI-X hardware that can't use MSI due to hardware bugs,
it is the IHVs' responsibility to update their software drivers to 
request the kernel to disable MSI so the HW can function with MSI 
enabled for all other devices.
 
3) Some PCI/PCI-X MSI-capable hardware devices out there work with 
this patch without requiring any changes to their existing software 
drivers.

I'd appreciate any feedback you have on this proposed solution. 

Thanks,
Long 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2003-09-10 21:32 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-09-09 22:14 MSI fix for buggy PCI/PCI-X hardware Nakajima, Jun
2003-09-09 22:52 ` Jeff Garzik
2003-09-10 21:31   ` Zwane Mwaikambo
  -- strict thread matches above, loose matches on Subject: below --
2003-09-10  1:26 Nakajima, Jun
2003-09-09 15:39 long
2003-09-09 18:41 ` Jeff Garzik

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).