All of lore.kernel.org
 help / color / mirror / Atom feed
* Support for multiple MSI
@ 2009-02-23 17:27 Matthew Wilcox
  2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:27 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel


Currently, Linux supports multiple MSI-X interrupts per device, but only
a single MSI interrupt.  This patch series adds support to the generic
PCI code for supporting multiple MSI interrupts.  Architectures will
need to add support for multiple MSIs, and I have a patch to do that for
x86 (which needs some more work).  Getting this patch series in first
is important so we can start supporting this interface in drivers and
architectures independently.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
@ 2009-02-23 17:27 ` Matthew Wilcox
  2009-02-24 20:00   ` Randy Dunlap
  2009-02-27  6:15   ` Grant Grundler
  2009-02-23 17:27 ` [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix' Matthew Wilcox
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:27 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

I didn't find the previous version very useful, so I rewrote it.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 Documentation/PCI/MSI-HOWTO.txt |  756 ++++++++++++++-------------------------
 1 files changed, 275 insertions(+), 481 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index 256defd..f1b9f09 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -4,506 +4,300 @@
 	Revised Feb 12, 2004 by Martine Silbermann
 		email: Martine.Silbermann@hp.com
 	Revised Jun 25, 2004 by Tom L Nguyen
+	Revised Jul  9, 2008 by Matthew Wilcox <willy@linux.intel.com>
+		Copyright 2003, 2008 Intel Corporation
 
 1. About this guide
 
-This guide describes the basics of Message Signaled Interrupts (MSI),
-the advantages of using MSI over traditional interrupt mechanisms,
-and how to enable your driver to use MSI or MSI-X. Also included is
-a Frequently Asked Questions (FAQ) section.
-
-1.1 Terminology
-
-PCI devices can be single-function or multi-function.  In either case,
-when this text talks about enabling or disabling MSI on a "device
-function," it is referring to one specific PCI device and function and
-not to all functions on a PCI device (unless the PCI device has only
-one function).
-
-2. Copyright 2003 Intel Corporation
-
-3. What is MSI/MSI-X?
-
-Message Signaled Interrupt (MSI), as described in the PCI Local Bus
-Specification Revision 2.3 or later, is an optional feature, and a
-required feature for PCI Express devices. MSI enables a device function
-to request service by sending an Inbound Memory Write on its PCI bus to
-the FSB as a Message Signal Interrupt transaction. Because MSI is
-generated in the form of a Memory Write, all transaction conditions,
-such as a Retry, Master-Abort, Target-Abort or normal completion, are
-supported.
-
-A PCI device that supports MSI must also support pin IRQ assertion
-interrupt mechanism to provide backward compatibility for systems that
-do not support MSI. In systems which support MSI, the bus driver is
-responsible for initializing the message address and message data of
-the device function's MSI/MSI-X capability structure during device
-initial configuration.
-
-An MSI capable device function indicates MSI support by implementing
-the MSI/MSI-X capability structure in its PCI capability list. The
-device function may implement both the MSI capability structure and
-the MSI-X capability structure; however, the bus driver should not
-enable both.
-
-The MSI capability structure contains Message Control register,
-Message Address register and Message Data register. These registers
-provide the bus driver control over MSI. The Message Control register
-indicates the MSI capability supported by the device. The Message
-Address register specifies the target address and the Message Data
-register specifies the characteristics of the message. To request
-service, the device function writes the content of the Message Data
-register to the target address. The device and its software driver
-are prohibited from writing to these registers.
-
-The MSI-X capability structure is an optional extension to MSI. It
-uses an independent and separate capability structure. There are
-some key advantages to implementing the MSI-X capability structure
-over the MSI capability structure as described below.
-
-	- Support a larger maximum number of vectors per function.
-
-	- Provide the ability for system software to configure
-	each vector with an independent message address and message
-	data, specified by a table that resides in Memory Space.
-
-        - MSI and MSI-X both support per-vector masking. Per-vector
-	masking is an optional extension of MSI but a required
-	feature for MSI-X. Per-vector masking provides the kernel the
-	ability to mask/unmask a single MSI while running its
-	interrupt service routine. If per-vector masking is
-	not supported, then the device driver should provide the
-	hardware/software synchronization to ensure that the device
-	generates MSI when the driver wants it to do so.
-
-4. Why use MSI?
-
-As a benefit to the simplification of board design, MSI allows board
-designers to remove out-of-band interrupt routing. MSI is another
-step towards a legacy-free environment.
-
-Due to increasing pressure on chipset and processor packages to
-reduce pin count, the need for interrupt pins is expected to
-diminish over time. Devices, due to pin constraints, may implement
-messages to increase performance.
-
-PCI Express endpoints uses INTx emulation (in-band messages) instead
-of IRQ pin assertion. Using INTx emulation requires interrupt
-sharing among devices connected to the same node (PCI bridge) while
-MSI is unique (non-shared) and does not require BIOS configuration
-support. As a result, the PCI Express technology requires MSI
-support for better interrupt performance.
-
-Using MSI enables the device functions to support two or more
-vectors, which can be configured to target different CPUs to
-increase scalability.
-
-5. Configuring a driver to use MSI/MSI-X
-
-By default, the kernel will not enable MSI/MSI-X on all devices that
-support this capability. The CONFIG_PCI_MSI kernel option
-must be selected to enable MSI/MSI-X support.
-
-5.1 Including MSI/MSI-X support into the kernel
-
-To allow MSI/MSI-X capable device drivers to selectively enable
-MSI/MSI-X (using pci_enable_msi()/pci_enable_msix() as described
-below), the VECTOR based scheme needs to be enabled by setting
-CONFIG_PCI_MSI during kernel config.
-
-Since the target of the inbound message is the local APIC, providing
-CONFIG_X86_LOCAL_APIC must be enabled as well as CONFIG_PCI_MSI.
-
-5.2 Configuring for MSI support
-
-Due to the non-contiguous fashion in vector assignment of the
-existing Linux kernel, this version does not support multiple
-messages regardless of a device function is capable of supporting
-more than one vector. To enable MSI on a device function's MSI
-capability structure requires a device driver to call the function
-pci_enable_msi() explicitly.
-
-5.2.1 API pci_enable_msi
+This guide describes the basics of Message Signaled Interrupts (MSIs),
+the advantages of using MSI over traditional interrupt mechanisms, how
+to change your driver to use MSI or MSI-X and some basic diagnostics to
+try if a device doesn't support MSIs.
+
+
+2. What are MSIs?
+
+A Message Signalled Interrupt is a write from the device to a special
+address which causes an interrupt to be received by the CPU.
+
+The MSI capability was first specified in PCI 2.2 and was later enhanced
+in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
+capability was also introduced with PCI 3.0.  It supports more interrupts
+per device than MSI and allows interrupts to be independently configured.
+
+Devices may support both MSI and MSI-X, but only one can be enabled at
+a time.
+
+
+3. Why use MSIs?
+
+There are three reasons why using MSIs can give an advantage over
+traditional pin-based interrupts.
+
+Pin-based PCI interrupts are often shared amongst several devices.
+To support this, the kernel must call each interrupt handler associated
+with an interrupt which leads to reduced performance for the system as
+a whole.  MSIs are never shared, so this problem cannot arise.
+
+When a device writes data to memory, then raises a pin-based interrupt,
+it is possible that the interrupt may arrive before all the data has
+arrived in memory (this becomes more likely with devices behind PCI-PCI
+bridges).  In order to ensure that all the data has arrived in memory,
+the interrupt handler must read a register on the device which raised
+the interrupt.  PCI transaction ordering rules require that all the data
+arrives in memory before the value can be returned from the register.
+Using MSIs avoids this problem as the interrupt-generating write cannot
+pass the data writes, so by the time the interrupt is raised, the driver
+knows that all the data has arrived in memory.
+
+PCI devices can only support a single pin-based interrupt per function.
+Often drivers have to query the device to find out what event has
+occurred, slowing down interrupt handling for the common case.  With
+MSIs, a device can support more interrupts, allowing each interrupt
+to be specialised to a different purpose.  One possible design gives
+infrequent conditions (such as errors) their own interrupt which allows
+the driver to handle the normal interrupt handling path more efficiently.
+Other possible designs include giving one interrupt to each packet queue
+in a network card or each port in a storage controller.
+
+
+4. How to use MSIs
+
+PCI devices are initialised to use pin-based interrupts.  The device
+driver has to set up the device to use MSI or MSI-X.  Not all machines
+support MSIs correctly, and for those machines, the APIs described below
+will simply fail and the device will continue to use pin-based interrupts.
+
+4.1 Include kernel support for MSIs
+
+To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
+option enabled.  This option is only available on some architectures,
+and it may depend on some other options also being set.  For example,
+on x86, you must also enable X86_UP_APIC or SMP in order to see the
+CONFIG_PCI_MSI option.
+
+4.2 Using MSI
+
+Most of the hard work is done for the driver in the PCI layer.  It simply
+has to request that the PCI layer set up the MSI capability for this
+device.
+
+4.2.1 pci_enable_msi
 
 int pci_enable_msi(struct pci_dev *dev)
 
-With this new API, a device driver that wants to have MSI
-enabled on its device function must call this API to enable MSI.
-A successful call will initialize the MSI capability structure
-with ONE vector, regardless of whether a device function is
-capable of supporting multiple messages. This vector replaces the
-pre-assigned dev->irq with a new MSI vector. To avoid a conflict
-of the new assigned vector with existing pre-assigned vector requires
-a device driver to call this API before calling request_irq().
+A successful call will allocate ONE interrupt to the device, regardless
+of how many MSIs the device supports.  The device will be switched from
+pin-based interrupt mode to MSI mode.  The dev->irq number is changed
+to a new number which represents the message signaled interrupt.
+This function should be called before the driver calls request_irq()
+since enabling MSIs disables the pin-based IRQ and the driver will not
+receive interrupts on the old interrupt.
 
-5.2.2 API pci_disable_msi
+4.2.2 pci_disable_msi
 
 void pci_disable_msi(struct pci_dev *dev)
 
-This API should always be used to undo the effect of pci_enable_msi()
-when a device driver is unloading. This API restores dev->irq with
-the pre-assigned IOAPIC vector and switches a device's interrupt
-mode to PCI pin-irq assertion/INTx emulation mode.
-
-Note that a device driver should always call free_irq() on the MSI vector
-that it has done request_irq() on before calling this API. Failure to do
-so results in a BUG_ON() and a device will be left with MSI enabled and
-leaks its vector.
-
-5.2.3 MSI mode vs. legacy mode diagram
-
-The below diagram shows the events which switch the interrupt
-mode on the MSI-capable device function between MSI mode and
-PIN-IRQ assertion mode.
-
-	 ------------   pci_enable_msi 	 ------------------------
-	|	     | <===============	| 			 |
-	| MSI MODE   |	  	     	| PIN-IRQ ASSERTION MODE |
-	| 	     | ===============>	|			 |
- 	 ------------	pci_disable_msi  ------------------------
-
-
-Figure 1. MSI Mode vs. Legacy Mode
-
-In Figure 1, a device operates by default in legacy mode. Legacy
-in this context means PCI pin-irq assertion or PCI-Express INTx
-emulation. A successful MSI request (using pci_enable_msi()) switches
-a device's interrupt mode to MSI mode. A pre-assigned IOAPIC vector
-stored in dev->irq will be saved by the PCI subsystem and a new
-assigned MSI vector will replace dev->irq.
-
-To return back to its default mode, a device driver should always call
-pci_disable_msi() to undo the effect of pci_enable_msi(). Note that a
-device driver should always call free_irq() on the MSI vector it has
-done request_irq() on before calling pci_disable_msi(). Failure to do
-so results in a BUG_ON() and a device will be left with MSI enabled and
-leaks its vector. Otherwise, the PCI subsystem restores a device's
-dev->irq with a pre-assigned IOAPIC vector and marks the released
-MSI vector as unused.
-
-Once being marked as unused, there is no guarantee that the PCI
-subsystem will reserve this MSI vector for a device. Depending on
-the availability of current PCI vector resources and the number of
-MSI/MSI-X requests from other drivers, this MSI may be re-assigned.
-
-For the case where the PCI subsystem re-assigns this MSI vector to
-another driver, a request to switch back to MSI mode may result
-in being assigned a different MSI vector or a failure if no more
-vectors are available.
-
-5.3 Configuring for MSI-X support
-
-Due to the ability of the system software to configure each vector of
-the MSI-X capability structure with an independent message address
-and message data, the non-contiguous fashion in vector assignment of
-the existing Linux kernel has no impact on supporting multiple
-messages on an MSI-X capable device functions. To enable MSI-X on
-a device function's MSI-X capability structure requires its device
-driver to call the function pci_enable_msix() explicitly.
-
-The function pci_enable_msix(), once invoked, enables either
-all or nothing, depending on the current availability of PCI vector
-resources. If the PCI vector resources are available for the number
-of vectors requested by a device driver, this function will configure
-the MSI-X table of the MSI-X capability structure of a device with
-requested messages. To emphasize this reason, for example, a device
-may be capable for supporting the maximum of 32 vectors while its
-software driver usually may request 4 vectors. It is recommended
-that the device driver should call this function once during the
-initialization phase of the device driver.
-
-Unlike the function pci_enable_msi(), the function pci_enable_msix()
-does not replace the pre-assigned IOAPIC dev->irq with a new MSI
-vector because the PCI subsystem writes the 1:1 vector-to-entry mapping
-into the field vector of each element contained in a second argument.
-Note that the pre-assigned IOAPIC dev->irq is valid only if the device
-operates in PIN-IRQ assertion mode. In MSI-X mode, any attempt at
-using dev->irq by the device driver to request for interrupt service
-may result in unpredictable behavior.
-
-For each MSI-X vector granted, a device driver is responsible for calling
-other functions like request_irq(), enable_irq(), etc. to enable
-this vector with its corresponding interrupt service handler. It is
-a device driver's choice to assign all vectors with the same
-interrupt service handler or each vector with a unique interrupt
-service handler.
-
-5.3.1 Handling MMIO address space of MSI-X Table
-
-The PCI 3.0 specification has implementation notes that MMIO address
-space for a device's MSI-X structure should be isolated so that the
-software system can set different pages for controlling accesses to the
-MSI-X structure. The implementation of MSI support requires the PCI
-subsystem, not a device driver, to maintain full control of the MSI-X
-table/MSI-X PBA (Pending Bit Array) and MMIO address space of the MSI-X
-table/MSI-X PBA.  A device driver should not access the MMIO address
-space of the MSI-X table/MSI-X PBA.
-
-5.3.2 API pci_enable_msix
-
-int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
+This function should be used to undo the effect of pci_enable_msi().
+Calling it restores dev->irq to the pin-based interrupt number and frees
+the previously allocated message signaled interrupt(s).  The interrupt
+may subsequently be assigned to another device, so drivers should not
+cache the value of dev->irq.
 
-This API enables a device driver to request the PCI subsystem
-to enable MSI-X messages on its hardware device. Depending on
-the availability of PCI vectors resources, the PCI subsystem enables
-either all or none of the requested vectors.
+A device driver must always call free_irq() on the interrupt(s)
+for which it has called request_irq() before calling this function.
+Failure to do so will result in a BUG_ON(), the device will be left with
+MSI enabled and will leak its vector.
 
-Argument 'dev' points to the device (pci_dev) structure.
+4.3 Using MSI-X
 
-Argument 'entries' is a pointer to an array of msix_entry structs.
-The number of entries is indicated in argument 'nvec'.
-struct msix_entry is defined in /driver/pci/msi.h:
+The MSI-X capability is much more flexible than the MSI capability.
+It supports up to 2048 interrupts, each of which can be separately
+assigned.  To support this flexibility, drivers must use an array of
+`struct msix_entry':
 
 struct msix_entry {
 	u16 	vector; /* kernel uses to write alloc vector */
 	u16	entry; /* driver uses to specify entry */
 };
 
-A device driver is responsible for initializing the field 'entry' of
-each element with a unique entry supported by MSI-X table. Otherwise,
--EINVAL will be returned as a result. A successful return of zero
-indicates the PCI subsystem completed initializing each of the requested
-entries of the MSI-X table with message address and message data.
-Last but not least, the PCI subsystem will write the 1:1
-vector-to-entry mapping into the field 'vector' of each element. A
-device driver is responsible for keeping track of allocated MSI-X
-vectors in its internal data structure.
-
-A return of zero indicates that the number of MSI-X vectors was
-successfully allocated. A return of greater than zero indicates
-MSI-X vector shortage. Or a return of less than zero indicates
-a failure. This failure may be a result of duplicate entries
-specified in second argument, or a result of no available vector,
-or a result of failing to initialize MSI-X table entries.
-
-5.3.3 API pci_disable_msix
+This allows for the device to use these interrupts in a sparse fashion;
+for example it could use interrupts 3 and 1027 and allocate only a
+two-element array.  The driver is expected to fill in the 'entry' value
+in each element of the array to indicate which entries it wants the kernel
+to assign interrupts for.  It is invalid to fill in two entries with the
+same number.
+
+4.3.1 pci_enable_msix
+
+int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
+
+Calling this function asks the PCI subsystem to allocate 'nvec' MSIs.
+The 'entries' argument is a pointer to an array of msix_entry structs
+which should be at least 'nvec' entries in size.  On success, the
+function will return 0 and the device will have been switched into
+MSI-X interrupt mode.  The 'vector' elements in each entry will have
+been filled in with the interrupt number.  The driver should then call
+request_irq() for each 'vector' that it decides to use.
+
+If this function returns a negative number, it indicates an error and
+the driver should not attempt to allocate any more MSI-X interrupts for
+this device.  If it returns a positive number, it indicates the maximum
+number of interrupt vectors that could have been allocated.
+
+This function, in contrast with pci_enable_msi(), does not adjust
+dev->irq.  The device will not generate interrupts for this interrupt
+number once MSI-X is enabled.  The device driver is responsible for
+keeping track of the interrupts assigned to the MSI-X vectors so it can
+free them again later.
+
+Device drivers should normally call this function once per device
+during the initialization phase.
+
+4.3.2 pci_disable_msix
 
 void pci_disable_msix(struct pci_dev *dev)
 
-This API should always be used to undo the effect of pci_enable_msix()
-when a device driver is unloading. Note that a device driver should
-always call free_irq() on all MSI-X vectors it has done request_irq()
-on before calling this API. Failure to do so results in a BUG_ON() and
-a device will be left with MSI-X enabled and leaks its vectors.
-
-5.3.4 MSI-X mode vs. legacy mode diagram
-
-The below diagram shows the events which switch the interrupt
-mode on the MSI-X capable device function between MSI-X mode and
-PIN-IRQ assertion mode (legacy).
-
-	 ------------   pci_enable_msix(,,n) ------------------------
-	|	     | <===============	    | 			     |
-	| MSI-X MODE |	  	     	    | PIN-IRQ ASSERTION MODE |
-	| 	     | ===============>	    |			     |
- 	 ------------	pci_disable_msix     ------------------------
-
-Figure 2. MSI-X Mode vs. Legacy Mode
-
-In Figure 2, a device operates by default in legacy mode. A
-successful MSI-X request (using pci_enable_msix()) switches a
-device's interrupt mode to MSI-X mode. A pre-assigned IOAPIC vector
-stored in dev->irq will be saved by the PCI subsystem; however,
-unlike MSI mode, the PCI subsystem will not replace dev->irq with
-assigned MSI-X vector because the PCI subsystem already writes the 1:1
-vector-to-entry mapping into the field 'vector' of each element
-specified in second argument.
-
-To return back to its default mode, a device driver should always call
-pci_disable_msix() to undo the effect of pci_enable_msix(). Note that
-a device driver should always call free_irq() on all MSI-X vectors it
-has done request_irq() on before calling pci_disable_msix(). Failure
-to do so results in a BUG_ON() and a device will be left with MSI-X
-enabled and leaks its vectors. Otherwise, the PCI subsystem switches a
-device function's interrupt mode from MSI-X mode to legacy mode and
-marks all allocated MSI-X vectors as unused.
-
-Once being marked as unused, there is no guarantee that the PCI
-subsystem will reserve these MSI-X vectors for a device. Depending on
-the availability of current PCI vector resources and the number of
-MSI/MSI-X requests from other drivers, these MSI-X vectors may be
-re-assigned.
-
-For the case where the PCI subsystem re-assigned these MSI-X vectors
-to other drivers, a request to switch back to MSI-X mode may result
-being assigned with another set of MSI-X vectors or a failure if no
-more vectors are available.
-
-5.4 Handling function implementing both MSI and MSI-X capabilities
-
-For the case where a function implements both MSI and MSI-X
-capabilities, the PCI subsystem enables a device to run either in MSI
-mode or MSI-X mode but not both. A device driver determines whether it
-wants MSI or MSI-X enabled on its hardware device. Once a device
-driver requests for MSI, for example, it is prohibited from requesting
-MSI-X; in other words, a device driver is not permitted to ping-pong
-between MSI mod MSI-X mode during a run-time.
-
-5.5 Hardware requirements for MSI/MSI-X support
-
-MSI/MSI-X support requires support from both system hardware and
-individual hardware device functions.
-
-5.5.1 Required x86 hardware support
-
-Since the target of MSI address is the local APIC CPU, enabling
-MSI/MSI-X support in the Linux kernel is dependent on whether existing
-system hardware supports local APIC. Users should verify that their
-system supports local APIC operation by testing that it runs when
-CONFIG_X86_LOCAL_APIC=y.
-
-In SMP environment, CONFIG_X86_LOCAL_APIC is automatically set;
-however, in UP environment, users must manually set
-CONFIG_X86_LOCAL_APIC. Once CONFIG_X86_LOCAL_APIC=y, setting
-CONFIG_PCI_MSI enables the VECTOR based scheme and the option for
-MSI-capable device drivers to selectively enable MSI/MSI-X.
-
-Note that CONFIG_X86_IO_APIC setting is irrelevant because MSI/MSI-X
-vector is allocated new during runtime and MSI/MSI-X support does not
-depend on BIOS support. This key independency enables MSI/MSI-X
-support on future IOxAPIC free platforms.
-
-5.5.2 Device hardware support
-
-The hardware device function supports MSI by indicating the
-MSI/MSI-X capability structure on its PCI capability list. By
-default, this capability structure will not be initialized by
-the kernel to enable MSI during the system boot. In other words,
-the device function is running on its default pin assertion mode.
-Note that in many cases the hardware supporting MSI have bugs,
-which may result in system hangs. The software driver of specific
-MSI-capable hardware is responsible for deciding whether to call
-pci_enable_msi or not. A return of zero indicates the kernel
-successfully initialized the MSI/MSI-X capability structure of the
-device function. The device function is now running on MSI/MSI-X mode.
-
-5.6 How to tell whether MSI/MSI-X is enabled on device function
-
-At the driver level, a return of zero from the function call of
-pci_enable_msi()/pci_enable_msix() indicates to a device driver that
-its device function is initialized successfully and ready to run in
-MSI/MSI-X mode.
-
-At the user level, users can use the command 'cat /proc/interrupts'
-to display the vectors allocated for devices and their interrupt
-MSI/MSI-X modes ("PCI-MSI"/"PCI-MSI-X"). Below shows MSI mode is
-enabled on a SCSI Adaptec 39320D Ultra320 controller.
-
-           CPU0       CPU1
-  0:     324639          0    IO-APIC-edge  timer
-  1:       1186          0    IO-APIC-edge  i8042
-  2:          0          0          XT-PIC  cascade
- 12:       2797          0    IO-APIC-edge  i8042
- 14:       6543          0    IO-APIC-edge  ide0
- 15:          1          0    IO-APIC-edge  ide1
-169:          0          0   IO-APIC-level  uhci-hcd
-185:          0          0   IO-APIC-level  uhci-hcd
-193:        138         10         PCI-MSI  aic79xx
-201:         30          0         PCI-MSI  aic79xx
-225:         30          0   IO-APIC-level  aic7xxx
-233:         30          0   IO-APIC-level  aic7xxx
-NMI:          0          0
-LOC:     324553     325068
-ERR:          0
-MIS:          0
-
-6. MSI quirks
-
-Several PCI chipsets or devices are known to not support MSI.
-The PCI stack provides 3 possible levels of MSI disabling:
-* on a single device
-* on all devices behind a specific bridge
-* globally
-
-6.1. Disabling MSI on a single device
-
-Under some circumstances it might be required to disable MSI on a
-single device.  This may be achieved by either not calling pci_enable_msi()
-or all, or setting the pci_dev->no_msi flag before (most of the time
-in a quirk).
-
-6.2. Disabling MSI below a bridge
-
-The vast majority of MSI quirks are required by PCI bridges not
-being able to route MSI between busses. In this case, MSI have to be
-disabled on all devices behind this bridge. It is achieves by setting
-the PCI_BUS_FLAGS_NO_MSI flag in the pci_bus->bus_flags of the bridge
-subordinate bus. There is no need to set the same flag on bridges that
-are below the broken bridge. When pci_enable_msi() is called to enable
-MSI on a device, pci_msi_supported() takes care of checking the NO_MSI
-flag in all parent busses of the device.
-
-Some bridges actually support dynamic MSI support enabling/disabling
-by changing some bits in their PCI configuration space (especially
-the Hypertransport chipsets such as the nVidia nForce and Serverworks
-HT2000). It may then be required to update the NO_MSI flag on the
-corresponding devices in the sysfs hierarchy. To enable MSI support
-on device "0000:00:0e", do:
-
-	echo 1 > /sys/bus/pci/devices/0000:00:0e/msi_bus
-
-To disable MSI support, echo 0 instead of 1. Note that it should be
-used with caution since changing this value might break interrupts.
-
-6.3. Disabling MSI globally
-
-Some extreme cases may require to disable MSI globally on the system.
-For now, the only known case is a Serverworks PCI-X chipsets (MSI are
-not supported on several busses that are not all connected to the
-chipset in the Linux PCI hierarchy). In the vast majority of other
-cases, disabling only behind a specific bridge is enough.
-
-For debugging purpose, the user may also pass pci=nomsi on the kernel
-command-line to explicitly disable MSI globally. But, once the appro-
-priate quirks are added to the kernel, this option should not be
-required anymore.
-
-6.4. Finding why MSI cannot be enabled on a device
-
-Assuming that MSI are not enabled on a device, you should look at
-dmesg to find messages that quirks may output when disabling MSI
-on some devices, some bridges or even globally.
-Then, lspci -t gives the list of bridges above a device. Reading
-/sys/bus/pci/devices/0000:00:0e/msi_bus will tell you whether MSI
-are enabled (1) or disabled (0). In 0 is found in a single bridge
-msi_bus file above the device, MSI cannot be enabled.
-
-7. FAQ
-
-Q1. Are there any limitations on using the MSI?
-
-A1. If the PCI device supports MSI and conforms to the
-specification and the platform supports the APIC local bus,
-then using MSI should work.
-
-Q2. Will it work on all the Pentium processors (P3, P4, Xeon,
-AMD processors)? In P3 IPI's are transmitted on the APIC local
-bus and in P4 and Xeon they are transmitted on the system
-bus. Are there any implications with this?
-
-A2. MSI support enables a PCI device sending an inbound
-memory write (0xfeexxxxx as target address) on its PCI bus
-directly to the FSB. Since the message address has a
-redirection hint bit cleared, it should work.
-
-Q3. The target address 0xfeexxxxx will be translated by the
-Host Bridge into an interrupt message. Are there any
-limitations on the chipsets such as Intel 8xx, Intel e7xxx,
-or VIA?
-
-A3. If these chipsets support an inbound memory write with
-target address set as 0xfeexxxxx, as conformed to PCI
-specification 2.3 or latest, then it should work.
-
-Q4. From the driver point of view, if the MSI is lost because
-of errors occurring during inbound memory write, then it may
-wait forever. Is there a mechanism for it to recover?
-
-A4. Since the target of the transaction is an inbound memory
-write, all transaction termination conditions (Retry,
-Master-Abort, Target-Abort, or normal completion) are
-supported. A device sending an MSI must abide by all the PCI
-rules and conditions regarding that inbound memory write. So,
-if a retry is signaled it must retry, etc... We believe that
-the recommendation for Abort is also a retry (refer to PCI
-specification 2.3 or latest).
+This API should be used to undo the effect of pci_enable_msix().  It frees
+the previously allocated message signaled interrupts.  The interrupts may
+subsequently be assigned to another device, so drivers should not cache
+the value of the 'vector' elements over a call to pci_disable_msix().
+
+A device driver must always call free_irq() on the interrupt(s)
+for which it has called request_irq() before calling this function.
+Failure to do so will result in a BUG_ON(), the device will be left with
+MSI enabled and will leak its vector.
+
+4.3.3 The MSI-X Table
+
+The MSI-X capability specifies a BAR and offset within that BAR for the
+MSI-X Table.  This address is mapped by the PCI subsystem, and should not
+be accessed directly by the device driver.  If the driver wishes to
+mask or unmask an interrupt, it should call disable_irq() / enable_irq().
+
+4.4 Handling devices implementing both MSI and MSI-X capabilities
+
+If a device implements both MSI and MSI-X capabilities, it can
+run in either MSI mode or MSI-X mode but not both simultaneously.
+This is a requirement of the PCI spec, and it is enforced by the
+PCI layer.  Calling pci_enable_msi() when MSI-X is already enabled or
+pci_enable_msix() when MSI is already enabled will result in an error.
+If a device driver wishes to switch between MSI and MSI-X at runtime,
+it must first quiesce the device, then switch it back to pin-interrupt
+mode, before calling pci_enable_msi() or pci_enable_msix() and resuming
+operation.  This is not expected to be a common operation but may be
+useful for debugging or testing during development.
+
+4.5 Considerations when using MSIs
+
+4.5.1 Choosing between MSI-X and MSI
+
+If your device supports both MSI-X and MSI capabilities, you should use
+the MSI-X facilities in preference to the MSI facilities.  As mentioned
+above, MSI-X supports any number of interrupts between 1 and 2048.
+In constrast, MSI is restricted to a maximum of 32 interrupts (and
+must be a power of two).  In addition, the MSI interrupt vectors must
+be allocated consecutively, so the system may not be able to allocate
+as many vectors for MSI as it could for MSI-X.  On some platforms, MSI
+interrupts must all be targetted at the same set of CPUs whereas MSI-X
+interrupts can all be targetted at different CPUs.
+
+4.5.2 Spinlocks
+
+Most device drivers have a per-device spinlock which is taken in the
+interrupt handler.  With pin-based interrupts or a single MSI, it is not
+necessary to disable interrupts (Linux guarantees the same interrupt will
+not be re-entered).  If a device uses multiple interrupts, the driver
+must disable interrupts while the lock is held.  If the device sends
+a different interrupt, the driver will deadlock trying to recursively
+acquire the spinlock.
+
+There are two solutions.  The first is to take the
+lock with spin_lock_irqsave() or spin_lock_irq() (see
+Documentation/DocBook/kernel-locking).  The second is to specify
+IRQF_DISABLED to request_irq() so that the kernel runs the entire
+interrupt routine with interrupts disabled.
+
+If your MSI interrupt routine does not hold the lock for the whole time
+it is running, the first solution may be best.  The second solution is
+normally preferred as it avoids making two transitions from interrupt
+disabled to enabled and back again.
+
+4.6 How to tell whether MSI/MSI-X is enabled on a device
+
+Using lspci -v (as root) will show some devices with "Message Signalled
+Interrupts" and others with "MSI-X".  Each of these capabilities have an
+'Enable' flag which will be followed with either "+" (enabled) or "-"
+(disabled).
+
+
+5. MSI quirks
+
+Several PCI chipsets or devices are known not to support MSIs.
+The PCI stack provides three ways to disable MSIs:
+
+1. globally
+2. on all devices behind a specific bridge
+3. on a single device
+
+5.1. Disabling MSIs globally
+
+Some host chipsets simply don't support MSIs properly.  If we're
+lucky, the manufacturer knows this and has indicated it in the ACPI
+FADT table.  In this case, Linux will automatically disable MSIs.
+Some boards don't include this information in the table and so we have
+to detect them ourselves.  The complete list of these is found near the
+quirk_disable_all_msi() function in drivers/pci/quirks.c.
+
+If you have a board which has problems with MSIs, you can pass pci=nomsi
+on the kernel command line to disable MSIs on all devices.  It would be
+in your best interests to report the problem to linux-pci@vger.kernel.org
+including a full lspci -v so we can add the quirks to the kernel.
+
+5.2. Disabling MSIs below a bridge
+
+Some PCI bridges are not able to route MSIs between busses properly.
+In this case, MSIs must be disabled on all devices behind the bridge.
+
+Some bridges allow you to enable MSIs by changing some bits in their
+PCI configuration space (especially the Hypertransport chipsets such
+as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
+Linux mostly knows about them and automatically enables MSIs if it can.
+If you have a bridge which Linux doesn't yet know about, you can enable
+MSIs in configuration space using whatever method you know works, then
+enable MSIs on that bridge by doing:
+
+       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
+
+where $bridge is the PCI address of the bridge you've enabled (eg
+0000:00:0e.0).
+
+To disable MSIs, echo 0 instead of 1.  Changing this value should be
+done with caution as it can break interrupt handling for all devices
+below this bridge.
+
+Again, please notify linux-pci@vger.kernel.org of any bridges that need
+special handling.
+
+5.3. Disabling MSIs on a single device
+
+Some devices are known to have faulty MSI implementations.  Usually this
+is handled in the individual device driver but occasionally it's necessary
+to handle this with a quirk.  Some drivers have an option to disable MSIs;
+this is deprecated.
+
+5.4. Finding why MSIs are disabled on a device
+
+From the above three sections, you can see that there are many reasons
+why MSIs may not be enabled for a given device.  Your first step should
+be to examine your dmesg carefully to determine whether MSIs are enabled
+for your machine.  You should also check your .config to be sure you
+have enabled CONFIG_PCI_MSI.
+
+Then, lspci -t gives the list of bridges above a device.  Reading
+/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
+or disabled (0).  If 0 is found in any of the msi_bus files belonging
+to bridges between the PCI root and the device, MSIs are disabled.
+
+It is also worth checking whether the device driver supports MSIs.
+
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix'
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
  2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
@ 2009-02-23 17:27 ` Matthew Wilcox
  2009-03-03  0:16   ` Michael Ellerman
  2009-02-23 17:27 ` [PATCH 3/6] PCI MSI: msi_desc->dev is always initialised Matthew Wilcox
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:27 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

By changing from a 5-bit field to a 1-bit field, we free up some bits
that can be used by a later patch.  Also rearrange the fields for better
packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes
to 64 bytes).

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 drivers/pci/msi.c   |  115 +++++++++++++++++---------------------------------
 include/linux/msi.h |    4 +-
 2 files changed, 41 insertions(+), 78 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index dceea56..b3db438 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -111,20 +111,10 @@ static void msix_flush_writes(struct irq_desc *desc)
 
 	entry = get_irq_desc_msi(desc);
 	BUG_ON(!entry || !entry->dev);
-	switch (entry->msi_attrib.type) {
-	case PCI_CAP_ID_MSI:
-		/* nothing to do */
-		break;
-	case PCI_CAP_ID_MSIX:
-	{
+	if (entry->msi_attrib.is_msix) {
 		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
 		readl(entry->mask_base + offset);
-		break;
-	}
-	default:
-		BUG();
-		break;
 	}
 }
 
@@ -143,32 +133,23 @@ static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
 
 	entry = get_irq_desc_msi(desc);
 	BUG_ON(!entry || !entry->dev);
-	switch (entry->msi_attrib.type) {
-	case PCI_CAP_ID_MSI:
-		if (entry->msi_attrib.maskbit) {
-			int pos;
-			u32 mask_bits;
-
-			pos = (long)entry->mask_base;
-			pci_read_config_dword(entry->dev, pos, &mask_bits);
-			mask_bits &= ~(mask);
-			mask_bits |= flag & mask;
-			pci_write_config_dword(entry->dev, pos, mask_bits);
-		} else {
-			return 0;
-		}
-		break;
-	case PCI_CAP_ID_MSIX:
-	{
+	if (entry->msi_attrib.is_msix) {
 		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
 		writel(flag, entry->mask_base + offset);
 		readl(entry->mask_base + offset);
-		break;
-	}
-	default:
-		BUG();
-		break;
+	} else {
+		int pos;
+		u32 mask_bits;
+
+		if (!entry->msi_attrib.maskbit)
+			return 0;
+
+		pos = (long)entry->mask_base;
+		pci_read_config_dword(entry->dev, pos, &mask_bits);
+		mask_bits &= ~mask;
+		mask_bits |= flag & mask;
+		pci_write_config_dword(entry->dev, pos, mask_bits);
 	}
 	entry->msi_attrib.masked = !!flag;
 	return 1;
@@ -177,9 +158,14 @@ static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
 void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
 	struct msi_desc *entry = get_irq_desc_msi(desc);
-	switch(entry->msi_attrib.type) {
-	case PCI_CAP_ID_MSI:
-	{
+	if (entry->msi_attrib.is_msix) {
+		void __iomem *base = entry->mask_base +
+			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+
+		msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+		msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+		msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
+	} else {
 		struct pci_dev *dev = entry->dev;
 		int pos = entry->msi_attrib.pos;
 		u16 data;
@@ -195,21 +181,6 @@ void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 			pci_read_config_word(dev, msi_data_reg(pos, 0), &data);
 		}
 		msg->data = data;
-		break;
-	}
-	case PCI_CAP_ID_MSIX:
-	{
-		void __iomem *base;
-		base = entry->mask_base +
-			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
-
-		msg->address_lo = readl(base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-		msg->address_hi = readl(base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-		msg->data = readl(base + PCI_MSIX_ENTRY_DATA_OFFSET);
- 		break;
- 	}
- 	default:
-		BUG();
 	}
 }
 
@@ -223,9 +194,17 @@ void read_msi_msg(unsigned int irq, struct msi_msg *msg)
 void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
 	struct msi_desc *entry = get_irq_desc_msi(desc);
-	switch (entry->msi_attrib.type) {
-	case PCI_CAP_ID_MSI:
-	{
+	if (entry->msi_attrib.is_msix) {
+		void __iomem *base;
+		base = entry->mask_base +
+			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
+
+		writel(msg->address_lo,
+			base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
+		writel(msg->address_hi,
+			base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
+		writel(msg->data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
+	} else {
 		struct pci_dev *dev = entry->dev;
 		int pos = entry->msi_attrib.pos;
 
@@ -240,23 +219,6 @@ void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 			pci_write_config_word(dev, msi_data_reg(pos, 0),
 						msg->data);
 		}
-		break;
-	}
-	case PCI_CAP_ID_MSIX:
-	{
-		void __iomem *base;
-		base = entry->mask_base +
-			entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE;
-
-		writel(msg->address_lo,
-			base + PCI_MSIX_ENTRY_LOWER_ADDR_OFFSET);
-		writel(msg->address_hi,
-			base + PCI_MSIX_ENTRY_UPPER_ADDR_OFFSET);
-		writel(msg->data, base + PCI_MSIX_ENTRY_DATA_OFFSET);
-		break;
-	}
-	default:
-		BUG();
 	}
 	entry->msg = *msg;
 }
@@ -393,7 +355,7 @@ static int msi_capability_init(struct pci_dev *dev)
 	if (!entry)
 		return -ENOMEM;
 
-	entry->msi_attrib.type = PCI_CAP_ID_MSI;
+	entry->msi_attrib.is_msix = 0;
 	entry->msi_attrib.is_64 = is_64bit_address(control);
 	entry->msi_attrib.entry_nr = 0;
 	entry->msi_attrib.maskbit = is_mask_bit_support(control);
@@ -475,7 +437,7 @@ static int msix_capability_init(struct pci_dev *dev,
 			break;
 
  		j = entries[i].entry;
-		entry->msi_attrib.type = PCI_CAP_ID_MSIX;
+		entry->msi_attrib.is_msix = 1;
 		entry->msi_attrib.is_64 = 1;
 		entry->msi_attrib.entry_nr = j;
 		entry->msi_attrib.maskbit = 1;
@@ -619,12 +581,13 @@ void pci_msi_shutdown(struct pci_dev* dev)
 		struct irq_desc *desc = irq_to_desc(dev->irq);
 		msi_set_mask_bits(desc, mask, ~mask);
 	}
-	if (!entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI)
+	if (!entry->dev || entry->msi_attrib.is_msix)
 		return;
 
 	/* Restore dev->irq to its default pin-assertion irq */
 	dev->irq = entry->msi_attrib.default_irq;
 }
+
 void pci_disable_msi(struct pci_dev* dev)
 {
 	struct msi_desc *entry;
@@ -635,7 +598,7 @@ void pci_disable_msi(struct pci_dev* dev)
 	pci_msi_shutdown(dev);
 
 	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
-	if (!entry->dev || entry->msi_attrib.type != PCI_CAP_ID_MSI)
+	if (!entry->dev || entry->msi_attrib.is_msix)
 		return;
 
 	msi_free_irqs(dev);
@@ -654,7 +617,7 @@ static int msi_free_irqs(struct pci_dev* dev)
 	arch_teardown_msi_irqs(dev);
 
 	list_for_each_entry_safe(entry, tmp, &dev->msi_list, list) {
-		if (entry->msi_attrib.type == PCI_CAP_ID_MSIX) {
+		if (entry->msi_attrib.is_msix) {
 			writel(1, entry->mask_base + entry->msi_attrib.entry_nr
 				  * PCI_MSIX_ENTRY_SIZE
 				  + PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index d2b8a1e..9c5ce21 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -20,13 +20,13 @@ extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);
 
 struct msi_desc {
 	struct {
-		__u8	type	: 5; 	/* {0: unused, 5h:MSI, 11h:MSI-X} */
+		__u8	is_msix	: 1;
 		__u8	maskbit	: 1; 	/* mask-pending bit supported ?   */
 		__u8	masked	: 1;
 		__u8	is_64	: 1;	/* Address size: 0=32bit 1=64bit  */
 		__u8	pos;	 	/* Location of the msi capability */
-		__u32	maskbits_mask;  /* mask bits mask */
 		__u16	entry_nr;    	/* specific enabled entry 	  */
+		__u32	maskbits_mask;  /* mask bits mask */
 		unsigned default_irq;	/* default pre-assigned irq	  */
 	}msi_attrib;
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/6] PCI MSI: msi_desc->dev is always initialised
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
  2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
  2009-02-23 17:27 ` [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix' Matthew Wilcox
@ 2009-02-23 17:27 ` Matthew Wilcox
  2009-02-23 17:28 ` [PATCH 4/6] PCI MSI: Use mask_pos instead of mask_base when appropriate Matthew Wilcox
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:27 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

By passing the pci_dev into alloc_msi_entry() we can be sure that
the ->dev entry is always assigned and so we don't need to check it.
Also, we used kzalloc() so we don't need to initialise ->irq to 0.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 drivers/pci/msi.c |   29 ++++++++++++-----------------
 1 files changed, 12 insertions(+), 17 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index b3db438..a658c0f 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -110,7 +110,7 @@ static void msix_flush_writes(struct irq_desc *desc)
 	struct msi_desc *entry;
 
 	entry = get_irq_desc_msi(desc);
-	BUG_ON(!entry || !entry->dev);
+	BUG_ON(!entry);
 	if (entry->msi_attrib.is_msix) {
 		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
@@ -132,7 +132,7 @@ static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
 	struct msi_desc *entry;
 
 	entry = get_irq_desc_msi(desc);
-	BUG_ON(!entry || !entry->dev);
+	BUG_ON(!entry);
 	if (entry->msi_attrib.is_msix) {
 		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
 			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
@@ -248,19 +248,16 @@ void unmask_msi_irq(unsigned int irq)
 
 static int msi_free_irqs(struct pci_dev* dev);
 
-static struct msi_desc* alloc_msi_entry(void)
+static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
 {
-	struct msi_desc *entry;
-
-	entry = kzalloc(sizeof(struct msi_desc), GFP_KERNEL);
-	if (!entry)
+	struct msi_desc *desc = kzalloc(sizeof(*desc), GFP_KERNEL);
+	if (!desc)
 		return NULL;
 
-	INIT_LIST_HEAD(&entry->list);
-	entry->irq = 0;
-	entry->dev = NULL;
+	INIT_LIST_HEAD(&desc->list);
+	desc->dev = dev;
 
-	return entry;
+	return desc;
 }
 
 static void pci_intx_for_msi(struct pci_dev *dev, int enable)
@@ -351,7 +348,7 @@ static int msi_capability_init(struct pci_dev *dev)
    	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
 	pci_read_config_word(dev, msi_control_reg(pos), &control);
 	/* MSI Entry Initialization */
-	entry = alloc_msi_entry();
+	entry = alloc_msi_entry(dev);
 	if (!entry)
 		return -ENOMEM;
 
@@ -362,7 +359,6 @@ static int msi_capability_init(struct pci_dev *dev)
 	entry->msi_attrib.masked = 1;
 	entry->msi_attrib.default_irq = dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.pos = pos;
-	entry->dev = dev;
 	if (entry->msi_attrib.maskbit) {
 		unsigned int base, maskbits, temp;
 
@@ -432,7 +428,7 @@ static int msix_capability_init(struct pci_dev *dev,
 
 	/* MSI-X Table Initialization */
 	for (i = 0; i < nvec; i++) {
-		entry = alloc_msi_entry();
+		entry = alloc_msi_entry(dev);
 		if (!entry)
 			break;
 
@@ -444,7 +440,6 @@ static int msix_capability_init(struct pci_dev *dev,
 		entry->msi_attrib.masked = 1;
 		entry->msi_attrib.default_irq = dev->irq;
 		entry->msi_attrib.pos = pos;
-		entry->dev = dev;
 		entry->mask_base = base;
 
 		list_add_tail(&entry->list, &dev->msi_list);
@@ -581,7 +576,7 @@ void pci_msi_shutdown(struct pci_dev* dev)
 		struct irq_desc *desc = irq_to_desc(dev->irq);
 		msi_set_mask_bits(desc, mask, ~mask);
 	}
-	if (!entry->dev || entry->msi_attrib.is_msix)
+	if (entry->msi_attrib.is_msix)
 		return;
 
 	/* Restore dev->irq to its default pin-assertion irq */
@@ -598,7 +593,7 @@ void pci_disable_msi(struct pci_dev* dev)
 	pci_msi_shutdown(dev);
 
 	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
-	if (!entry->dev || entry->msi_attrib.is_msix)
+	if (entry->msi_attrib.is_msix)
 		return;
 
 	msi_free_irqs(dev);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/6] PCI MSI: Use mask_pos instead of mask_base when appropriate
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
                   ` (2 preceding siblings ...)
  2009-02-23 17:27 ` [PATCH 3/6] PCI MSI: msi_desc->dev is always initialised Matthew Wilcox
@ 2009-02-23 17:28 ` Matthew Wilcox
  2009-02-23 17:28 ` [PATCH 5/6] PCI MSI: Refactor interrupt masking code Matthew Wilcox
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:28 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

MSI interrupts have a mask_pos where MSI-X have a mask_base.  Use a
transparent union to get rid of some ugly casts.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 drivers/pci/msi.c   |    5 ++---
 include/linux/msi.h |    5 ++++-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index a658c0f..fcde04d 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -145,7 +145,7 @@ static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
 		if (!entry->msi_attrib.maskbit)
 			return 0;
 
-		pos = (long)entry->mask_base;
+		pos = entry->mask_pos;
 		pci_read_config_dword(entry->dev, pos, &mask_bits);
 		mask_bits &= ~mask;
 		mask_bits |= flag & mask;
@@ -363,8 +363,7 @@ static int msi_capability_init(struct pci_dev *dev)
 		unsigned int base, maskbits, temp;
 
 		base = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
-		entry->mask_base = (void __iomem *)(long)base;
-
+		entry->mask_pos = base;
 		/* All MSIs are unmasked by default, Mask them all */
 		pci_read_config_dword(dev, base, &maskbits);
 		temp = msi_mask((control & PCI_MSI_FLAGS_QMASK) >> 1);
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 9c5ce21..5025ca4 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -33,7 +33,10 @@ struct msi_desc {
 	unsigned int irq;
 	struct list_head list;
 
-	void __iomem *mask_base;
+	union {
+		void __iomem *mask_base;
+		u8 mask_pos;
+	};
 	struct pci_dev *dev;
 
 	/* Last set MSI message */
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/6] PCI MSI: Refactor interrupt masking code
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
                   ` (3 preceding siblings ...)
  2009-02-23 17:28 ` [PATCH 4/6] PCI MSI: Use mask_pos instead of mask_base when appropriate Matthew Wilcox
@ 2009-02-23 17:28 ` Matthew Wilcox
  2009-03-03  0:16   ` Michael Ellerman
  2009-02-23 17:28 ` [PATCH 6/6] PCI MSI: Add support for multiple MSI Matthew Wilcox
  2009-03-04 14:52 ` Support " Eric W. Biederman
  6 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:28 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

Since most of the callers already know whether they have an MSI or
an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq()
and msix_mask_irq().  The only callers which don't (mask_msi_irq()
and unmask_msi_irq()) can share code in msi_set_mask_bit().  This then
becomes the only caller of msix_flush_writes(), so we can inline it.
The flushing read can be to any address that belongs to the device,
so we can eliminate the calculation too.

We can also get rid of maskbits_mask from struct msi_desc and simply
recalculate it on the rare occasion that we need it.  The single-bit
'masked' element is replaced by a copy of the 32-bit 'masked' register,
so this patch does not affect the size of msi_desc.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 drivers/pci/msi.c   |  157 +++++++++++++++++++++++++--------------------------
 include/linux/msi.h |    5 +-
 2 files changed, 79 insertions(+), 83 deletions(-)

diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index fcde04d..41f18cb 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -105,17 +105,14 @@ static inline __attribute_const__ u32 msi_mask(unsigned x)
 	return (1 << (1 << x)) - 1;
 }
 
-static void msix_flush_writes(struct irq_desc *desc)
+static inline __attribute_const__ u32 msi_capable_mask(u16 control)
 {
-	struct msi_desc *entry;
+	return msi_mask((control >> 1) & 7);
+}
 
-	entry = get_irq_desc_msi(desc);
-	BUG_ON(!entry);
-	if (entry->msi_attrib.is_msix) {
-		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
-			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
-		readl(entry->mask_base + offset);
-	}
+static inline __attribute_const__ u32 msi_enabled_mask(u16 control)
+{
+	return msi_mask((control >> 4) & 7);
 }
 
 /*
@@ -127,34 +124,62 @@ static void msix_flush_writes(struct irq_desc *desc)
  * Returns 1 if it succeeded in masking the interrupt and 0 if the device
  * doesn't support MSI masking.
  */
-static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
+static int msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
 {
-	struct msi_desc *entry;
+	u32 mask_bits = desc->masked;
 
-	entry = get_irq_desc_msi(desc);
-	BUG_ON(!entry);
-	if (entry->msi_attrib.is_msix) {
-		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
-			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
-		writel(flag, entry->mask_base + offset);
-		readl(entry->mask_base + offset);
-	} else {
-		int pos;
-		u32 mask_bits;
+	if (!desc->msi_attrib.maskbit)
+		return 0;
 
-		if (!entry->msi_attrib.maskbit)
-			return 0;
+	mask_bits &= ~mask;
+	mask_bits |= flag;
+	pci_write_config_dword(desc->dev, desc->mask_pos, mask_bits);
+	desc->masked = mask_bits;
 
-		pos = entry->mask_pos;
-		pci_read_config_dword(entry->dev, pos, &mask_bits);
-		mask_bits &= ~mask;
-		mask_bits |= flag & mask;
-		pci_write_config_dword(entry->dev, pos, mask_bits);
-	}
-	entry->msi_attrib.masked = !!flag;
 	return 1;
 }
 
+/*
+ * This internal function does not flush PCI writes to the device.
+ * All users must ensure that they read from the device before either
+ * assuming that the device state is up to date, or returning out of this
+ * file.  This saves a few milliseconds when initialising devices with lots
+ * of MSI-X interrupts.
+ */
+static void msix_mask_irq(struct msi_desc *desc, u32 flag)
+{
+	u32 mask_bits = desc->masked;
+	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
+					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
+	mask_bits &= ~1;
+	mask_bits |= flag;
+	writel(mask_bits, desc->mask_base + offset);
+	desc->masked = mask_bits;
+}
+
+static int msi_set_mask_bit(unsigned irq, u32 flag)
+{
+	struct msi_desc *desc = get_irq_msi(irq);
+
+	if (desc->msi_attrib.is_msix) {
+		msix_mask_irq(desc, flag);
+		readl(desc->mask_base);		/* Flush write to device */
+		return 1;
+	} else {
+		return msi_mask_irq(desc, 1, flag);
+	}
+}
+
+void mask_msi_irq(unsigned int irq)
+{
+	msi_set_mask_bit(irq, 1);
+}
+
+void unmask_msi_irq(unsigned int irq)
+{
+	msi_set_mask_bit(irq, 0);
+}
+
 void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 {
 	struct msi_desc *entry = get_irq_desc_msi(desc);
@@ -230,22 +255,6 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
 	write_msi_msg_desc(desc, msg);
 }
 
-void mask_msi_irq(unsigned int irq)
-{
-	struct irq_desc *desc = irq_to_desc(irq);
-
-	msi_set_mask_bits(desc, 1, 1);
-	msix_flush_writes(desc);
-}
-
-void unmask_msi_irq(unsigned int irq)
-{
-	struct irq_desc *desc = irq_to_desc(irq);
-
-	msi_set_mask_bits(desc, 1, 0);
-	msix_flush_writes(desc);
-}
-
 static int msi_free_irqs(struct pci_dev* dev);
 
 static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
@@ -281,13 +290,9 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
 	pci_intx_for_msi(dev, 0);
 	msi_set_enable(dev, 0);
 	write_msi_msg(dev->irq, &entry->msg);
-	if (entry->msi_attrib.maskbit) {
-		struct irq_desc *desc = irq_to_desc(dev->irq);
-		msi_set_mask_bits(desc, entry->msi_attrib.maskbits_mask,
-				  entry->msi_attrib.masked);
-	}
 
 	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
+	msi_mask_irq(entry, msi_capable_mask(control), entry->masked);
 	control &= ~PCI_MSI_FLAGS_QSIZE;
 	control |= PCI_MSI_FLAGS_ENABLE;
 	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, control);
@@ -307,9 +312,8 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
 	msix_set_enable(dev, 0);
 
 	list_for_each_entry(entry, &dev->msi_list, list) {
-		struct irq_desc *desc = irq_to_desc(entry->irq);
 		write_msi_msg(entry->irq, &entry->msg);
-		msi_set_mask_bits(desc, 1, entry->msi_attrib.masked);
+		msix_mask_irq(entry, entry->masked);
 	}
 
 	BUG_ON(list_empty(&dev->msi_list));
@@ -342,6 +346,7 @@ static int msi_capability_init(struct pci_dev *dev)
 	struct msi_desc *entry;
 	int pos, ret;
 	u16 control;
+	unsigned mask;
 
 	msi_set_enable(dev, 0);	/* Ensure msi is disabled as I set it up */
 
@@ -356,21 +361,15 @@ static int msi_capability_init(struct pci_dev *dev)
 	entry->msi_attrib.is_64 = is_64bit_address(control);
 	entry->msi_attrib.entry_nr = 0;
 	entry->msi_attrib.maskbit = is_mask_bit_support(control);
-	entry->msi_attrib.masked = 1;
 	entry->msi_attrib.default_irq = dev->irq;	/* Save IOAPIC IRQ */
 	entry->msi_attrib.pos = pos;
-	if (entry->msi_attrib.maskbit) {
-		unsigned int base, maskbits, temp;
-
-		base = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
-		entry->mask_pos = base;
-		/* All MSIs are unmasked by default, Mask them all */
-		pci_read_config_dword(dev, base, &maskbits);
-		temp = msi_mask((control & PCI_MSI_FLAGS_QMASK) >> 1);
-		maskbits |= temp;
-		pci_write_config_dword(dev, base, maskbits);
-		entry->msi_attrib.maskbits_mask = temp;
-	}
+
+	entry->mask_pos = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
+	/* All MSIs are unmasked by default, Mask them all */
+	pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
+	mask = msi_capable_mask(control);
+	msi_mask_irq(entry, mask, mask);
+
 	list_add_tail(&entry->list, &dev->msi_list);
 
 	/* Configure MSI capability structure */
@@ -435,11 +434,12 @@ static int msix_capability_init(struct pci_dev *dev,
 		entry->msi_attrib.is_msix = 1;
 		entry->msi_attrib.is_64 = 1;
 		entry->msi_attrib.entry_nr = j;
-		entry->msi_attrib.maskbit = 1;
-		entry->msi_attrib.masked = 1;
 		entry->msi_attrib.default_irq = dev->irq;
 		entry->msi_attrib.pos = pos;
 		entry->mask_base = base;
+		entry->masked = readl(base + j * PCI_MSIX_ENTRY_SIZE +
+					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
+		msix_mask_irq(entry, 1);
 
 		list_add_tail(&entry->list, &dev->msi_list);
 	}
@@ -556,9 +556,11 @@ int pci_enable_msi(struct pci_dev* dev)
 }
 EXPORT_SYMBOL(pci_enable_msi);
 
-void pci_msi_shutdown(struct pci_dev* dev)
+void pci_msi_shutdown(struct pci_dev *dev)
 {
-	struct msi_desc *entry;
+	struct msi_desc *desc;
+	u32 mask;
+	u16 ctrl;
 
 	if (!pci_msi_enable || !dev || !dev->msi_enabled)
 		return;
@@ -568,18 +570,13 @@ void pci_msi_shutdown(struct pci_dev* dev)
 	dev->msi_enabled = 0;
 
 	BUG_ON(list_empty(&dev->msi_list));
-	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
-	/* Return the the pci reset with msi irqs unmasked */
-	if (entry->msi_attrib.maskbit) {
-		u32 mask = entry->msi_attrib.maskbits_mask;
-		struct irq_desc *desc = irq_to_desc(dev->irq);
-		msi_set_mask_bits(desc, mask, ~mask);
-	}
-	if (entry->msi_attrib.is_msix)
-		return;
+	desc = list_first_entry(&dev->msi_list, struct msi_desc, list);
+	pci_read_config_word(dev, desc->msi_attrib.pos + PCI_MSI_FLAGS, &ctrl);
+	mask = msi_capable_mask(ctrl);
+	msi_mask_irq(desc, mask, ~mask);
 
 	/* Restore dev->irq to its default pin-assertion irq */
-	dev->irq = entry->msi_attrib.default_irq;
+	dev->irq = desc->msi_attrib.default_irq;
 }
 
 void pci_disable_msi(struct pci_dev* dev)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 5025ca4..37c1bbe 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -22,14 +22,13 @@ struct msi_desc {
 	struct {
 		__u8	is_msix	: 1;
 		__u8	maskbit	: 1; 	/* mask-pending bit supported ?   */
-		__u8	masked	: 1;
 		__u8	is_64	: 1;	/* Address size: 0=32bit 1=64bit  */
 		__u8	pos;	 	/* Location of the msi capability */
 		__u16	entry_nr;    	/* specific enabled entry 	  */
-		__u32	maskbits_mask;  /* mask bits mask */
 		unsigned default_irq;	/* default pre-assigned irq	  */
-	}msi_attrib;
+	} msi_attrib;
 
+	u32 masked;			/* mask bits */
 	unsigned int irq;
 	struct list_head list;
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/6] PCI MSI: Add support for multiple MSI
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
                   ` (4 preceding siblings ...)
  2009-02-23 17:28 ` [PATCH 5/6] PCI MSI: Refactor interrupt masking code Matthew Wilcox
@ 2009-02-23 17:28 ` Matthew Wilcox
  2009-03-03  0:16   ` Michael Ellerman
  2009-03-04 14:52 ` Support " Eric W. Biederman
  6 siblings, 1 reply; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-23 17:28 UTC (permalink / raw)
  To: linux-pci, jbarnes, linux-kernel; +Cc: Matthew Wilcox

From: Matthew Wilcox <willy@linux.intel.com>

Add the new API pci_enable_msi_block() to allow drivers to
request multiple MSI and reimplement pci_enable_msi in terms of
pci_enable_msi_block.  Ensure that the architecture back ends don't
have to know about multiple MSI.

Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
---
 Documentation/PCI/MSI-HOWTO.txt |   45 +++++++++++++++++---
 arch/powerpc/kernel/msi.c       |    4 ++
 arch/x86/kernel/io_apic.c       |    4 ++
 drivers/pci/msi.c               |   91 +++++++++++++++++++++++++++-----------
 drivers/pci/msi.h               |    6 ---
 include/linux/msi.h             |    1 +
 include/linux/pci.h             |    6 ++-
 7 files changed, 116 insertions(+), 41 deletions(-)

diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
index f1b9f09..18a1e20 100644
--- a/Documentation/PCI/MSI-HOWTO.txt
+++ b/Documentation/PCI/MSI-HOWTO.txt
@@ -94,15 +94,48 @@ This function should be called before the driver calls request_irq()
 since enabling MSIs disables the pin-based IRQ and the driver will not
 receive interrupts on the old interrupt.
 
-4.2.2 pci_disable_msi
+4.2.2 pci_enable_msi_block
+
+int pci_enable_msi_block(struct pci_dev *dev, int count)
+
+This variation on the above call allows a device driver to request multiple
+MSIs.  The MSI specification only allows interrupts to be allocated in
+powers of two, up to a maximum of 2^5 (32).
+
+If this function returns 0, it has succeeded in allocating at least as many
+interrupts as the driver requested (it may have allocated more in order
+to satisfy the power-of-two requirement).  In this case, the function
+enables MSI on this device and updates dev->irq to be the lowest of
+the new interrupts assigned to it.  The other interrupts assigned to
+the device are in the range dev->irq to dev->irq + count - 1.
+
+If this function returns a negative number, it indicates an error and
+the driver should not attempt to request any more MSI interrupts for
+this device.  If this function returns a positive number, it will be
+less than 'count' and indicate the number of interrupts that could have
+been allocated.  In neither case will the irq value have been
+updated, nor will the device have been switched into MSI mode.
+
+The device driver must decide what action to take if
+pci_enable_msi_block() returns a value less than the number asked for.
+Some devices can make use of fewer interrupts than the maximum they
+request; in this case the driver should call pci_enable_msi_block()
+again.  Note that it is not guaranteed to succeed, even when the
+'count' has been reduced to the value returned from a previous call to
+pci_enable_msi_block().  This is because there are multiple constraints
+on the number of vectors that can be allocated; pci_enable_msi_block()
+will return as soon as it finds any constraint that doesn't allow the
+call to succeed.
+
+4.2.3 pci_disable_msi
 
 void pci_disable_msi(struct pci_dev *dev)
 
-This function should be used to undo the effect of pci_enable_msi().
-Calling it restores dev->irq to the pin-based interrupt number and frees
-the previously allocated message signaled interrupt(s).  The interrupt
-may subsequently be assigned to another device, so drivers should not
-cache the value of dev->irq.
+This function should be used to undo the effect of pci_enable_msi() or
+pci_enable_msi_block().  Calling it restores dev->irq to the pin-based
+interrupt number and frees the previously allocated message signaled
+interrupt(s).  The interrupt may subsequently be assigned to another
+device, so drivers should not cache the value of dev->irq.
 
 A device driver must always call free_irq() on the interrupt(s)
 for which it has called request_irq() before calling this function.
diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
index 3bb7d3d..0c16e2a 100644
--- a/arch/powerpc/kernel/msi.c
+++ b/arch/powerpc/kernel/msi.c
@@ -19,6 +19,10 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
 		return -ENOSYS;
 	}
 
+	/* PowerPC doesn't support multiple MSI yet */
+	if (type == PCI_CAP_ID_MSI && nvec > 1)
+		return 1;
+
 	if (ppc_md.msi_check_device) {
 		pr_debug("msi: Using platform check routine.\n");
 		return ppc_md.msi_check_device(dev, nvec, type);
diff --git a/arch/x86/kernel/io_apic.c b/arch/x86/kernel/io_apic.c
index bc7ac4d..a09549a 100644
--- a/arch/x86/kernel/io_apic.c
+++ b/arch/x86/kernel/io_apic.c
@@ -3510,6 +3510,10 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	int index = 0;
 #endif
 
+	/* x86 doesn't support multiple MSI yet */
+	if (type == PCI_CAP_ID_MSI && nvec > 1)
+		return 1;
+
 	irq_want = nr_irqs_gsi;
 	sub_handle = 0;
 	list_for_each_entry(msidesc, &dev->msi_list, list) {
diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
index 41f18cb..bd3c7e8 100644
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -40,6 +40,13 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type)
 	struct msi_desc *entry;
 	int ret;
 
+	/*
+	 * If an architecture wants to support multiple MSI, it needs to
+	 * override arch_setup_msi_irqs()
+	 */
+	if (type == PCI_CAP_ID_MSI && nvec > 1)
+		return 1;
+
 	list_for_each_entry(entry, &dev->msi_list, list) {
 		ret = arch_setup_msi_irq(dev, entry);
 		if (ret < 0)
@@ -58,8 +65,12 @@ void arch_teardown_msi_irqs(struct pci_dev *dev)
 	struct msi_desc *entry;
 
 	list_for_each_entry(entry, &dev->msi_list, list) {
-		if (entry->irq != 0)
-			arch_teardown_msi_irq(entry->irq);
+		int i, nvec;
+		if (entry->irq == 0)
+			continue;
+		nvec = 1 << entry->msi_attrib.multiple;
+		for (i = 0; i < nvec; i++)
+			arch_teardown_msi_irq(entry->irq + i);
 	}
 }
 #endif
@@ -166,7 +177,8 @@ static int msi_set_mask_bit(unsigned irq, u32 flag)
 		readl(desc->mask_base);		/* Flush write to device */
 		return 1;
 	} else {
-		return msi_mask_irq(desc, 1, flag);
+		unsigned offset = irq - desc->dev->irq;
+		return msi_mask_irq(desc, 1 << offset, flag << offset);
 	}
 }
 
@@ -232,6 +244,12 @@ void write_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
 	} else {
 		struct pci_dev *dev = entry->dev;
 		int pos = entry->msi_attrib.pos;
+		u16 msgctl;
+
+		pci_read_config_word(dev, msi_control_reg(pos), &msgctl);
+		msgctl &= ~PCI_MSI_FLAGS_QSIZE;
+		msgctl |= entry->msi_attrib.multiple << 4;
+		pci_write_config_word(dev, msi_control_reg(pos), msgctl);
 
 		pci_write_config_dword(dev, msi_lower_address_reg(pos),
 					msg->address_lo);
@@ -294,7 +312,7 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
 	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
 	msi_mask_irq(entry, msi_capable_mask(control), entry->masked);
 	control &= ~PCI_MSI_FLAGS_QSIZE;
-	control |= PCI_MSI_FLAGS_ENABLE;
+	control |= (entry->msi_attrib.multiple << 4) | PCI_MSI_FLAGS_ENABLE;
 	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, control);
 }
 
@@ -335,13 +353,15 @@ EXPORT_SYMBOL_GPL(pci_restore_msi_state);
 /**
  * msi_capability_init - configure device's MSI capability structure
  * @dev: pointer to the pci_dev data structure of MSI device function
+ * @nvec: number of interrupts to allocate
  *
- * Setup the MSI capability structure of device function with a single
- * MSI irq, regardless of device function is capable of handling
- * multiple messages. A return of zero indicates the successful setup
- * of an entry zero with the new MSI irq or non-zero for otherwise.
- **/
-static int msi_capability_init(struct pci_dev *dev)
+ * Setup the MSI capability structure of the device with the requested
+ * number of interrupts.  A return value of zero indicates the successful
+ * setup of an entry with the new MSI irq.  A negative return value indicates
+ * an error, and a positive return value indicates the number of interrupts
+ * which could have been allocated.
+ */
+static int msi_capability_init(struct pci_dev *dev, int nvec)
 {
 	struct msi_desc *entry;
 	int pos, ret;
@@ -373,7 +393,7 @@ static int msi_capability_init(struct pci_dev *dev)
 	list_add_tail(&entry->list, &dev->msi_list);
 
 	/* Configure MSI capability structure */
-	ret = arch_setup_msi_irqs(dev, 1, PCI_CAP_ID_MSI);
+	ret = arch_setup_msi_irqs(dev, nvec, PCI_CAP_ID_MSI);
 	if (ret) {
 		msi_free_irqs(dev);
 		return ret;
@@ -526,35 +546,48 @@ static int pci_msi_check_device(struct pci_dev* dev, int nvec, int type)
 }
 
 /**
- * pci_enable_msi - configure device's MSI capability structure
- * @dev: pointer to the pci_dev data structure of MSI device function
+ * pci_enable_msi_block - configure device's MSI capability structure
+ * @dev: device to configure
+ * @nvec: number of interrupts to configure
  *
- * Setup the MSI capability structure of device function with
- * a single MSI irq upon its software driver call to request for
- * MSI mode enabled on its hardware device function. A return of zero
- * indicates the successful setup of an entry zero with the new MSI
- * irq or non-zero for otherwise.
- **/
-int pci_enable_msi(struct pci_dev* dev)
+ * Allocate IRQs for a device with the MSI capability.
+ * This function returns a negative errno if an error occurs.  If it
+ * is unable to allocate the number of interrupts requested, it returns
+ * the number of interrupts it might be able to allocate.  If it successfully
+ * allocates at least the number of interrupts requested, it returns 0 and
+ * updates the @dev's irq member to the lowest new interrupt number; the
+ * other interrupt numbers allocated to this device are consecutive.
+ */
+int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
 {
-	int status;
+	int status, pos, maxvec;
+	u16 msgctl;
 
-	status = pci_msi_check_device(dev, 1, PCI_CAP_ID_MSI);
+	pos = pci_find_capability(dev, PCI_CAP_ID_MSI);
+	if (!pos)
+		return -EINVAL;
+	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &msgctl);
+	maxvec = 1 << ((msgctl & PCI_MSI_FLAGS_QMASK) >> 1);
+	if (nvec > maxvec)
+		return maxvec;
+
+	status = pci_msi_check_device(dev, nvec, PCI_CAP_ID_MSI);
 	if (status)
 		return status;
 
 	WARN_ON(!!dev->msi_enabled);
 
-	/* Check whether driver already requested for MSI-X irqs */
+	/* Check whether driver already requested MSI-X irqs */
 	if (dev->msix_enabled) {
 		dev_info(&dev->dev, "can't enable MSI "
 			 "(MSI-X already enabled)\n");
 		return -EINVAL;
 	}
-	status = msi_capability_init(dev);
+
+	status = msi_capability_init(dev, nvec);
 	return status;
 }
-EXPORT_SYMBOL(pci_enable_msi);
+EXPORT_SYMBOL(pci_enable_msi_block);
 
 void pci_msi_shutdown(struct pci_dev *dev)
 {
@@ -601,8 +634,12 @@ static int msi_free_irqs(struct pci_dev* dev)
 	struct msi_desc *entry, *tmp;
 
 	list_for_each_entry(entry, &dev->msi_list, list) {
-		if (entry->irq)
-			BUG_ON(irq_has_action(entry->irq));
+		int i, nvec;
+		if (!entry->irq)
+			continue;
+		nvec = 1 << entry->msi_attrib.multiple;
+		for (i = 0; i < nvec; i++)
+			BUG_ON(irq_has_action(entry->irq + i));
 	}
 
 	arch_teardown_msi_irqs(dev);
diff --git a/drivers/pci/msi.h b/drivers/pci/msi.h
index 3898f52..71f4df2 100644
--- a/drivers/pci/msi.h
+++ b/drivers/pci/msi.h
@@ -20,14 +20,8 @@
 #define msi_mask_bits_reg(base, is64bit) \
 	( (is64bit == 1) ? base+PCI_MSI_MASK_BIT : base+PCI_MSI_MASK_BIT-4)
 #define msi_disable(control)		control &= ~PCI_MSI_FLAGS_ENABLE
-#define multi_msi_capable(control) \
-	(1 << ((control & PCI_MSI_FLAGS_QMASK) >> 1))
-#define multi_msi_enable(control, num) \
-	control |= (((num >> 1) << 4) & PCI_MSI_FLAGS_QSIZE);
 #define is_64bit_address(control)	(!!(control & PCI_MSI_FLAGS_64BIT))
 #define is_mask_bit_support(control)	(!!(control & PCI_MSI_FLAGS_MASKBIT))
-#define msi_enable(control, num) multi_msi_enable(control, num); \
-	control |= PCI_MSI_FLAGS_ENABLE
 
 #define msix_table_offset_reg(base)	(base + 0x04)
 #define msix_pba_offset_reg(base)	(base + 0x08)
diff --git a/include/linux/msi.h b/include/linux/msi.h
index 37c1bbe..6991ab5 100644
--- a/include/linux/msi.h
+++ b/include/linux/msi.h
@@ -21,6 +21,7 @@ extern void write_msi_msg(unsigned int irq, struct msi_msg *msg);
 struct msi_desc {
 	struct {
 		__u8	is_msix	: 1;
+		__u8	multiple: 3;	/* log2 number of messages */
 		__u8	maskbit	: 1; 	/* mask-pending bit supported ?   */
 		__u8	is_64	: 1;	/* Address size: 0=32bit 1=64bit  */
 		__u8	pos;	 	/* Location of the msi capability */
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7baf2a5..1f6c5dd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -789,7 +789,7 @@ struct msix_entry {
 
 
 #ifndef CONFIG_PCI_MSI
-static inline int pci_enable_msi(struct pci_dev *dev)
+static inline int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec)
 {
 	return -1;
 }
@@ -824,7 +824,7 @@ static inline int pci_msi_enabled(void)
 	return 0;
 }
 #else
-extern int pci_enable_msi(struct pci_dev *dev);
+extern int pci_enable_msi_block(struct pci_dev *dev, unsigned int nvec);
 extern void pci_msi_shutdown(struct pci_dev *dev);
 extern void pci_disable_msi(struct pci_dev *dev);
 extern int pci_msix_table_size(struct pci_dev *dev);
@@ -846,6 +846,8 @@ static inline int pcie_aspm_enabled(void)
 extern int pcie_aspm_enabled(void);
 #endif
 
+#define pci_enable_msi(pdev)	pci_enable_msi_block(pdev, 1)
+
 #ifdef CONFIG_HT_IRQ
 /* The functions a driver should call */
 int  ht_create_irq(struct pci_dev *dev, int idx);
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
@ 2009-02-24 20:00   ` Randy Dunlap
  2009-02-24 20:28     ` Matthew Wilcox
  2009-02-27  6:15   ` Grant Grundler
  1 sibling, 1 reply; 23+ messages in thread
From: Randy Dunlap @ 2009-02-24 20:00 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> I didn't find the previous version very useful, so I rewrote it.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  Documentation/PCI/MSI-HOWTO.txt |  756 ++++++++++++++-------------------------
>  1 files changed, 275 insertions(+), 481 deletions(-)
> 
> diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
> index 256defd..f1b9f09 100644
> --- a/Documentation/PCI/MSI-HOWTO.txt
> +++ b/Documentation/PCI/MSI-HOWTO.txt
> @@ -4,506 +4,300 @@
>  	Revised Feb 12, 2004 by Martine Silbermann
>  		email: Martine.Silbermann@hp.com
>  	Revised Jun 25, 2004 by Tom L Nguyen
> +	Revised Jul  9, 2008 by Matthew Wilcox <willy@linux.intel.com>
> +		Copyright 2003, 2008 Intel Corporation
>  
>  1. About this guide
>  

[deletia]

> -5.2.1 API pci_enable_msi
> +This guide describes the basics of Message Signaled Interrupts (MSIs),
> +the advantages of using MSI over traditional interrupt mechanisms, how
> +to change your driver to use MSI or MSI-X and some basic diagnostics to
> +try if a device doesn't support MSIs.
> +
> +
> +2. What are MSIs?
> +
> +A Message Signalled Interrupt is a write from the device to a special

Consistent spelling, please.

> +address which causes an interrupt to be received by the CPU.
> +
> +The MSI capability was first specified in PCI 2.2 and was later enhanced
> +in PCI 3.0 to allow each interrupt to be masked individually.  The MSI-X
> +capability was also introduced with PCI 3.0.  It supports more interrupts
> +per device than MSI and allows interrupts to be independently configured.
> +
> +Devices may support both MSI and MSI-X, but only one can be enabled at
> +a time.
> +
> +
> +3. Why use MSIs?
> +
> +There are three reasons why using MSIs can give an advantage over
> +traditional pin-based interrupts.
> +
> +Pin-based PCI interrupts are often shared amongst several devices.
> +To support this, the kernel must call each interrupt handler associated
> +with an interrupt which leads to reduced performance for the system as

   with an interrupt,

> +a whole.  MSIs are never shared, so this problem cannot arise.
> +
> +When a device writes data to memory, then raises a pin-based interrupt,
> +it is possible that the interrupt may arrive before all the data has
> +arrived in memory (this becomes more likely with devices behind PCI-PCI
> +bridges).  In order to ensure that all the data has arrived in memory,
> +the interrupt handler must read a register on the device which raised
> +the interrupt.  PCI transaction ordering rules require that all the data
> +arrives in memory before the value can be returned from the register.
> +Using MSIs avoids this problem as the interrupt-generating write cannot
> +pass the data writes, so by the time the interrupt is raised, the driver
> +knows that all the data has arrived in memory.
> +
> +PCI devices can only support a single pin-based interrupt per function.
> +Often drivers have to query the device to find out what event has
> +occurred, slowing down interrupt handling for the common case.  With
> +MSIs, a device can support more interrupts, allowing each interrupt
> +to be specialised to a different purpose.  One possible design gives
> +infrequent conditions (such as errors) their own interrupt which allows
> +the driver to handle the normal interrupt handling path more efficiently.
> +Other possible designs include giving one interrupt to each packet queue
> +in a network card or each port in a storage controller.
> +
> +
> +4. How to use MSIs
> +
> +PCI devices are initialised to use pin-based interrupts.  The device
> +driver has to set up the device to use MSI or MSI-X.  Not all machines
> +support MSIs correctly, and for those machines, the APIs described below
> +will simply fail and the device will continue to use pin-based interrupts.
> +
> +4.1 Include kernel support for MSIs
> +
> +To support MSI or MSI-X, the kernel must be built with the CONFIG_PCI_MSI
> +option enabled.  This option is only available on some architectures,
> +and it may depend on some other options also being set.  For example,
> +on x86, you must also enable X86_UP_APIC or SMP in order to see the
> +CONFIG_PCI_MSI option.
> +
> +4.2 Using MSI
> +
> +Most of the hard work is done for the driver in the PCI layer.  It simply
> +has to request that the PCI layer set up the MSI capability for this
> +device.
> +
> +4.2.1 pci_enable_msi
>  
>  int pci_enable_msi(struct pci_dev *dev)
>  

[deletia]

> +A successful call will allocate ONE interrupt to the device, regardless
> +of how many MSIs the device supports.  The device will be switched from
> +pin-based interrupt mode to MSI mode.  The dev->irq number is changed
> +to a new number which represents the message signaled interrupt.
> +This function should be called before the driver calls request_irq()
> +since enabling MSIs disables the pin-based IRQ and the driver will not
> +receive interrupts on the old interrupt.
>  
> -5.2.2 API pci_disable_msi
> +4.2.2 pci_disable_msi

[deletia]

> +This function should be used to undo the effect of pci_enable_msi().
> +Calling it restores dev->irq to the pin-based interrupt number and frees
> +the previously allocated message signaled interrupt(s).  The interrupt
> +may subsequently be assigned to another device, so drivers should not
> +cache the value of dev->irq.
>  
> -This API enables a device driver to request the PCI subsystem
> -to enable MSI-X messages on its hardware device. Depending on
> -the availability of PCI vectors resources, the PCI subsystem enables
> -either all or none of the requested vectors.
> +A device driver must always call free_irq() on the interrupt(s)
> +for which it has called request_irq() before calling this function.
> +Failure to do so will result in a BUG_ON(), the device will be left with
> +MSI enabled and will leak its vector.
>  
> -Argument 'dev' points to the device (pci_dev) structure.
> +4.3 Using MSI-X
>  
> -Argument 'entries' is a pointer to an array of msix_entry structs.
> -The number of entries is indicated in argument 'nvec'.
> -struct msix_entry is defined in /driver/pci/msi.h:
> +The MSI-X capability is much more flexible than the MSI capability.
> +It supports up to 2048 interrupts, each of which can be separately
> +assigned.  To support this flexibility, drivers must use an array of
> +`struct msix_entry':
>  
>  struct msix_entry {
>  	u16 	vector; /* kernel uses to write alloc vector */
>  	u16	entry; /* driver uses to specify entry */
>  };
>  

[deletia]

> +This allows for the device to use these interrupts in a sparse fashion;
> +for example it could use interrupts 3 and 1027 and allocate only a
> +two-element array.  The driver is expected to fill in the 'entry' value
> +in each element of the array to indicate which entries it wants the kernel
> +to assign interrupts for.  It is invalid to fill in two entries with the
> +same number.
> +
> +4.3.1 pci_enable_msix
> +
> +int pci_enable_msix(struct pci_dev *dev, struct msix_entry *entries, int nvec)
> +
> +Calling this function asks the PCI subsystem to allocate 'nvec' MSIs.
> +The 'entries' argument is a pointer to an array of msix_entry structs
> +which should be at least 'nvec' entries in size.  On success, the
> +function will return 0 and the device will have been switched into
> +MSI-X interrupt mode.  The 'vector' elements in each entry will have
> +been filled in with the interrupt number.  The driver should then call
> +request_irq() for each 'vector' that it decides to use.
> +
> +If this function returns a negative number, it indicates an error and
> +the driver should not attempt to allocate any more MSI-X interrupts for
> +this device.  If it returns a positive number, it indicates the maximum
> +number of interrupt vectors that could have been allocated.
> +
> +This function, in contrast with pci_enable_msi(), does not adjust
> +dev->irq.  The device will not generate interrupts for this interrupt
> +number once MSI-X is enabled.  The device driver is responsible for
> +keeping track of the interrupts assigned to the MSI-X vectors so it can
> +free them again later.
> +
> +Device drivers should normally call this function once per device
> +during the initialization phase.

Consistent isation/ization (or ised/ized), please.

> +
> +4.3.2 pci_disable_msix
>  
>  void pci_disable_msix(struct pci_dev *dev)
>  

[deletia]

> +This API should be used to undo the effect of pci_enable_msix().  It frees
> +the previously allocated message signaled interrupts.  The interrupts may
> +subsequently be assigned to another device, so drivers should not cache
> +the value of the 'vector' elements over a call to pci_disable_msix().
> +
> +A device driver must always call free_irq() on the interrupt(s)
> +for which it has called request_irq() before calling this function.
> +Failure to do so will result in a BUG_ON(), the device will be left with
> +MSI enabled and will leak its vector.
> +
> +4.3.3 The MSI-X Table
> +
> +The MSI-X capability specifies a BAR and offset within that BAR for the
> +MSI-X Table.  This address is mapped by the PCI subsystem, and should not
> +be accessed directly by the device driver.  If the driver wishes to
> +mask or unmask an interrupt, it should call disable_irq() / enable_irq().
> +
> +4.4 Handling devices implementing both MSI and MSI-X capabilities
> +
> +If a device implements both MSI and MSI-X capabilities, it can
> +run in either MSI mode or MSI-X mode but not both simultaneously.
> +This is a requirement of the PCI spec, and it is enforced by the
> +PCI layer.  Calling pci_enable_msi() when MSI-X is already enabled or
> +pci_enable_msix() when MSI is already enabled will result in an error.
> +If a device driver wishes to switch between MSI and MSI-X at runtime,
> +it must first quiesce the device, then switch it back to pin-interrupt
> +mode, before calling pci_enable_msi() or pci_enable_msix() and resuming
> +operation.  This is not expected to be a common operation but may be
> +useful for debugging or testing during development.
> +
> +4.5 Considerations when using MSIs
> +
> +4.5.1 Choosing between MSI-X and MSI
> +
> +If your device supports both MSI-X and MSI capabilities, you should use
> +the MSI-X facilities in preference to the MSI facilities.  As mentioned
> +above, MSI-X supports any number of interrupts between 1 and 2048.
> +In constrast, MSI is restricted to a maximum of 32 interrupts (and
> +must be a power of two).  In addition, the MSI interrupt vectors must
> +be allocated consecutively, so the system may not be able to allocate
> +as many vectors for MSI as it could for MSI-X.  On some platforms, MSI
> +interrupts must all be targetted at the same set of CPUs whereas MSI-X
> +interrupts can all be targetted at different CPUs.
> +
> +4.5.2 Spinlocks
> +
> +Most device drivers have a per-device spinlock which is taken in the
> +interrupt handler.  With pin-based interrupts or a single MSI, it is not
> +necessary to disable interrupts (Linux guarantees the same interrupt will
> +not be re-entered).  If a device uses multiple interrupts, the driver
> +must disable interrupts while the lock is held.  If the device sends
> +a different interrupt, the driver will deadlock trying to recursively
> +acquire the spinlock.
> +
> +There are two solutions.  The first is to take the
> +lock with spin_lock_irqsave() or spin_lock_irq() (see
> +Documentation/DocBook/kernel-locking).  The second is to specify
> +IRQF_DISABLED to request_irq() so that the kernel runs the entire
> +interrupt routine with interrupts disabled.
> +
> +If your MSI interrupt routine does not hold the lock for the whole time
> +it is running, the first solution may be best.  The second solution is
> +normally preferred as it avoids making two transitions from interrupt
> +disabled to enabled and back again.
> +
> +4.6 How to tell whether MSI/MSI-X is enabled on a device
> +
> +Using lspci -v (as root) will show some devices with "Message Signalled

Oh gosh, does lspci misspell it?

> +Interrupts" and others with "MSI-X".  Each of these capabilities have an

                                         Each ..................... has an

> +'Enable' flag which will be followed with either "+" (enabled) or "-"
> +(disabled).
> +
> +
> +5. MSI quirks
> +
> +Several PCI chipsets or devices are known not to support MSIs.
> +The PCI stack provides three ways to disable MSIs:
> +
> +1. globally
> +2. on all devices behind a specific bridge
> +3. on a single device
> +
> +5.1. Disabling MSIs globally
> +
> +Some host chipsets simply don't support MSIs properly.  If we're
> +lucky, the manufacturer knows this and has indicated it in the ACPI
> +FADT table.  In this case, Linux will automatically disable MSIs.
> +Some boards don't include this information in the table and so we have
> +to detect them ourselves.  The complete list of these is found near the
> +quirk_disable_all_msi() function in drivers/pci/quirks.c.
> +
> +If you have a board which has problems with MSIs, you can pass pci=nomsi
> +on the kernel command line to disable MSIs on all devices.  It would be
> +in your best interests to report the problem to linux-pci@vger.kernel.org
> +including a full lspci -v so we can add the quirks to the kernel.

   I would say      "lspci -v" ...

> +
> +5.2. Disabling MSIs below a bridge
> +
> +Some PCI bridges are not able to route MSIs between busses properly.
> +In this case, MSIs must be disabled on all devices behind the bridge.
> +
> +Some bridges allow you to enable MSIs by changing some bits in their
> +PCI configuration space (especially the Hypertransport chipsets such
> +as the nVidia nForce and Serverworks HT2000).  As with host chipsets,
> +Linux mostly knows about them and automatically enables MSIs if it can.
> +If you have a bridge which Linux doesn't yet know about, you can enable
> +MSIs in configuration space using whatever method you know works, then
> +enable MSIs on that bridge by doing:
> +
> +       echo 1 > /sys/bus/pci/devices/$bridge/msi_bus
> +
> +where $bridge is the PCI address of the bridge you've enabled (eg
> +0000:00:0e.0).
> +
> +To disable MSIs, echo 0 instead of 1.  Changing this value should be
> +done with caution as it can break interrupt handling for all devices
> +below this bridge.
> +
> +Again, please notify linux-pci@vger.kernel.org of any bridges that need
> +special handling.
> +
> +5.3. Disabling MSIs on a single device
> +
> +Some devices are known to have faulty MSI implementations.  Usually this
> +is handled in the individual device driver but occasionally it's necessary
> +to handle this with a quirk.  Some drivers have an option to disable MSIs;
> +this is deprecated.
> +
> +5.4. Finding why MSIs are disabled on a device
> +
> +From the above three sections, you can see that there are many reasons
> +why MSIs may not be enabled for a given device.  Your first step should
> +be to examine your dmesg carefully to determine whether MSIs are enabled
> +for your machine.  You should also check your .config to be sure you
> +have enabled CONFIG_PCI_MSI.
> +
> +Then, lspci -t gives the list of bridges above a device.  Reading

         "lspci -t"

> +/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
> +or disabled (0).  If 0 is found in any of the msi_bus files belonging
> +to bridges between the PCI root and the device, MSIs are disabled.
> +
> +It is also worth checking whether the device driver supports MSIs.
> +

Unneeded ending blank line (nit).

-- 
~Randy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-24 20:00   ` Randy Dunlap
@ 2009-02-24 20:28     ` Matthew Wilcox
  2009-02-24 20:55       ` Randy Dunlap
  2009-02-25  7:34       ` Sitsofe Wheeler
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-24 20:28 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: Matthew Wilcox, linux-pci, jbarnes, linux-kernel


Thanks for the review!

On Tue, Feb 24, 2009 at 12:00:03PM -0800, Randy Dunlap wrote:
> Matthew Wilcox wrote:
> > +A Message Signalled Interrupt is a write from the device to a special
> 
> Consistent spelling, please.

Ah yes, I thought I'd fixed that.  I'll follow the PCI spec and use one 'l'.

> > +To support this, the kernel must call each interrupt handler associated
> > +with an interrupt which leads to reduced performance for the system as
> 
>    with an interrupt,

Ack.

> > +Device drivers should normally call this function once per device
> > +during the initialization phase.
> 
> Consistent isation/ization (or ised/ized), please.

The Shorter OED is quite amusing on this point.  It lists 'Initialise' as a
variant, but in the main definition under 'Initialize', the example it
gives of the computer usage uses 'initialise'.  I can only conclude that
a foolish consistency is the hobgoblin of little minds ;-)

> > +4.6 How to tell whether MSI/MSI-X is enabled on a device
> > +
> > +Using lspci -v (as root) will show some devices with "Message Signalled
> 
> Oh gosh, does lspci misspell it?

It used to.  The good news is that recent pciutils now just use MSI
instead of spelling it out.  As a bonus, the capability now fits on one
line.

> > +Interrupts" and others with "MSI-X".  Each of these capabilities have an
> 
>                                          Each ..................... has an

You're correct.

> > +'Enable' flag which will be followed with either "+" (enabled) or "-"
> > +(disabled).


> > +including a full lspci -v so we can add the quirks to the kernel.
> 
>    I would say      "lspci -v" ...

Good idea.

> > +Then, lspci -t gives the list of bridges above a device.  Reading
> 
>          "lspci -t"

Likewise.

> > +It is also worth checking whether the device driver supports MSIs.
> > +
> 
> Unneeded ending blank line (nit).

Heh.  I'll take it out.

May I add your reviewed-by tag?

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-24 20:28     ` Matthew Wilcox
@ 2009-02-24 20:55       ` Randy Dunlap
  2009-02-25  7:34       ` Sitsofe Wheeler
  1 sibling, 0 replies; 23+ messages in thread
From: Randy Dunlap @ 2009-02-24 20:55 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Matthew Wilcox, linux-pci, jbarnes, linux-kernel

Matthew Wilcox wrote:
> Thanks for the review!
> 
> On Tue, Feb 24, 2009 at 12:00:03PM -0800, Randy Dunlap wrote:
>> Matthew Wilcox wrote:
>>> +A Message Signalled Interrupt is a write from the device to a special
>> Consistent spelling, please.
> 
> Ah yes, I thought I'd fixed that.  I'll follow the PCI spec and use one 'l'.
> 
>>> +To support this, the kernel must call each interrupt handler associated
>>> +with an interrupt which leads to reduced performance for the system as
>>    with an interrupt,
> 
> Ack.
> 
>>> +Device drivers should normally call this function once per device
>>> +during the initialization phase.
>> Consistent isation/ization (or ised/ized), please.
> 
> The Shorter OED is quite amusing on this point.  It lists 'Initialise' as a
> variant, but in the main definition under 'Initialize', the example it
> gives of the computer usage uses 'initialise'.  I can only conclude that
> a foolish consistency is the hobgoblin of little minds ;-)
> 
>>> +4.6 How to tell whether MSI/MSI-X is enabled on a device
>>> +
>>> +Using lspci -v (as root) will show some devices with "Message Signalled
>> Oh gosh, does lspci misspell it?
> 
> It used to.  The good news is that recent pciutils now just use MSI
> instead of spelling it out.  As a bonus, the capability now fits on one
> line.
> 
>>> +Interrupts" and others with "MSI-X".  Each of these capabilities have an
>>                                          Each ..................... has an
> 
> You're correct.
> 
>>> +'Enable' flag which will be followed with either "+" (enabled) or "-"
>>> +(disabled).
> 
> 
>>> +including a full lspci -v so we can add the quirks to the kernel.
>>    I would say      "lspci -v" ...
> 
> Good idea.
> 
>>> +Then, lspci -t gives the list of bridges above a device.  Reading
>>          "lspci -t"
> 
> Likewise.
> 
>>> +It is also worth checking whether the device driver supports MSIs.
>>> +
>> Unneeded ending blank line (nit).
> 
> Heh.  I'll take it out.
> 
> May I add your reviewed-by tag?

Yes.

-- 
~Randy

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-24 20:28     ` Matthew Wilcox
  2009-02-24 20:55       ` Randy Dunlap
@ 2009-02-25  7:34       ` Sitsofe Wheeler
  1 sibling, 0 replies; 23+ messages in thread
From: Sitsofe Wheeler @ 2009-02-25  7:34 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Randy Dunlap, Matthew Wilcox, linux-pci, jbarnes, linux-kernel

On Tue, Feb 24, 2009 at 12:28:08PM -0800, Matthew Wilcox wrote:
> 
> > Consistent isation/ization (or ised/ized), please.
> 
> The Shorter OED is quite amusing on this point.  It lists 'Initialise' as a
> variant, but in the main definition under 'Initialize', the example it
> gives of the computer usage uses 'initialise'.  I can only conclude that
> a foolish consistency is the hobgoblin of little minds ;-)

A lot depends on whether you are writing English or American. In the UK
both endings (ise and ize) are used, so it can be quite legal to see
both in the same sentence depending on the word, writer and publication.
I believe in American, ize has been standardized (ha) upon.
http://www.askoxford.com/asktheexperts/faq/aboutspelling/ize?view=uk .
I've checked the "big" OED online and it actually didn't find anything
for initialise. Looking for initialize found a match but the only
reference to initialise there was in reference to the initialising the
letters of words of words (and someone who was an initialist).  Here's a
quote from the OED entry for initialize.

"2. trans. (Computers.) To set to the value, or put in the condition,
appropriate to the start of an operation.        
	Const. to.
1957 D. D. MCCRACKEN Digital Computer Programming xi. 146
[Instructions] 18 and 19 initialize the address of the instruction with
which numbers are brought in from temporary storage.  1961 N. CHAPIN
Programming Computers for Business   Applic. vii. 188 The failure..to
initialize the switch would produce errors. 1963 P. M.  SHERMAN
Programming & Coding..."

Correcting UK usage of ize/ise is a dicey business. Generally the best
course of action seems to be "don't worry about it".

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
  2009-02-24 20:00   ` Randy Dunlap
@ 2009-02-27  6:15   ` Grant Grundler
  2009-02-27 12:14     ` Matthew Wilcox
  1 sibling, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2009-02-27  6:15 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

Willy,
Some suggestions/comments below.

On Mon, Feb 23, 2009 at 12:27:57PM -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> I didn't find the previous version very useful, so I rewrote it.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  Documentation/PCI/MSI-HOWTO.txt |  756 ++++++++++++++-------------------------
>  1 files changed, 275 insertions(+), 481 deletions(-)
> 
> diff --git a/Documentation/PCI/MSI-HOWTO.txt b/Documentation/PCI/MSI-HOWTO.txt
> index 256defd..f1b9f09 100644
> --- a/Documentation/PCI/MSI-HOWTO.txt
> +++ b/Documentation/PCI/MSI-HOWTO.txt

...
> +3. Why use MSIs?
> +
> +There are three reasons why using MSIs can give an advantage over
> +traditional pin-based interrupts.
...
> +PCI devices can only support a single pin-based interrupt per function.

Related to this is a 4th reason: distribute workload across CPUs
and enables construction of efficient, multi-queue devices.
Care to mention that?

...
> +The MSI-X capability is much more flexible than the MSI capability.
> +It supports up to 2048 interrupts, each of which can be separately
> +assigned.

Nothing describes "assignment" below or what is meant by "assigned".
My guess is you wanted to differentiate MSIX from MSI with:
  ... and each MSIX can be directed at a different CPU.


> +4.5 Considerations when using MSIs
> +
> +4.5.1 Choosing between MSI-X and MSI
> +
> +If your device supports both MSI-X and MSI capabilities, you should use
> +the MSI-X facilities in preference to the MSI facilities.  As mentioned
> +above, MSI-X supports any number of interrupts between 1 and 2048.
> +In constrast, MSI is restricted to a maximum of 32 interrupts (and
> +must be a power of two).  In addition, the MSI interrupt vectors must
> +be allocated consecutively, so the system may not be able to allocate
> +as many vectors for MSI as it could for MSI-X.  On some platforms, MSI
> +interrupts must all be targetted at the same set of CPUs whereas MSI-X
> +interrupts can all be targetted at different CPUs.

The description for MSI is correct. But Linux will only allocate one
MSI as noted in an earlier section. This section implies more could
be allocated when using MSI and that won't happen.

IIRC, for AHCI perf you were working on a patch to change that and
it should probably update this text at the same time when the
behavior changes.


> +5. MSI quirks
> +
> +Several PCI chipsets or devices are known not to support MSIs.
> +The PCI stack provides three ways to disable MSIs:
> +
> +1. globally
> +2. on all devices behind a specific bridge
> +3. on a single device
...
> +5.3. Disabling MSIs on a single device
> +
> +Some devices are known to have faulty MSI implementations.  Usually this
> +is handled in the individual device driver but occasionally it's necessary
> +to handle this with a quirk.  Some drivers have an option to disable MSIs;
> +this is deprecated.

"this" is ambiguous. My guess is "quirks to disable MSI for a device is
deprecated" since recently some drivers have added module parameters to
disable MSI.


> +5.4. Finding why MSIs are disabled on a device
> +
> +From the above three sections, you can see that there are many reasons
> +why MSIs may not be enabled for a given device.  Your first step should
> +be to examine your dmesg carefully to determine whether MSIs are enabled
> +for your machine.  You should also check your .config to be sure you
> +have enabled CONFIG_PCI_MSI.

Should mention "fgrep MSI /proc/interrupts" to see if any devices have
MSI in use?

> +
> +Then, lspci -t gives the list of bridges above a device.  Reading
> +/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
> +or disabled (0).  If 0 is found in any of the msi_bus files belonging
> +to bridges between the PCI root and the device, MSIs are disabled.
> +
> +It is also worth checking whether the device driver supports MSIs.

Suggestions on how to check?

Conversely, one can easily check if the driver has MSI disabled by default
and MSI can be enabled.  e.g. use "modinfo mvsas" to check driver parameters.


The parts I deleted look good.

Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>

thanks,
grant

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-27  6:15   ` Grant Grundler
@ 2009-02-27 12:14     ` Matthew Wilcox
  2009-03-01 23:46       ` Michael Ellerman
  2009-03-02 20:33       ` Grant Grundler
  0 siblings, 2 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-02-27 12:14 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

On Thu, Feb 26, 2009 at 11:15:25PM -0700, Grant Grundler wrote:
> ...
> > +3. Why use MSIs?
> > +
> > +There are three reasons why using MSIs can give an advantage over
> > +traditional pin-based interrupts.
> ...
> > +PCI devices can only support a single pin-based interrupt per function.
> 
> Related to this is a 4th reason: distribute workload across CPUs
> and enables construction of efficient, multi-queue devices.
> Care to mention that?

That's true for MSI-X, but not for MSIs in general.  Workload is already
distributed across CPUs with round-robin interrupts.  I'm inclined to
leave out this level of detail.

> > +The MSI-X capability is much more flexible than the MSI capability.
> > +It supports up to 2048 interrupts, each of which can be separately
> > +assigned.
> 
> Nothing describes "assignment" below or what is meant by "assigned".
> My guess is you wanted to differentiate MSIX from MSI with:
>   ... and each MSIX can be directed at a different CPU.

I think 'each of which can be controlled separately' might work better.
For example, they're individually maskable which isn't (necessarily)
true of plain MSI.

> > +4.5 Considerations when using MSIs
> > +
> > +4.5.1 Choosing between MSI-X and MSI
> > +
> > +If your device supports both MSI-X and MSI capabilities, you should use
> > +the MSI-X facilities in preference to the MSI facilities.  As mentioned
> > +above, MSI-X supports any number of interrupts between 1 and 2048.
> > +In constrast, MSI is restricted to a maximum of 32 interrupts (and
> > +must be a power of two).  In addition, the MSI interrupt vectors must
> > +be allocated consecutively, so the system may not be able to allocate
> > +as many vectors for MSI as it could for MSI-X.  On some platforms, MSI
> > +interrupts must all be targetted at the same set of CPUs whereas MSI-X
> > +interrupts can all be targetted at different CPUs.
> 
> The description for MSI is correct. But Linux will only allocate one
> MSI as noted in an earlier section. This section implies more could
> be allocated when using MSI and that won't happen.
> 
> IIRC, for AHCI perf you were working on a patch to change that and
> it should probably update this text at the same time when the
> behavior changes.

Did you see this is patch 1/6?  ;-)  I removed the description of
pci_enable_msi_block() from this patch, but missed updating this
paragraph.  By patch 6/6, this paragraph is true.

> > +5. MSI quirks
> > +
> > +Several PCI chipsets or devices are known not to support MSIs.
> > +The PCI stack provides three ways to disable MSIs:
> > +
> > +1. globally
> > +2. on all devices behind a specific bridge
> > +3. on a single device
> ...
> > +5.3. Disabling MSIs on a single device
> > +
> > +Some devices are known to have faulty MSI implementations.  Usually this
> > +is handled in the individual device driver but occasionally it's necessary
> > +to handle this with a quirk.  Some drivers have an option to disable MSIs;
> > +this is deprecated.
> 
> "this" is ambiguous. My guess is "quirks to disable MSI for a device is
> deprecated" since recently some drivers have added module parameters to
> disable MSI.

Having an option to disable MSI is deprecated.  That doesn't mean that
individual driver authors aren't selfish and short-sighted.

> > +5.4. Finding why MSIs are disabled on a device
> > +
> > +From the above three sections, you can see that there are many reasons
> > +why MSIs may not be enabled for a given device.  Your first step should
> > +be to examine your dmesg carefully to determine whether MSIs are enabled
> > +for your machine.  You should also check your .config to be sure you
> > +have enabled CONFIG_PCI_MSI.
> 
> Should mention "fgrep MSI /proc/interrupts" to see if any devices have
> MSI in use?

Yes, you're right.

> > +Then, lspci -t gives the list of bridges above a device.  Reading
> > +/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
> > +or disabled (0).  If 0 is found in any of the msi_bus files belonging
> > +to bridges between the PCI root and the device, MSIs are disabled.
> > +
> > +It is also worth checking whether the device driver supports MSIs.
> 
> Suggestions on how to check?

'eg has calls to pci_enable_msi(), pci_enable_msix() or
pci_enable_msi_block()'?

> Conversely, one can easily check if the driver has MSI disabled by default
> and MSI can be enabled.  e.g. use "modinfo mvsas" to check driver parameters.

I'm not going to give examples of bad practise.

> Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>

Thanks.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-27 12:14     ` Matthew Wilcox
@ 2009-03-01 23:46       ` Michael Ellerman
  2009-03-02 20:33       ` Grant Grundler
  1 sibling, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2009-03-01 23:46 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, linux-pci, jbarnes, linux-kernel, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 2073 bytes --]

On Fri, 2009-02-27 at 05:14 -0700, Matthew Wilcox wrote:
> On Thu, Feb 26, 2009 at 11:15:25PM -0700, Grant Grundler wrote:
> > ...
> > > +3. Why use MSIs?
> > > +
> > > +There are three reasons why using MSIs can give an advantage over
> > > +traditional pin-based interrupts.
> > ...
> > > +PCI devices can only support a single pin-based interrupt per function.
> > 
> > Related to this is a 4th reason: distribute workload across CPUs
> > and enables construction of efficient, multi-queue devices.
> > Care to mention that?
> 
> That's true for MSI-X, but not for MSIs in general.  Workload is already
> distributed across CPUs with round-robin interrupts.  I'm inclined to
> leave out this level of detail.
> 
> > > +The MSI-X capability is much more flexible than the MSI capability.
> > > +It supports up to 2048 interrupts, each of which can be separately
> > > +assigned.
> > 
> > Nothing describes "assignment" below or what is meant by "assigned".
> > My guess is you wanted to differentiate MSIX from MSI with:
> >   ... and each MSIX can be directed at a different CPU.

So might each MSI, depending on the hardware.

> > > +5.4. Finding why MSIs are disabled on a device
> > > +
> > > +From the above three sections, you can see that there are many reasons
> > > +why MSIs may not be enabled for a given device.  Your first step should
> > > +be to examine your dmesg carefully to determine whether MSIs are enabled
> > > +for your machine.  You should also check your .config to be sure you
> > > +have enabled CONFIG_PCI_MSI.
> > 
> > Should mention "fgrep MSI /proc/interrupts" to see if any devices have
> > MSI in use?
> 
> Yes, you're right.

That won't work on (some) powerpc machines at least .. because MSIs just
get routed into the regular PIC.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-02-27 12:14     ` Matthew Wilcox
  2009-03-01 23:46       ` Michael Ellerman
@ 2009-03-02 20:33       ` Grant Grundler
  2009-03-02 21:01         ` Matthew Wilcox
  1 sibling, 1 reply; 23+ messages in thread
From: Grant Grundler @ 2009-03-02 20:33 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Grant Grundler, linux-pci, jbarnes, linux-kernel, Matthew Wilcox

On Fri, Feb 27, 2009 at 05:14:43AM -0700, Matthew Wilcox wrote:
> On Thu, Feb 26, 2009 at 11:15:25PM -0700, Grant Grundler wrote:
> > ...
> > > +3. Why use MSIs?
> > > +
> > > +There are three reasons why using MSIs can give an advantage over
> > > +traditional pin-based interrupts.
> > ...
> > > +PCI devices can only support a single pin-based interrupt per function.
> > 
> > Related to this is a 4th reason: distribute workload across CPUs
> > and enables construction of efficient, multi-queue devices.
> > Care to mention that?
> 
> That's true for MSI-X, but not for MSIs in general.  Workload is already
> distributed across CPUs with round-robin interrupts.  I'm inclined to
> leave out this level of detail.

I'm Ok with omitting it.

AFAICT "round-robin" was a behavior of older kernels.
All the x86 platforms I've looked at direct the MSI to exactly
one CPU.

> 
> > > +The MSI-X capability is much more flexible than the MSI capability.
> > > +It supports up to 2048 interrupts, each of which can be separately
> > > +assigned.
> > 
> > Nothing describes "assignment" below or what is meant by "assigned".
> > My guess is you wanted to differentiate MSIX from MSI with:
> >   ... and each MSIX can be directed at a different CPU.
> 
> I think 'each of which can be controlled separately' might work better.
> For example, they're individually maskable which isn't (necessarily)
> true of plain MSI.

Sounds good to me.


...
> > The description for MSI is correct. But Linux will only allocate one
> > MSI as noted in an earlier section. This section implies more could
> > be allocated when using MSI and that won't happen.
> > 
> > IIRC, for AHCI perf you were working on a patch to change that and
> > it should probably update this text at the same time when the
> > behavior changes.
> 
> Did you see this is patch 1/6?  ;-)

yes....after I hit send and continued reviewing the rest of the patches. ;)

>  I removed the description of
> pci_enable_msi_block() from this patch, but missed updating this
> paragraph.  By patch 6/6, this paragraph is true.

Yup - agreed.

> > ...
> > > +5.3. Disabling MSIs on a single device
> > > +
> > > +Some devices are known to have faulty MSI implementations.  Usually this
> > > +is handled in the individual device driver but occasionally it's necessary
> > > +to handle this with a quirk.  Some drivers have an option to disable MSIs;
> > > +this is deprecated.
> > 
> > "this" is ambiguous. My guess is "quirks to disable MSI for a device is
> > deprecated" since recently some drivers have added module parameters to
> > disable MSI.
> 
> Having an option to disable MSI is deprecated.  That doesn't mean that
> individual driver authors aren't selfish and short-sighted.

Ok. Here's a suggestion on how to say that:
Driver options to disable MSI are deprecated and will be removed in the future.

But anything you like better is fine with me.

> > Should mention "fgrep MSI /proc/interrupts" to see if any devices have
> > MSI in use?
> 
> Yes, you're right.
> 
> > > +Then, lspci -t gives the list of bridges above a device.  Reading
> > > +/sys/bus/pci/devices/*/msi_bus will tell you whether MSI are enabled (1)
> > > +or disabled (0).  If 0 is found in any of the msi_bus files belonging
> > > +to bridges between the PCI root and the device, MSIs are disabled.
> > > +
> > > +It is also worth checking whether the device driver supports MSIs.
> > 
> > Suggestions on how to check?
> 
> 'eg has calls to pci_enable_msi(), pci_enable_msix() or
> pci_enable_msi_block()'?

Yeah, that should work.
Anyone reading this doc has obviously found a source tree. ;)

> > Conversely, one can easily check if the driver has MSI disabled by default
> > and MSI can be enabled.  e.g. use "modinfo mvsas" to check driver parameters.
> 
> I'm not going to give examples of bad practise.

*nod* I agree it would encourage use and should not be included.
 
cheers,
grant

> > Reviewed-by: Grant Grundler <grundler@parisc-linunx.org>
> 
> Thanks.
> 
> -- 
> Matthew Wilcox				Intel Open Source Technology Centre
> "Bill, look, we understand that you're interested in selling us this
> operating system, but compare it to ours.  We can't possibly take such
> a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 1/6] Rewrite MSI-HOWTO
  2009-03-02 20:33       ` Grant Grundler
@ 2009-03-02 21:01         ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-03-02 21:01 UTC (permalink / raw)
  To: Grant Grundler; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

On Mon, Mar 02, 2009 at 01:33:40PM -0700, Grant Grundler wrote:
> AFAICT "round-robin" was a behavior of older kernels.
> All the x86 platforms I've looked at direct the MSI to exactly
> one CPU.

I think it's a factor of your chipset.  For example, my laptop:

$ grep MSI /proc/interrupts 
 26:      33095      33046   PCI-MSI-edge      eth0
 27:     127014     126190   PCI-MSI-edge      ahci
 28:          0          0   PCI-MSI-edge      ahci
 29:          0          0   PCI-MSI-edge      ahci
 31:     525988     518310   PCI-MSI-edge      iwlagn

I know my G33-based desktop also distributes interrupts equally across
all four cores.  It is, of course, possibly to manually set the
affinity, and perhaps that's what you're seeing.

> > Did you see this is patch 1/6?  ;-)
> 
> yes....after I hit send and continued reviewing the rest of the patches. ;)

It's good to know somebody looked at patches 2-6 because I've not had
any comments yet.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix'
  2009-02-23 17:27 ` [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix' Matthew Wilcox
@ 2009-03-03  0:16   ` Michael Ellerman
  0 siblings, 0 replies; 23+ messages in thread
From: Michael Ellerman @ 2009-03-03  0:16 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 1014 bytes --]

On Mon, 2009-02-23 at 12:27 -0500, Matthew Wilcox wrote:
> By changing from a 5-bit field to a 1-bit field, we free up some bits
> that can be used by a later patch.  Also rearrange the fields for better
> packing on 64-bit platforms (reducing the size of msi_desc from 72 bytes
> to 64 bytes).
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index dceea56..b3db438 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c

> @@ -393,7 +355,7 @@ static int msi_capability_init(struct pci_dev *dev)
>  	if (!entry)
>  		return -ENOMEM;
>  
> -	entry->msi_attrib.type = PCI_CAP_ID_MSI;
> +	entry->msi_attrib.is_msix = 0;

This isn't strictly necessary given we kzalloc'ed the entry, but no
biggie.

Looks good otherwise.

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] PCI MSI: Refactor interrupt masking code
  2009-02-23 17:28 ` [PATCH 5/6] PCI MSI: Refactor interrupt masking code Matthew Wilcox
@ 2009-03-03  0:16   ` Michael Ellerman
  2009-03-16 21:01     ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2009-03-03  0:16 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 10004 bytes --]

On Mon, 2009-02-23 at 12:28 -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Since most of the callers already know whether they have an MSI or
> an MSI-X capability, split msi_set_mask_bits() into msi_mask_irq()
> and msix_mask_irq().  The only callers which don't (mask_msi_irq()
> and unmask_msi_irq()) can share code in msi_set_mask_bit().  This then
> becomes the only caller of msix_flush_writes(), so we can inline it.
> The flushing read can be to any address that belongs to the device,
> so we can eliminate the calculation too.
> 
> We can also get rid of maskbits_mask from struct msi_desc and simply
> recalculate it on the rare occasion that we need it.  The single-bit
> 'masked' element is replaced by a copy of the 32-bit 'masked' register,
> so this patch does not affect the size of msi_desc.
> 
> Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
> ---
>  drivers/pci/msi.c   |  157 +++++++++++++++++++++++++--------------------------
>  include/linux/msi.h |    5 +-
>  2 files changed, 79 insertions(+), 83 deletions(-)
> 
> diff --git a/drivers/pci/msi.c b/drivers/pci/msi.c
> index fcde04d..41f18cb 100644
> --- a/drivers/pci/msi.c
> +++ b/drivers/pci/msi.c
> @@ -105,17 +105,14 @@ static inline __attribute_const__ u32 msi_mask(unsigned x)
>  	return (1 << (1 << x)) - 1;
>  }
>  
> -static void msix_flush_writes(struct irq_desc *desc)
> +static inline __attribute_const__ u32 msi_capable_mask(u16 control)
>  {

/me wonders why this is __attribute_const__ and not __const like all the
other shorthands, but not your fault.

> @@ -127,34 +124,62 @@ static void msix_flush_writes(struct irq_desc *desc)
>   * Returns 1 if it succeeded in masking the interrupt and 0 if the device
>   * doesn't support MSI masking.
>   */
> -static int msi_set_mask_bits(struct irq_desc *desc, u32 mask, u32 flag)
> +static int msi_mask_irq(struct msi_desc *desc, u32 mask, u32 flag)
>  {
> -	struct msi_desc *entry;
> +	u32 mask_bits = desc->masked;
>  
> -	entry = get_irq_desc_msi(desc);
> -	BUG_ON(!entry);
> -	if (entry->msi_attrib.is_msix) {
> -		int offset = entry->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
> -			PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> -		writel(flag, entry->mask_base + offset);
> -		readl(entry->mask_base + offset);
> -	} else {
> -		int pos;
> -		u32 mask_bits;
> +	if (!desc->msi_attrib.maskbit)
> +		return 0;
>  
> -		if (!entry->msi_attrib.maskbit)
> -			return 0;
> +	mask_bits &= ~mask;
> +	mask_bits |= flag;
> +	pci_write_config_dword(desc->dev, desc->mask_pos, mask_bits);
> +	desc->masked = mask_bits;
>  
> -		pos = entry->mask_pos;
> -		pci_read_config_dword(entry->dev, pos, &mask_bits);
> -		mask_bits &= ~mask;
> -		mask_bits |= flag & mask;
> -		pci_write_config_dword(entry->dev, pos, mask_bits);
> -	}
> -	entry->msi_attrib.masked = !!flag;
>  	return 1;
>  }
>  
> +/*
> + * This internal function does not flush PCI writes to the device.
> + * All users must ensure that they read from the device before either
> + * assuming that the device state is up to date, or returning out of this
> + * file.  This saves a few milliseconds when initialising devices with lots
> + * of MSI-X interrupts.
> + */
> +static void msix_mask_irq(struct msi_desc *desc, u32 flag)
> +{
> +	u32 mask_bits = desc->masked;
> +	unsigned offset = desc->msi_attrib.entry_nr * PCI_MSIX_ENTRY_SIZE +
> +					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET;
> +	mask_bits &= ~1;
> +	mask_bits |= flag;
> +	writel(mask_bits, desc->mask_base + offset);
> +	desc->masked = mask_bits;
> +}
> +
> +static int msi_set_mask_bit(unsigned irq, u32 flag)
> +{
> +	struct msi_desc *desc = get_irq_msi(irq);
> +
> +	if (desc->msi_attrib.is_msix) {
> +		msix_mask_irq(desc, flag);
> +		readl(desc->mask_base);		/* Flush write to device */
> +		return 1;
> +	} else {
> +		return msi_mask_irq(desc, 1, flag);
> +	}
> +}
> +
> +void mask_msi_irq(unsigned int irq)
> +{
> +	msi_set_mask_bit(irq, 1);
> +}
> +
> +void unmask_msi_irq(unsigned int irq)
> +{
> +	msi_set_mask_bit(irq, 0);
> +}

I don't see why msi_set_mask_bit() or msi_mask_irq() need to return
anything, their return values are never used AFAICT.

> +
>  void read_msi_msg_desc(struct irq_desc *desc, struct msi_msg *msg)
>  {
>  	struct msi_desc *entry = get_irq_desc_msi(desc);
> @@ -230,22 +255,6 @@ void write_msi_msg(unsigned int irq, struct msi_msg *msg)
>  	write_msi_msg_desc(desc, msg);
>  }
>  
> -void mask_msi_irq(unsigned int irq)
> -{
> -	struct irq_desc *desc = irq_to_desc(irq);
> -
> -	msi_set_mask_bits(desc, 1, 1);
> -	msix_flush_writes(desc);
> -}
> -
> -void unmask_msi_irq(unsigned int irq)
> -{
> -	struct irq_desc *desc = irq_to_desc(irq);
> -
> -	msi_set_mask_bits(desc, 1, 0);
> -	msix_flush_writes(desc);
> -}
> -
>  static int msi_free_irqs(struct pci_dev* dev);
>  
>  static struct msi_desc *alloc_msi_entry(struct pci_dev *dev)
> @@ -281,13 +290,9 @@ static void __pci_restore_msi_state(struct pci_dev *dev)
>  	pci_intx_for_msi(dev, 0);
>  	msi_set_enable(dev, 0);
>  	write_msi_msg(dev->irq, &entry->msg);
> -	if (entry->msi_attrib.maskbit) {
> -		struct irq_desc *desc = irq_to_desc(dev->irq);
> -		msi_set_mask_bits(desc, entry->msi_attrib.maskbits_mask,
> -				  entry->msi_attrib.masked);
> -	}
>  
>  	pci_read_config_word(dev, pos + PCI_MSI_FLAGS, &control);
> +	msi_mask_irq(entry, msi_capable_mask(control), entry->masked);
>  	control &= ~PCI_MSI_FLAGS_QSIZE;
>  	control |= PCI_MSI_FLAGS_ENABLE;
>  	pci_write_config_word(dev, pos + PCI_MSI_FLAGS, control);
> @@ -307,9 +312,8 @@ static void __pci_restore_msix_state(struct pci_dev *dev)
>  	msix_set_enable(dev, 0);
>  
>  	list_for_each_entry(entry, &dev->msi_list, list) {
> -		struct irq_desc *desc = irq_to_desc(entry->irq);
>  		write_msi_msg(entry->irq, &entry->msg);
> -		msi_set_mask_bits(desc, 1, entry->msi_attrib.masked);
> +		msix_mask_irq(entry, entry->masked);
>  	}
>  
>  	BUG_ON(list_empty(&dev->msi_list));
> @@ -342,6 +346,7 @@ static int msi_capability_init(struct pci_dev *dev)
>  	struct msi_desc *entry;
>  	int pos, ret;
>  	u16 control;
> +	unsigned mask;
>  
>  	msi_set_enable(dev, 0);	/* Ensure msi is disabled as I set it up */
>  
> @@ -356,21 +361,15 @@ static int msi_capability_init(struct pci_dev *dev)
>  	entry->msi_attrib.is_64 = is_64bit_address(control);
>  	entry->msi_attrib.entry_nr = 0;
>  	entry->msi_attrib.maskbit = is_mask_bit_support(control);
> -	entry->msi_attrib.masked = 1;
>  	entry->msi_attrib.default_irq = dev->irq;	/* Save IOAPIC IRQ */
>  	entry->msi_attrib.pos = pos;
> -	if (entry->msi_attrib.maskbit) {
> -		unsigned int base, maskbits, temp;
> -
> -		base = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
> -		entry->mask_pos = base;
> -		/* All MSIs are unmasked by default, Mask them all */
> -		pci_read_config_dword(dev, base, &maskbits);
> -		temp = msi_mask((control & PCI_MSI_FLAGS_QMASK) >> 1);
> -		maskbits |= temp;
> -		pci_write_config_dword(dev, base, maskbits);
> -		entry->msi_attrib.maskbits_mask = temp;
> -	}
> +
> +	entry->mask_pos = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
> +	/* All MSIs are unmasked by default, Mask them all */
> +	pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
> +	mask = msi_capable_mask(control);
> +	msi_mask_irq(entry, mask, mask);

This looked a little weird at first, in that we're unconditionally doing
the mask - but we're not, msi_mask_irq() checks for us. I guess it's no
drama reading from mask_pos even if it's not implemented.

> @@ -435,11 +434,12 @@ static int msix_capability_init(struct pci_dev *dev,
>  		entry->msi_attrib.is_msix = 1;
>  		entry->msi_attrib.is_64 = 1;
>  		entry->msi_attrib.entry_nr = j;
> -		entry->msi_attrib.maskbit = 1;
> -		entry->msi_attrib.masked = 1;
>  		entry->msi_attrib.default_irq = dev->irq;
>  		entry->msi_attrib.pos = pos;
>  		entry->mask_base = base;
> +		entry->masked = readl(base + j * PCI_MSIX_ENTRY_SIZE +
> +					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> +		msix_mask_irq(entry, 1);

I was going to say "why bother with the readl". But checking the spec,
the rest of the bits are reserved and we mustn't muck with them.

> @@ -556,9 +556,11 @@ int pci_enable_msi(struct pci_dev* dev)
>  }
>  EXPORT_SYMBOL(pci_enable_msi);
>  
> -void pci_msi_shutdown(struct pci_dev* dev)
> +void pci_msi_shutdown(struct pci_dev *dev)
>  {
> -	struct msi_desc *entry;
> +	struct msi_desc *desc;
> +	u32 mask;
> +	u16 ctrl;
>  
>  	if (!pci_msi_enable || !dev || !dev->msi_enabled)
>  		return;
> @@ -568,18 +570,13 @@ void pci_msi_shutdown(struct pci_dev* dev)
>  	dev->msi_enabled = 0;
>  
>  	BUG_ON(list_empty(&dev->msi_list));
> -	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
> -	/* Return the the pci reset with msi irqs unmasked */
> -	if (entry->msi_attrib.maskbit) {
> -		u32 mask = entry->msi_attrib.maskbits_mask;
> -		struct irq_desc *desc = irq_to_desc(dev->irq);
> -		msi_set_mask_bits(desc, mask, ~mask);
> -	}
> -	if (entry->msi_attrib.is_msix)
> -		return;

You loose this return case, but we should never have hit it AFAICS
because of the check of !dev->msi_enabled earlier - so I think it's ok.

> +	desc = list_first_entry(&dev->msi_list, struct msi_desc, list);
> +	pci_read_config_word(dev, desc->msi_attrib.pos + PCI_MSI_FLAGS, &ctrl);
> +	mask = msi_capable_mask(ctrl);
> +	msi_mask_irq(desc, mask, ~mask);
>  
>  	/* Restore dev->irq to its default pin-assertion irq */
> -	dev->irq = entry->msi_attrib.default_irq;
> +	dev->irq = desc->msi_attrib.default_irq;
>  }


cheers 

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] PCI MSI: Add support for multiple MSI
  2009-02-23 17:28 ` [PATCH 6/6] PCI MSI: Add support for multiple MSI Matthew Wilcox
@ 2009-03-03  0:16   ` Michael Ellerman
  2009-03-16 21:07     ` Matthew Wilcox
  0 siblings, 1 reply; 23+ messages in thread
From: Michael Ellerman @ 2009-03-03  0:16 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

[-- Attachment #1: Type: text/plain, Size: 1460 bytes --]

On Mon, 2009-02-23 at 12:28 -0500, Matthew Wilcox wrote:
> From: Matthew Wilcox <willy@linux.intel.com>
> 
> Add the new API pci_enable_msi_block() to allow drivers to
> request multiple MSI and reimplement pci_enable_msi in terms of
> pci_enable_msi_block.  Ensure that the architecture back ends don't
> have to know about multiple MSI.
> 

> diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
> index 3bb7d3d..0c16e2a 100644
> --- a/arch/powerpc/kernel/msi.c
> +++ b/arch/powerpc/kernel/msi.c
> @@ -19,6 +19,10 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
>  		return -ENOSYS;
>  	}
>  
> +	/* PowerPC doesn't support multiple MSI yet */
> +	if (type == PCI_CAP_ID_MSI && nvec > 1)
> +		return 1;
> +
>  	if (ppc_md.msi_check_device) {
>  		pr_debug("msi: Using platform check routine.\n");
>  		return ppc_md.msi_check_device(dev, nvec, type);

That bit:

Acked-by: Michael Ellerman <michael@ellerman.id.au>

I'll tell benh to expect this hunk to show up sometime via Jesse's tree.

Looks good otherwise, not sure if we'll ever implement it, but good to
have. Is AHCI the only driver you're planning on using it for ATM?

cheers

-- 
Michael Ellerman
OzLabs, IBM Australia Development Lab

wwweb: http://michael.ellerman.id.au
phone: +61 2 6212 1183 (tie line 70 21183)

We do not inherit the earth from our ancestors,
we borrow it from our children. - S.M.A.R.T Person

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 197 bytes --]

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support for multiple MSI
  2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
                   ` (5 preceding siblings ...)
  2009-02-23 17:28 ` [PATCH 6/6] PCI MSI: Add support for multiple MSI Matthew Wilcox
@ 2009-03-04 14:52 ` Eric W. Biederman
  2009-03-04 22:26   ` Matthew Wilcox
  6 siblings, 1 reply; 23+ messages in thread
From: Eric W. Biederman @ 2009-03-04 14:52 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: linux-pci, jbarnes, linux-kernel

Matthew Wilcox <matthew@wil.cx> writes:

> Currently, Linux supports multiple MSI-X interrupts per device, but only
> a single MSI interrupt.  This patch series adds support to the generic
> PCI code for supporting multiple MSI interrupts.  Architectures will
> need to add support for multiple MSIs, and I have a patch to do that for
> x86 (which needs some more work).  Getting this patch series in first
> is important so we can start supporting this interface in drivers and
> architectures independently.

Do we have any benchmarks anywhere that show that multiple msi support
gains us something?

The requirement to allocate a contiguous block of vector numbers worries
me for the x86 implementation.  I don't like the idea of having to deal
with allocations that can fail because of fragmentation.

The fact that we also can not honor the irq affinity properly for multiple
msi also disturbs me.

At a quick skim your patchset is only the generic code without a single
architecture specific implementation so it appears you have not done the
hard work on figuring out how to deal with multiple msi in the real
world.

Given that msi-x does not have any of these issues without data to say
that there is a true gain in supporting multi-msi I don't see the point
of supporting it.

Eric

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: Support for multiple MSI
  2009-03-04 14:52 ` Support " Eric W. Biederman
@ 2009-03-04 22:26   ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-03-04 22:26 UTC (permalink / raw)
  To: Eric W. Biederman; +Cc: linux-pci, jbarnes, linux-kernel

On Wed, Mar 04, 2009 at 06:52:39AM -0800, Eric W. Biederman wrote:
> Do we have any benchmarks anywhere that show that multiple msi support
> gains us something?

Yep.  I get a 1% performance gain when doing

dd if=/dev/sdb of=/dev/null bs=512 count=10000000 iflag=direct

This is less improvement than I had expected, and I'm trying to find out
why before I publish the rest of the code.

> The requirement to allocate a contiguous block of vector numbers worries
> me for the x86 implementation.  I don't like the idea of having to deal
> with allocations that can fail because of fragmentation.

Multiple MSI support can fail for any number of reasons, fragmentation
is only one of them.  Drivers just have to deal with it.

> The fact that we also can not honor the irq affinity properly for multiple
> msi also disturbs me.

That's an x86 limitation; powerpc can independently steer MSI.

> At a quick skim your patchset is only the generic code without a single
> architecture specific implementation so it appears you have not done the
> hard work on figuring out how to deal with multiple msi in the real
> world.

I think that's terribly unfair.  How dare you insinuate I have done no
testing of my code?  I've published code before that implements multiple
MSI for x86-64, and I've adapted that code to the current tree.  When I
have enough time to do so, I'm going to adapt it to Ingo's -tip and
publish it.  There will be many things to criticise in it which I'm sure
will make you happy.

> Given that msi-x does not have any of these issues without data to say
> that there is a true gain in supporting multi-msi I don't see the point
> of supporting it.

There are people who have implemented it in silicon, and there is evidence of
performance improvement with Linux and with other operating systems.
Just because you don't like its limitations on x86 is not a valid reason
to keep it out.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 5/6] PCI MSI: Refactor interrupt masking code
  2009-03-03  0:16   ` Michael Ellerman
@ 2009-03-16 21:01     ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-03-16 21:01 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

On Tue, Mar 03, 2009 at 11:16:12AM +1100, Michael Ellerman wrote:
> I don't see why msi_set_mask_bit() or msi_mask_irq() need to return
> anything, their return values are never used AFAICT.

You're right.  Changed to void.

> > @@ -356,21 +361,15 @@ static int msi_capability_init(struct pci_dev *dev)
> >  	entry->msi_attrib.is_64 = is_64bit_address(control);
> >  	entry->msi_attrib.entry_nr = 0;
> >  	entry->msi_attrib.maskbit = is_mask_bit_support(control);
> > -	entry->msi_attrib.masked = 1;
> >  	entry->msi_attrib.default_irq = dev->irq;	/* Save IOAPIC IRQ */
> >  	entry->msi_attrib.pos = pos;
> > -	if (entry->msi_attrib.maskbit) {
> > -		unsigned int base, maskbits, temp;
> > -
> > -		base = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
> > -		entry->mask_pos = base;
> > -		/* All MSIs are unmasked by default, Mask them all */
> > -		pci_read_config_dword(dev, base, &maskbits);
> > -		temp = msi_mask((control & PCI_MSI_FLAGS_QMASK) >> 1);
> > -		maskbits |= temp;
> > -		pci_write_config_dword(dev, base, maskbits);
> > -		entry->msi_attrib.maskbits_mask = temp;
> > -	}
> > +
> > +	entry->mask_pos = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
> > +	/* All MSIs are unmasked by default, Mask them all */
> > +	pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
> > +	mask = msi_capable_mask(control);
> > +	msi_mask_irq(entry, mask, mask);
> 
> This looked a little weird at first, in that we're unconditionally doing
> the mask - but we're not, msi_mask_irq() checks for us. I guess it's no
> drama reading from mask_pos even if it's not implemented.

Hm, wasn't quite my intent.  Here's the replacement:

        entry->mask_pos = msi_mask_bits_reg(pos, entry->msi_attrib.is_64);
        /* All MSIs are unmasked by default, Mask them all */
        if (entry->msi_attrib.maskbit)
                pci_read_config_dword(dev, entry->mask_pos, &entry->masked);
        mask = msi_capable_mask(control);
        msi_mask_irq(entry, mask, mask);

So mask_pos still points somewhere bogus, but all uses of it are now guarded by msi_attrib.maskbit, which is OK.

> > @@ -435,11 +434,12 @@ static int msix_capability_init(struct pci_dev *dev,
> >  		entry->msi_attrib.is_msix = 1;
> >  		entry->msi_attrib.is_64 = 1;
> >  		entry->msi_attrib.entry_nr = j;
> > -		entry->msi_attrib.maskbit = 1;
> > -		entry->msi_attrib.masked = 1;
> >  		entry->msi_attrib.default_irq = dev->irq;
> >  		entry->msi_attrib.pos = pos;
> >  		entry->mask_base = base;
> > +		entry->masked = readl(base + j * PCI_MSIX_ENTRY_SIZE +
> > +					PCI_MSIX_ENTRY_VECTOR_CTRL_OFFSET);
> > +		msix_mask_irq(entry, 1);
> 
> I was going to say "why bother with the readl". But checking the spec,
> the rest of the bits are reserved and we mustn't muck with them.

Yeah, we've got away with that until now.  I just checked PCIe 2.1
(out today), and, er, it seems we can't rely on that any longer.
Something about a "TPH Requester Capability" and a "Steering Tag".
I'm looking forward to learning more about those in the next few months.

> > @@ -568,18 +570,13 @@ void pci_msi_shutdown(struct pci_dev* dev)
> >  	dev->msi_enabled = 0;
> >  
> >  	BUG_ON(list_empty(&dev->msi_list));
> > -	entry = list_entry(dev->msi_list.next, struct msi_desc, list);
> > -	/* Return the the pci reset with msi irqs unmasked */
> > -	if (entry->msi_attrib.maskbit) {
> > -		u32 mask = entry->msi_attrib.maskbits_mask;
> > -		struct irq_desc *desc = irq_to_desc(dev->irq);
> > -		msi_set_mask_bits(desc, mask, ~mask);
> > -	}
> > -	if (entry->msi_attrib.is_msix)
> > -		return;
> 
> You loose this return case, but we should never have hit it AFAICS
> because of the check of !dev->msi_enabled earlier - so I think it's ok.

Yeah, I deleted it on purpose.


Thanks for the review!

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [PATCH 6/6] PCI MSI: Add support for multiple MSI
  2009-03-03  0:16   ` Michael Ellerman
@ 2009-03-16 21:07     ` Matthew Wilcox
  0 siblings, 0 replies; 23+ messages in thread
From: Matthew Wilcox @ 2009-03-16 21:07 UTC (permalink / raw)
  To: Michael Ellerman; +Cc: linux-pci, jbarnes, linux-kernel, Matthew Wilcox

On Tue, Mar 03, 2009 at 11:16:13AM +1100, Michael Ellerman wrote:
> > diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c
> > index 3bb7d3d..0c16e2a 100644
> > --- a/arch/powerpc/kernel/msi.c
> > +++ b/arch/powerpc/kernel/msi.c
> > @@ -19,6 +19,10 @@ int arch_msi_check_device(struct pci_dev* dev, int nvec, int type)
> >  		return -ENOSYS;
> >  	}
> >  
> > +	/* PowerPC doesn't support multiple MSI yet */
> > +	if (type == PCI_CAP_ID_MSI && nvec > 1)
> > +		return 1;
> > +
> >  	if (ppc_md.msi_check_device) {
> >  		pr_debug("msi: Using platform check routine.\n");
> >  		return ppc_md.msi_check_device(dev, nvec, type);
> 
> That bit:
> 
> Acked-by: Michael Ellerman <michael@ellerman.id.au>
> 
> I'll tell benh to expect this hunk to show up sometime via Jesse's tree.

Thanks!

> Looks good otherwise, not sure if we'll ever implement it, but good to
> have. Is AHCI the only driver you're planning on using it for ATM?

That's the only one I know where it's important.  I think some PCIe
bridges would like to have multiple MSI.  And I think some hardware
designs (not from Intel, from other companies who've talked to me) are
in the pipe to use multiple MSI, because it's cheaper than using MSI-X.

-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2009-03-16 21:07 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-02-23 17:27 Support for multiple MSI Matthew Wilcox
2009-02-23 17:27 ` [PATCH 1/6] Rewrite MSI-HOWTO Matthew Wilcox
2009-02-24 20:00   ` Randy Dunlap
2009-02-24 20:28     ` Matthew Wilcox
2009-02-24 20:55       ` Randy Dunlap
2009-02-25  7:34       ` Sitsofe Wheeler
2009-02-27  6:15   ` Grant Grundler
2009-02-27 12:14     ` Matthew Wilcox
2009-03-01 23:46       ` Michael Ellerman
2009-03-02 20:33       ` Grant Grundler
2009-03-02 21:01         ` Matthew Wilcox
2009-02-23 17:27 ` [PATCH 2/6] PCI MSI: Replace 'type' with 'is_msix' Matthew Wilcox
2009-03-03  0:16   ` Michael Ellerman
2009-02-23 17:27 ` [PATCH 3/6] PCI MSI: msi_desc->dev is always initialised Matthew Wilcox
2009-02-23 17:28 ` [PATCH 4/6] PCI MSI: Use mask_pos instead of mask_base when appropriate Matthew Wilcox
2009-02-23 17:28 ` [PATCH 5/6] PCI MSI: Refactor interrupt masking code Matthew Wilcox
2009-03-03  0:16   ` Michael Ellerman
2009-03-16 21:01     ` Matthew Wilcox
2009-02-23 17:28 ` [PATCH 6/6] PCI MSI: Add support for multiple MSI Matthew Wilcox
2009-03-03  0:16   ` Michael Ellerman
2009-03-16 21:07     ` Matthew Wilcox
2009-03-04 14:52 ` Support " Eric W. Biederman
2009-03-04 22:26   ` Matthew Wilcox

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.