* Re: Question on MSI support in PCI and PCI-E devices
       [not found] <f236217608b24a5e976628fe31d41a03@BRMWP-EXMB11.corp.brocade.com>
@ 2015-02-12 14:48 ` Stephen Hemminger
  2015-02-12 15:11     ` Andrey Utkin
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2015-02-12 14:48 UTC (permalink / raw)
  To: Andrey Utkin; +Cc: linux-kernel, kernelnewbies, kernel-mentors

On Wed, 11 Feb 2015 18:19:00 +0000
Andrey Utkin <andrey.krieger.utkin@gmail.com> wrote:

> Is it true that _every_ PCI or PCI Express device supporting MSI is
> indicated by some mention of MSI in "lspci -v", and if there's no such
> mention, it surely doesn't support MSI?
> 

Look at the function pci_msi_supported in the kernel source
(drivers/pci/msi.c); there are many things which can block MSI.

There can be cases where PCI quirks in the kernel block MSI
because, for example, the device supports MSI but the motherboard
BIOS is broken. This only happens on really old systems.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on MSI support in PCI and PCI-E devices
  2015-02-12 14:48 ` Question on MSI support in PCI and PCI-E devices Stephen Hemminger
@ 2015-02-12 15:11     ` Andrey Utkin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Utkin @ 2015-02-12 15:11 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: linux-kernel, kernelnewbies, kernel-mentors

2015-02-12 16:48 GMT+02:00 Stephen Hemminger <stephen@networkplumber.org>:
> On Wed, 11 Feb 2015 18:19:00 +0000
> Andrey Utkin <andrey.krieger.utkin@gmail.com> wrote:
>
>> Is it true that _every_ PCI or PCI Express device supporting MSI is
>> indicated by some mention of MSI in "lspci -v", and if there's no such
>> mention, it surely doesn't support MSI?
>>
>
> Look at the function pci_msi_supported in the kernel source
> (drivers/pci/msi.c); there are many things which can block MSI.
>
> There can be cases where PCI quirks in the kernel block MSI
> because, for example, the device supports MSI but the motherboard
> BIOS is broken. This only happens on really old systems.

Thank you for your reply.
However, I was more interested in the case where the lspci output for
the device doesn't mention MSI at all, so I wonder whether it makes
any sense to try to enable it in the driver.

04:05.0 Multimedia video controller: Bluecherry BC-H16480A 16 port H.264 video and audio encoder / decoder
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap- 66MHz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 64 (250ns max), Cache Line Size: 32 bytes
        Interrupt: pin A routed to IRQ 16
        Region 0: Memory at faff0000 (32-bit, prefetchable) [size=64K]
        Kernel driver in use: solo6x10

We have such cards, and we have issues with them: at some point they
stop producing interrupts, regardless of whether they share an
interrupt number or not.
There was a recent commit from Krzysztof Hałasa
(3c787b108fe0d1c341a76e718a25897ae14673cf) which improved things, but
the issue still happens regularly on some setups.
Now I've tried the following change: I've introduced a loop like the
one I see in the bt8xx and ddbridge drivers. This also didn't help. So
I'm out of ideas now (any comments are highly appreciated!). I have
read that MSI is a more reliable and faster interrupt delivery
mechanism, but from the lspci output it is not clear whether our cards
support MSI at all.

--- a/drivers/media/pci/solo6x10/solo6x10-core.c
+++ b/drivers/media/pci/solo6x10/solo6x10-core.c
@@ -100,10 +100,13 @@ static irqreturn_t solo_isr(int irq, void *data)
        struct solo_dev *solo_dev = data;
        u32 status;
        int i;
+       int handled = 0;

+       while (1) {
        status = solo_reg_read(solo_dev, SOLO_IRQ_STAT);
        if (!status)
-               return IRQ_NONE;
+               break;
+       handled++;

        /* Acknowledge all interrupts immediately */
        solo_reg_write(solo_dev, SOLO_IRQ_STAT, status);
@@ -129,7 +132,11 @@ static irqreturn_t solo_isr(int irq, void *data)
        if (status & SOLO_IRQ_G723)
                solo_g723_isr(solo_dev);

-       return IRQ_HANDLED;
+       }
+
+       if (handled > 1)
+               solo_dev->isr_more_laps++;
+       return IRQ_RETVAL(handled);
 }

 static void free_solo_dev(struct solo_dev *solo_dev)
@@ -232,6 +239,16 @@ static ssize_t p2m_timeouts_show(struct device *dev,
        return sprintf(buf, "%d\n", solo_dev->p2m_timeouts);
 }

+static ssize_t isr_more_laps_show(struct device *dev,
+                                 struct device_attribute *attr,
+                                 char *buf)
+{
+       struct solo_dev *solo_dev =
+               container_of(dev, struct solo_dev, dev);
+
+       return sprintf(buf, "%d\n", solo_dev->isr_more_laps);
+}
+
 static ssize_t sdram_size_show(struct device *dev,
                               struct device_attribute *attr,
                               char *buf)
@@ -415,6 +432,7 @@ static const struct device_attribute solo_dev_attrs[] = {
        __ATTR_RO(input_map),
        __ATTR_RO(intervals),
        __ATTR_RO(sdram_offsets),
+       __ATTR_RO(isr_more_laps),
 };

 static void solo_device_release(struct device *dev)
index d19c0ae..dffd7d7
--- a/drivers/media/pci/solo6x10/solo6x10-enc.c
+++ b/drivers/media/pci/solo6x10/solo6x10-enc.c
@@ -28,7 +28,7 @@
 #define VI_PROG_HSIZE                  (1280 - 16)
 #define VI_PROG_VSIZE                  (1024 - 16)

-#define IRQ_LEVEL                      2
+#define IRQ_LEVEL                      3

 static void solo_capture_config(struct solo_dev *solo_dev)
 {
index 6c9bc70..4799ea2
--- a/drivers/media/pci/solo6x10/solo6x10.h
+++ b/drivers/media/pci/solo6x10/solo6x10.h
@@ -277,6 +277,8 @@ struct solo_dev {
        spinlock_t              slock;
        int                     old_write;
        struct list_head        vidq_active;
+
+       int                     isr_more_laps;
 };

 static inline u32 solo_reg_read(struct solo_dev *solo_dev, int reg)

-- 
Andrey Utkin

^ permalink raw reply related	[flat|nested] 11+ messages in thread

* RE: Question on MSI support in PCI and PCI-E devices
  2015-02-12 15:11     ` Andrey Utkin
  (?)
@ 2015-03-02 14:02     ` McKay, Luke
  2015-03-03 14:29       ` Andrey Utkin
  -1 siblings, 1 reply; 11+ messages in thread
From: McKay, Luke @ 2015-03-02 14:02 UTC (permalink / raw)
  To: Andrey Utkin, Stephen Hemminger
  Cc: kernel-mentors, linux-kernel, kernelnewbies

It doesn't appear that your device supports MSI.  If it did, lspci -v should list the MSI capability and whether or not it is enabled.

For example:
Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+

Without a listing that shows the capability is present, there is nothing to enable.
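
What lspci reports here comes from the capability list in PCI configuration space. The following is a minimal userspace sketch of that lookup, assuming the caller has already read the first 256 bytes of config space into a buffer; the function name find_capability is made up for illustration and is not a kernel or pciutils API. The register offsets and capability IDs are the standard PCI ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Standard PCI configuration-space offsets and capability IDs. */
#define PCI_STATUS           0x06  /* 16-bit status register */
#define PCI_STATUS_CAP_LIST  0x10  /* bit 4: capability list present ("Cap+") */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability */
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_MSIX      0x11

/*
 * Hypothetical helper: return the config-space offset of capability
 * cap_id, or 0 if the device does not advertise it.
 * cfg holds the first 256 bytes of configuration space.
 */
static uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
	uint16_t status;
	uint8_t pos;
	int ttl = 48; /* guard against malformed, looping lists */

	assert(cfg != NULL);

	status = (uint16_t)(cfg[PCI_STATUS] | (cfg[PCI_STATUS + 1] << 8));
	if (!(status & PCI_STATUS_CAP_LIST))
		return 0; /* "Cap-": no capability list, hence no MSI */

	pos = cfg[PCI_CAPABILITY_LIST] & 0xFC; /* low two bits are reserved */
	while (pos >= 0x40 && ttl--) {
		if (cfg[pos] == cap_id)
			return pos;
		pos = cfg[pos + 1] & 0xFC; /* follow the next pointer */
	}
	return 0;
}
```

A device whose status register has the capability-list bit clear (shown by lspci as "Cap-" on the Status line) returns 0 for every ID, including PCI_CAP_ID_MSI and PCI_CAP_ID_MSIX. On Linux the buffer would typically come from the device's sysfs config file; reading beyond the standard header there usually requires root.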

Have you tried polling instead of using interrupts?  Definitely not ideal, but it might help you to determine whether hardware is dropping/missing an interrupt or whether the hardware is being completely hung up.  

Do you know if this missing interrupt is occurring in other systems as well?  How about whether it happens with different boards in the same system?  Answers to these questions would help to determine whether you might have a defective board, or some sort of incompatibility with the system.

Regards,
Luke


-- 
Luke McKay 
Senior Engineer
Cobham AvComm
T : +1 (316) 529 5585

Please consider the environment before printing this email


Aeroflex is now a Cobham company

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on MSI support in PCI and PCI-E devices
  2015-03-02 14:02     ` McKay, Luke
@ 2015-03-03 14:29       ` Andrey Utkin
  2015-03-04 16:03         ` McKay, Luke
  0 siblings, 1 reply; 11+ messages in thread
From: Andrey Utkin @ 2015-03-03 14:29 UTC (permalink / raw)
  To: McKay, Luke
  Cc: Andrey Utkin, Stephen Hemminger, kernel-mentors, linux-kernel,
	kernelnewbies

On Mon, Mar 2, 2015 at 4:02 PM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
> It doesn't appear that your device supports MSI.  If it did, lspci -v should list the MSI capability and whether or not it is enabled.
>
> For example:
> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>
> Without a listing that shows the capability is present, there is nothing to enable.
>
> Have you tried polling instead of using interrupts?  Definitely not ideal, but it might help you to determine whether hardware is dropping/missing an interrupt or whether the hardware is being completely hung up.
>
> Do you know if this missing interrupt is occurring in other systems as well?  How about whether it happens with different boards in the same system?  Answers to these questions would help to determine whether you might have a defective board, or some sort of incompatibility with the system.

We have just three setups reproducing this. We have no boards for
replacement experiments, unfortunately.
Polling instead of using interrupts sounds interesting. Is there an
example of such usage in any other PCI device driver?

-- 
Bluecherry developer.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Question on MSI support in PCI and PCI-E devices
  2015-03-03 14:29       ` Andrey Utkin
@ 2015-03-04 16:03         ` McKay, Luke
  2015-03-04 16:30           ` Roger Heflin
  0 siblings, 1 reply; 11+ messages in thread
From: McKay, Luke @ 2015-03-04 16:03 UTC (permalink / raw)
  To: Andrey Utkin
  Cc: Andrey Utkin, Stephen Hemminger, kernel-mentors, linux-kernel,
	kernelnewbies

I don't personally know of any PCI drivers that use polling instead of interrupts, since that would really mean the hardware is broken.

Basically all you need to do is create a timer and set its callback to a driver routine that checks the device status registers to determine whether there is work to be done.  The status register(s) would be the same indicators that should have generated an interrupt.
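
The body of such a timer callback can be sketched in plain userspace C. This is only a model under stated assumptions: struct fake_dev, reg_read, reg_ack and poll_tick are hypothetical names standing in for the real device and register accessors, and the status register is assumed to be acknowledged by clearing the bits that were read, as the solo6x10 ISR in this thread does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the device: a single pending-status register. */
struct fake_dev {
	uint32_t irq_stat;   /* pending event bits, as the ISR would read them */
	unsigned int events; /* number of event bits serviced so far */
};

static uint32_t reg_read(const struct fake_dev *d)
{
	return d->irq_stat;
}

static void reg_ack(struct fake_dev *d, uint32_t bits)
{
	d->irq_stat &= ~bits;
}

/*
 * Work done on every timer tick. Returns 1 if pending work was found,
 * 0 if the device was idle on this tick.
 */
static int poll_tick(struct fake_dev *d)
{
	uint32_t status;

	assert(d != NULL);

	status = reg_read(d);
	if (!status)
		return 0;

	reg_ack(d, status); /* acknowledge first, as the ISR does */
	for (; status; status &= status - 1)
		d->events++; /* one handler call per pending bit */
	return 1;
}
```

In a driver, this routine would run from a periodically re-armed kernel timer instead of the interrupt handler. If polling keeps finding pending status bits at the moment interrupts appear to stop, the interrupt is being lost on the way to the CPU; if the status register stays at zero, the device itself has stopped raising events.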

Regards,
Luke



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on MSI support in PCI and PCI-E devices
  2015-03-04 16:03         ` McKay, Luke
@ 2015-03-04 16:30           ` Roger Heflin
  2015-03-04 17:04             ` McKay, Luke
  0 siblings, 1 reply; 11+ messages in thread
From: Roger Heflin @ 2015-03-04 16:30 UTC (permalink / raw)
  To: McKay, Luke
  Cc: Andrey Utkin, Andrey Utkin, Stephen Hemminger, kernel-mentors,
	linux-kernel, kernelnewbies

I know from some data I have seen that, between Intel Sandy Bridge and
Intel Ivy Bridge, the same motherboards stopped delivering INTx
reliably (an interrupt was lost under load about once every 30 days,
and the driver and firmware had no way to recover from the failure).
We had to transition to using MSI on some PCI cards that had this
issue. Our issue was reproduced on a large number of different
physical machines, so if it was a hardware error, it was a lot of
different physical machines that had the defect.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* RE: Question on MSI support in PCI and PCI-E devices
  2015-03-04 16:30           ` Roger Heflin
@ 2015-03-04 17:04             ` McKay, Luke
  2015-03-04 17:18               ` Roger Heflin
  0 siblings, 1 reply; 11+ messages in thread
From: McKay, Luke @ 2015-03-04 17:04 UTC (permalink / raw)
  To: Roger Heflin
  Cc: Andrey Utkin, Andrey Utkin, Stephen Hemminger, kernel-mentors,
	linux-kernel, kernelnewbies

Legacy INTx can be shared among multiple devices.  Since it is a level-sensitive simulation of the interrupt line, it only takes one device (or driver) forgetting to clear the interrupt for the line to get stuck, and then it won't work for any of the devices using it.
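
The level-sensitive behaviour described above can be modelled in a few lines of userspace C. This is a toy sketch, not kernel code: NDEV, struct shared_line, line_active and driver_clear are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NDEV 3 /* hypothetical number of devices sharing one INTx line */

struct shared_line {
	bool asserting[NDEV]; /* true while device i holds the line asserted */
};

/* A level-sensitive line is active as long as ANY device asserts it. */
static bool line_active(const struct shared_line *l)
{
	size_t i;

	assert(l != NULL);
	for (i = 0; i < NDEV; i++)
		if (l->asserting[i])
			return true;
	return false;
}

/* A well-behaved driver clears the interrupt source of its own device. */
static void driver_clear(struct shared_line *l, size_t dev)
{
	assert(l != NULL && dev < NDEV);
	l->asserting[dev] = false;
}
```

If two devices assert the line and only one driver clears its device, line_active() still returns true. The line never returns to the inactive state, so no device on it can signal a fresh interrupt until the remaining source is cleared.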

If you're working with one particular device that seems to be causing these sorts of problems, you can check for misbehaving hardware with a PCIe analyzer.  With the analyzer you can verify that, when the driver informs the device that it has processed the interrupt, the device sends the deassertion message for the INTx line.

Or, if an analyzer isn't available, you can simply verify that the interrupt clear issued by the driver is taken correctly by the end device, and then verify the chain of propagation that clears the interrupt status.  The clear can be followed through any switch in the path, to the root port, where the legacy PCI interrupt controller should show the interrupt as cleared, and on to the top-level interrupt controller.

Regards,
Luke


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Question on MSI support in PCI and PCI-E devices
  2015-03-04 17:04             ` McKay, Luke
@ 2015-03-04 17:18               ` Roger Heflin
  0 siblings, 0 replies; 11+ messages in thread
From: Roger Heflin @ 2015-03-04 17:18 UTC (permalink / raw)
  To: McKay, Luke
  Cc: Andrey Utkin, Andrey Utkin, Stephen Hemminger, kernel-mentors,
	linux-kernel, kernelnewbies

We verified that the exact same device worked with the previous CPU in
the same motherboard/BIOS combination and the same OS/kernel
combination; the only change we identified was an Ivy Bridge vs. a
Sandy Bridge CPU with the same motherboard, BIOS and board firmware.

And in this case only one device driver/PCI board was using the given
interrupt. The hardware vendor for that PCI board debugged a firmware
dump to determine what state the firmware was in, and it was waiting
for an INTx that never came. Switching to MSI has resulted in things
working reliably.

On Wed, Mar 4, 2015 at 11:04 AM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
> Legacy INTx is shared amongst multiple devices.  Since it is a level sensitive simulation of the interrupt line, it only takes one device (or driver) to forget to clear the interrupt, and then it stuck and won't work for any of the devices using it.
>
> If you're working with one particular device that seems to be causing these sorts of problems then you can verify misbehaving hardware with a PCIe analyzer.  With the analyzer you can verify that when the driver informs the device that it has processed the interrupt that the device sends the deassertion message for the INTx line.
>
> Or if that isn't available, simply verifying that interrupt being cleared by the driver on the end device is taken correctly and then verifying the chain of propagation that clears the interrupt status.  It can be verified through any switch that is in the path, to the root port where the legacy PCI interrupt controller that the interrupt is cleared, to the top level interrupt controller.
>
> Regards,
> Luke
>
> --
> Luke McKay
> Senior Engineer
> Cobham AvComm
> T : +1 (316) 529 5585
>
> -----Original Message-----
> From: Roger Heflin [mailto:rogerheflin@gmail.com]
> Sent: Wednesday, March 04, 2015 10:31 AM
> To: McKay, Luke
> Cc: Andrey Utkin; Andrey Utkin; Stephen Hemminger; kernel-mentors@selenic.com; linux-kernel@vger.kernel.org; kernelnewbies
> Subject: Re: Question on MSI support in PCI and PCI-E devices
>
> I know from some data I have seen that, between the Intel Sandy Bridge and Intel Ivy Bridge generations, the same motherboards stopped delivering INTx reliably (an interrupt was lost under load about once every 30 days, and the driver and firmware have no method to recover from the failure).  We had to transition to using MSI on some PCI cards that had this issue. Our issue was duplicated on a large number of different physical machines, so if it was a hardware error, it was a lot of different physical machines that had the defect.
>
> On Wed, Mar 4, 2015 at 10:03 AM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
>> I don't personally know of any PCI drivers that use polling instead of interrupts, since that would really mean the hardware is broken.
>>
>> Basically all you need to do is create a timer, and have its callback set to your driver routine that can check the device status registers to determine if there is work to be done.  The status register(s) would be the same indicators that should have generated an interrupt.
>>
>> Regards,
>> Luke
>>
>> -----Original Message-----
>> From: Andrey Utkin [mailto:andrey.utkin@corp.bluecherry.net]
>> Sent: Tuesday, March 03, 2015 8:29 AM
>> To: McKay, Luke
>> Cc: Andrey Utkin; Stephen Hemminger; kernel-mentors@selenic.com;
>> linux-kernel@vger.kernel.org; kernelnewbies
>> Subject: Re: Question on MSI support in PCI and PCI-E devices
>>
>> On Mon, Mar 2, 2015 at 4:02 PM, McKay, Luke <Luke.McKay@aeroflex.com> wrote:
>>> It doesn't appear that your device supports MSI.  If it did, lspci -v should list the MSI capability and whether or not it is enabled.
>>>
>>> i.e. Something like...
>>> Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
>>>
>>> Without a listing that shows the capability is present, there is nothing to enable.
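What lspci prints there comes from the capability list in the device's PCI configuration space. As a rough sketch, assuming you have a 256-byte copy of config space in a buffer (for example read as root from the device's `config` file in sysfs), the same check can be done by hand. The helper names `find_capability` and `msi_enabled` are made up for this example; the register offsets and capability IDs are the standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_STATUS           0x06  /* 16-bit status register (low byte used here) */
#define PCI_STATUS_CAP_LIST  0x10  /* status bit 4: capability list present */
#define PCI_CAPABILITY_LIST  0x34  /* offset of the first capability pointer */
#define PCI_CAP_ID_MSI       0x05
#define PCI_CAP_ID_MSIX      0x11

/*
 * Walk the capability list in a 256-byte copy of config space and return
 * the offset of the capability with the given ID, or 0 if it is absent.
 */
uint8_t find_capability(const uint8_t cfg[256], uint8_t cap_id)
{
    assert(cfg != NULL);
    if (!(cfg[PCI_STATUS] & PCI_STATUS_CAP_LIST))
        return 0;

    uint8_t pos = cfg[PCI_CAPABILITY_LIST] & 0xFC;
    int ttl = 48;  /* guard against malformed, looping lists */

    while (pos >= 0x40 && ttl-- > 0) {
        if (cfg[pos] == cap_id)
            return pos;                 /* byte 0 of each entry is its ID */
        pos = cfg[pos + 1] & 0xFC;      /* byte 1 points to the next entry */
    }
    return 0;
}

/* MSI Enable is bit 0 of the 16-bit Message Control word at cap + 2. */
int msi_enabled(const uint8_t cfg[256], uint8_t msi_pos)
{
    return cfg[msi_pos + 2] & 0x01;
}
```

A return value of 0x68 from `find_capability(cfg, PCI_CAP_ID_MSI)` would correspond to the `[68]` in the lspci line above. Note that this only says what the device advertises; as mentioned earlier in the thread, the kernel can still refuse to use MSI for other reasons.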
>>>
>>> Have you tried polling instead of using interrupts?  Definitely not ideal, but it might help you to determine whether hardware is dropping/missing an interrupt or whether the hardware is being completely hung up.
>>>
>>> Do you know if this missing interrupt is occurring in other systems as well?  How about whether it happens with different boards in the same system?  Answers to these questions would help to determine whether you might have a defective board, or some sort of incompatibility with the system.
>>
>> We have just three setups reproducing this. We have no boards for replacement experiments, unfortunately.
>> Polling instead of using interrupts sounds interesting. Is there an example of such usage in any other PCI device driver?
>>
>> --
>> Bluecherry developer.
>>
>>
>> Aeroflex is now a Cobham company

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Question on MSI support in PCI and PCI-E devices
@ 2015-02-11 18:19 ` Andrey Utkin
  0 siblings, 0 replies; 11+ messages in thread
From: Andrey Utkin @ 2015-02-11 18:19 UTC (permalink / raw)
  To: linux-kernel, kernelnewbies, kernel-mentors

Is it true that _every_ PCI or PCI Express device supporting MSI is
indicated by some mention of MSI in "lspci -v", and if there's no such
mention, it surely doesn't support MSI?

-- 
Andrey Utkin

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2015-03-04 17:18 UTC | newest]

Thread overview: 11+ messages
     [not found] <f236217608b24a5e976628fe31d41a03@BRMWP-EXMB11.corp.brocade.com>
2015-02-12 14:48 ` Question on MSI support in PCI and PCI-E devices Stephen Hemminger
2015-02-12 15:11   ` Andrey Utkin
2015-02-12 15:11     ` Andrey Utkin
2015-03-02 14:02     ` McKay, Luke
2015-03-03 14:29       ` Andrey Utkin
2015-03-04 16:03         ` McKay, Luke
2015-03-04 16:30           ` Roger Heflin
2015-03-04 17:04             ` McKay, Luke
2015-03-04 17:18               ` Roger Heflin
2015-02-11 18:19 Andrey Utkin
2015-02-11 18:19 ` Andrey Utkin
