netdev.vger.kernel.org archive mirror
* e1000e on thinkpad x60: interrupt problem
@ 2013-07-09  1:05 Pavel Machek
  2013-07-09 15:51 ` Ronciak, John
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Machek @ 2013-07-09  1:05 UTC (permalink / raw)
  To: jeffrey.t.kirsher, jesse.brandeburg, bruce.w.allan,
	carolyn.wyborny, donald.c.skidmore, gregory.v.rose,
	peter.p.waskiewicz.jr, alexander.h.duyck, john.ronciak,
	tushar.n.dave, e1000-devel, netdev

Hi!

I'm using

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller

on thinkpad x60. Kernel is 3.10. 

# CONFIG_E100 is not set
# CONFIG_E1000 is not set
CONFIG_E1000E=y

Interrupts are like this:

pavel@amd:/data/l/linux-good$ cat /proc/interrupts 
           CPU0       CPU1       
  0:   95454037       5192   IO-APIC-edge      timer
  1:      16292         20   IO-APIC-edge      i8042
  3:          9          0   IO-APIC-edge    
  4:          9          0   IO-APIC-edge    
  7:          0          0   IO-APIC-edge      parport0
  8:          1          0   IO-APIC-edge      rtc0
  9:   19471974       1207   IO-APIC-fasteoi   acpi
 12:     168092         15   IO-APIC-edge      i8042
 14:    3568551        165   IO-APIC-edge      ata_piix
 15:          0          0   IO-APIC-edge      ata_piix
 16:   14033945        877   IO-APIC-fasteoi   i915, ahci, yenta, uhci_hcd:usb2, eth0

but it seems that eth0 is not generating interrupts at all:

...
64 bytes from 10.0.0.251: icmp_req=32 ttl=64 time=26.2 ms
64 bytes from 10.0.0.251: icmp_req=33 ttl=64 time=16.6 ms
64 bytes from 10.0.0.251: icmp_req=34 ttl=64 time=1.14 ms
64 bytes from 10.0.0.251: icmp_req=35 ttl=64 time=56.4 ms
^C
--- 10.0.0.251 ping statistics ---
35 packets transmitted, 35 received, 0% packet loss, time 34140ms
rtt min/avg/max/mdev = 1.024/20.173/88.285/25.281 ms
pavel@amd:/data/l/linux-good$ ping 10.0.0.251

Note the huge latencies. But as the interrupt is shared with ahci, I can
help with:

pavel@amd:~$ sudo cat /dev/sda > /dev/null
[sudo] password for pavel: 

Then latencies drop to a high but reasonable range:

pavel@amd:/data/l/linux-good$ ping 10.0.0.251
PING 10.0.0.251 (10.0.0.251) 56(84) bytes of data.
64 bytes from 10.0.0.251: icmp_req=1 ttl=64 time=1.14 ms
64 bytes from 10.0.0.251: icmp_req=2 ttl=64 time=3.79 ms
64 bytes from 10.0.0.251: icmp_req=3 ttl=64 time=1.24 ms
64 bytes from 10.0.0.251: icmp_req=4 ttl=64 time=1.54 ms
64 bytes from 10.0.0.251: icmp_req=5 ttl=64 time=2.04 ms
64 bytes from 10.0.0.251: icmp_req=6 ttl=64 time=2.48 ms
64 bytes from 10.0.0.251: icmp_req=7 ttl=64 time=1.90 ms
64 bytes from 10.0.0.251: icmp_req=8 ttl=64 time=2.30 ms
(other attempt)
--- 10.0.0.251 ping statistics ---
22 packets transmitted, 22 received, 0% packet loss, time 21128ms
rtt min/avg/max/mdev = 0.940/1.733/3.604/0.685 ms
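
To put a number on the difference between the two runs, here is a
minimal Python sketch that parses the two rtt summary lines above and
compares their averages. It assumes the iputils summary format shown in
this mail; parse_rtt is a made-up helper name, not part of any tool
mentioned here.

```python
import re

def parse_rtt(summary):
    """Parse an iputils-style 'rtt min/avg/max/mdev = a/b/c/d ms' line.

    Returns a dict of floats in milliseconds. parse_rtt is a hypothetical
    helper written for this illustration only.
    """
    m = re.search(
        r"rtt min/avg/max/mdev = ([\d.]+)/([\d.]+)/([\d.]+)/([\d.]+) ms",
        summary,
    )
    if m is None:
        raise ValueError("not an rtt summary line: %r" % summary)
    keys = ("min", "avg", "max", "mdev")
    return dict(zip(keys, (float(v) for v in m.groups())))

# Summary lines copied from the two ping runs above.
idle = parse_rtt("rtt min/avg/max/mdev = 1.024/20.173/88.285/25.281 ms")
disk_busy = parse_rtt("rtt min/avg/max/mdev = 0.940/1.733/3.604/0.685 ms")

# Ratio of average round-trip times: disk idle vs. disk being read.
ratio = idle["avg"] / disk_busy["avg"]
print("avg rtt is %.1fx higher with the disk idle" % ratio)
```

The minimum rtt is about the same in both runs; it is the average and
the maximum that differ by an order of magnitude.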

root@amd:/data/l/linux-good# ethtool -e eth0
Offset Values
------ ------
0x0000 00 16 d3 25 19 04 30 0b b2 ff 51 00 ff ff ff ff 
0x0010 53 00 03 02 6b 02 7e 20 aa 17 9a 10 86 80 df 80 
0x0020 00 00 00 20 54 7e 00 00 14 00 da 00 04 00 00 27 
0x0030 c9 6c 50 31 3e 07 0b 04 8b 29 00 00 00 f0 02 0f 
0x0040 08 10 00 00 04 0f ff 7f 01 4d ff ff ff ff ff ff 
0x0050 14 00 1d 00 14 00 1d 00 af aa 1e 00 00 00 1d 00 
0x0060 00 01 00 40 32 12 07 40 ff ff ff ff ff ff ff ff 
0x0070 ff ff ff ff ff ff ff ff ff ff ff ff ff ff fd 5d 

02:00.0 Ethernet controller: Intel Corporation 82573L Gigabit Ethernet Controller
        Subsystem: Lenovo ThinkPad X60s
        Flags: bus master, fast devsel, latency 0, IRQ 16
        Memory at ee000000 (32-bit, non-prefetchable) [size=128K]
        I/O ports at 2000 [size=32]
        Capabilities: [c8] Power Management version 2
        Capabilities: [d0] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [e0] Express Endpoint, MSI 00
        Capabilities: [100] Advanced Error Reporting
        Capabilities: [140] Device Serial Number 00-16-d3-ff-ff-25-19-04
        Kernel driver in use: e1000e

Any ideas how to debug/fix this?

[Maybe related: https://bugzilla.kernel.org/show_bug.cgi?id=6929 but
that was fixed in 2007. And yes, it _is_ better with bigger packets.]

									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: e1000e on thinkpad x60: interrupt problem
  2013-07-09  1:05 e1000e on thinkpad x60: interrupt problem Pavel Machek
@ 2013-07-09 15:51 ` Ronciak, John
  2013-07-09 17:02   ` Pavel Machek
  0 siblings, 1 reply; 9+ messages in thread
From: Ronciak, John @ 2013-07-09 15:51 UTC (permalink / raw)
  To: Pavel Machek, Kirsher, Jeffrey T, Brandeburg, Jesse, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Waskiewicz Jr, Peter P, Duyck, Alexander H, Dave, Tushar N,
	e1000-devel, netdev

Nothing appears to be wrong.  If the system is seeing ping packets at all, that means the device is generating interrupts and that they are being processed.  If you are looking at performance, then sharing interrupts is probably not a good idea.  Since you are on a laptop, it may not be easy to separate the networking device onto its own interrupt.  The interrupt is shared with a lot of other devices, not just ahci.
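
The handlers sharing a line can be read straight out of /proc/interrupts. A small Python sketch, assuming the 3.10-era column layout quoted in the original mail (per-CPU counts, one chip/type column, then a comma-separated handler list); devices_on_irq is a made-up helper name and newer kernels may print extra columns:

```python
def devices_on_irq(proc_interrupts_text, irq):
    """Return the handler names registered on one IRQ line.

    Assumes each line looks like
        ' 16:  <count> <count>  IO-APIC-fasteoi  name1, name2'
    which matches the listing in this thread but is not guaranteed
    on other kernel versions.
    """
    for line in proc_interrupts_text.splitlines():
        head, sep, rest = line.partition(":")
        if not sep or head.strip() != str(irq):
            continue
        tokens = rest.split()
        while tokens and tokens[0].isdigit():
            tokens.pop(0)               # drop the per-CPU counters
        names = " ".join(tokens[1:])    # tokens[0] is the chip/type column
        return [n.strip() for n in names.split(",") if n.strip()]
    return []

# Two lines copied from the listing in the original mail.
sample = """\
           CPU0       CPU1
  9:   19471974       1207   IO-APIC-fasteoi   acpi
 16:   14033945        877   IO-APIC-fasteoi   i915, ahci, yenta, uhci_hcd:usb2, eth0
"""

print(devices_on_irq(sample, 16))
```

On a live system the text would come from reading /proc/interrupts instead of the sample string.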

Cheers,
John




* Re: e1000e on thinkpad x60: interrupt problem
  2013-07-09 15:51 ` Ronciak, John
@ 2013-07-09 17:02   ` Pavel Machek
  2013-07-09 17:15     ` Waskiewicz Jr, Peter P
  0 siblings, 1 reply; 9+ messages in thread
From: Pavel Machek @ 2013-07-09 17:02 UTC (permalink / raw)
  To: Ronciak, John; +Cc: e1000-devel, Allan, Bruce W, Brandeburg, Jesse, netdev

Hi!

> Nothing appears to be wrong.  If the system is seeing ping packets
> at all means that device is generating interrupts and that they are
> being processed.  If you are looking at performance then sharing

No, that's not true. There is other interrupt load, and e1000e has big
enough buffers; that means that packets eventually get processed. I
strongly suspect e1000e generates few or no interrupts, and packets
only get processed when other devices on the shared interrupt line
generate an interrupt.

> interrupts is probably not a good idea.  Since you are on a laptop it
> may not be easy to separate the networking device onto its own
> interrupt.  The interrupt is shared with a lot of other devices and
> not just ahci.

Yes, it is wrong. 100msec latency on an unloaded Core Duo is not
right... and notice how ping latency goes _down_ when I load the other
devices.

[100msec latency is so big that it makes interactive work over
ssh hard.]

IOW if the interrupt was not shared, I'd be getting latencies in the
one-second range. It happened before on this machine, and other x60 users
are seeing that, too.

It may have something to do with ASPM, because the hack below makes
latencies lower than 2msec. (And btw shared interrupt load does not
make them worse in a way that can be measured; it stays <2msec with
AHCI load.)

Any ideas for an acceptable solution?

								Pavel

diff --git a/.config b/.config
index 149f713..d7f5a11 100644
--- a/.config
+++ b/.config
@@ -559,9 +559,9 @@ CONFIG_PCIEAER=y
 # CONFIG_PCIEAER_INJECT is not set
 CONFIG_PCIEASPM=y
 CONFIG_PCIEASPM_DEBUG=y
-CONFIG_PCIEASPM_DEFAULT=y
+# CONFIG_PCIEASPM_DEFAULT is not set
 # CONFIG_PCIEASPM_POWERSAVE is not set
-# CONFIG_PCIEASPM_PERFORMANCE is not set
+CONFIG_PCIEASPM_PERFORMANCE=y
 CONFIG_PCIE_PME=y
 CONFIG_ARCH_SUPPORTS_MSI=y
 # CONFIG_PCI_MSI is not set
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index e4b1fb2..9a1b63e 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -382,7 +382,7 @@ static int __init acpi_pci_init(void)
 
 	if (acpi_gbl_FADT.boot_flags & ACPI_FADT_NO_ASPM) {
 		printk(KERN_INFO"ACPI FADT declares the system doesn't support PCIe ASPM, so disable it\n");
-		pcie_no_aspm();
+//		pcie_no_aspm();
 	}
 
 	ret = register_acpi_bus_type(&acpi_pci_bus);



-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

_______________________________________________
E1000-devel mailing list
E1000-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/e1000-devel
To learn more about Intel® Ethernet, visit http://communities.intel.com/community/wired


* Re: e1000e on thinkpad x60: interrupt problem
  2013-07-09 17:02   ` Pavel Machek
@ 2013-07-09 17:15     ` Waskiewicz Jr, Peter P
  2013-07-09 20:48       ` Pavel Machek
  0 siblings, 1 reply; 9+ messages in thread
From: Waskiewicz Jr, Peter P @ 2013-07-09 17:15 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Ronciak, John, Kirsher, Jeffrey T, Brandeburg, Jesse, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Duyck, Alexander H, Dave, Tushar N, e1000-devel, netdev

On Tue, 2013-07-09 at 19:02 +0200, Pavel Machek wrote:
> Hi!
> 
> > Nothing appears to be wrong.  If the system is seeing ping packets
> >at all means that device is generating interrupts and that they are
> >being processed.  If you are looking at performance then sharing
> 
> No, that's not true. There is other interrupt load, and e1000e has big
> enough buffers; that means that packets eventually get processed. I
> strongly suspect e1000e generates little or no interrupts and packets
> only get processed when other devices on shared interrupt line
> generate interrupt. 

If the interrupt is shared, e1000e checks if it's the hardware that
generated it before processing packets.  Consuming an interrupt that
isn't meant for this device will throw major warnings in the kernel
about bad interrupt routing, etc.  Here's the code from the interrupt
handler (note the last part of the pasted code):

/**
 * e1000_intr - Interrupt Handler
 * @irq: interrupt number
 * @data: pointer to a network interface device structure
 **/
static irqreturn_t e1000_intr(int __always_unused irq, void *data)
{
        struct net_device *netdev = data;
        struct e1000_adapter *adapter = netdev_priv(netdev);
        struct e1000_hw *hw = &adapter->hw;
        u32 rctl, icr = er32(ICR);

        if (!icr || test_bit(__E1000_DOWN, &adapter->state))
                return IRQ_NONE;  /* Not our interrupt */

        /* IMS will not auto-mask if INT_ASSERTED is not set, and if it is
         * not set, then the adapter didn't send an interrupt
         */
        if (!(icr & E1000_ICR_INT_ASSERTED))
                return IRQ_NONE;

Cheers,
-PJ


* Re: e1000e on thinkpad x60: interrupt problem
  2013-07-09 17:15     ` Waskiewicz Jr, Peter P
@ 2013-07-09 20:48       ` Pavel Machek
  2013-07-09 20:59         ` Ronciak, John
  2013-07-09 21:40         ` Jesse Brandeburg
  0 siblings, 2 replies; 9+ messages in thread
From: Pavel Machek @ 2013-07-09 20:48 UTC (permalink / raw)
  To: Waskiewicz Jr, Peter P
  Cc: Ronciak, John, Kirsher, Jeffrey T, Brandeburg, Jesse, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Duyck, Alexander H, Dave, Tushar N, e1000-devel, netdev

On Tue 2013-07-09 17:15:48, Waskiewicz Jr, Peter P wrote:
> On Tue, 2013-07-09 at 19:02 +0200, Pavel Machek wrote:
> > Hi!
> > 
> > > Nothing appears to be wrong.  If the system is seeing ping packets
> > >at all means that device is generating interrupts and that they are
> > >being processed.  If you are looking at performance then sharing
> > 
> > No, that's not true. There is other interrupt load, and e1000e has big
> > enough buffers; that means that packets eventually get processed. I
> > strongly suspect e1000e generates little or no interrupts and packets
> > only get processed when other devices on shared interrupt line
> > generate interrupt. 
> 
> If the interrupt is shared, e1000e checks if it's the hardware that
> generated it before processing packets.  Consuming an interrupt that
> isn't meant for this device will throw major warnings in the kernel
> about bad interrupt routing, etc.  Here's the code from the interrupt
> handler (note the last part of the pasted code):

Yeah, of course you need to ask e1000e if it generated the
interrupt. That part works. The part that actually generates the
interrupt does not. Take a look at the original mail...

packet comes
e1000e sets E1000_ICR_INT_ASSERTED bit
e1000e tries to generate an interrupt and fails
50msec passes
AHCI generates interrupt
all the handlers are called
    AHCI processes its interrupt, handles disk read
    e1000_intr notices E1000_ICR_INT_ASSERTED bit, delivers the packet.

Network still works, only slowly. Ping latency goes down when I use the
disk. That matches what I see.
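
The sequence above can be modelled in a few lines of Python. This is a
toy model under loud assumptions, not a description of the hardware:
the NIC's own interrupt is assumed to be always lost, some other device
on the line is assumed to interrupt at a fixed period, and every name
in it is invented for illustration.

```python
import math

def delivery_delay_ms(arrival_ms, other_irq_period_ms):
    """Delay until a packet is noticed if only *other* devices raise the IRQ.

    Assumes the other device interrupts at t = 0, period, 2*period, ...
    and that every handler on the shared line runs on each of those
    interrupts, so the NIC handler finds its pending cause bit then.
    """
    next_irq = math.ceil(arrival_ms / other_irq_period_ms) * other_irq_period_ms
    return next_irq - arrival_ms

def mean_delay_ms(other_irq_period_ms, samples=1000):
    """Average delay for packets arriving evenly spread over one period."""
    step = other_irq_period_ms / samples
    total = sum(
        delivery_delay_ms((i + 0.5) * step, other_irq_period_ms)
        for i in range(samples)
    )
    return total / samples

# A quiet line (one interrupt every 100 ms) vs. a busy disk (every 1 ms).
for period in (100.0, 1.0):
    print("period %6.1f ms -> mean extra latency %.2f ms"
          % (period, mean_delay_ms(period)))
```

In this model the mean extra latency is half the period of the other
device's interrupts, so anything that makes the line busier makes ping
look better.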

Do you have another explanation?
									Pavel

> /**
>  * e1000_intr - Interrupt Handler
>  * @irq: interrupt number
>  * @data: pointer to a network interface device structure
>  **/
> static irqreturn_t e1000_intr(int __always_unused irq, void *data)
> {
>         struct net_device *netdev = data;
>         struct e1000_adapter *adapter = netdev_priv(netdev);
>         struct e1000_hw *hw = &adapter->hw;
>         u32 rctl, icr = er32(ICR);
> 
>         if (!icr || test_bit(__E1000_DOWN, &adapter->state))
>                 return IRQ_NONE;  /* Not our interrupt */
> 
>         /* IMS will not auto-mask if INT_ASSERTED is not set, and if it is
>          * not set, then the adapter didn't send an interrupt
>          */
>         if (!(icr & E1000_ICR_INT_ASSERTED))
>                 return IRQ_NONE;
> 
> Cheers,
> -PJ

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* RE: e1000e on thinkpad x60: interrupt problem
  2013-07-09 20:48       ` Pavel Machek
@ 2013-07-09 20:59         ` Ronciak, John
  2013-07-09 21:40         ` Jesse Brandeburg
  1 sibling, 0 replies; 9+ messages in thread
From: Ronciak, John @ 2013-07-09 20:59 UTC (permalink / raw)
  To: Pavel Machek, Waskiewicz Jr, Peter P
  Cc: Kirsher, Jeffrey T, Brandeburg, Jesse, Allan, Bruce W, Wyborny,
	Carolyn, Skidmore, Donald C, Rose, Gregory V, Duyck, Alexander H,
	Dave, Tushar N, e1000-devel, netdev

So are you saying the HW isn't generating an interrupt?  Clearly it is, as the driver is taking action on it.  I went and looked at the first mail I saw (the one I first responded to), but maybe you are talking about another email that I (we) didn't get.  I don't see any of what you point to in this email (below).  So I still don't think we know what you are asking.  Is it that you think an interrupt isn't processed because the interrupts are shared, and the packets get processed on the next interrupt?

Cheers,
John


> -----Original Message-----
> From: Pavel Machek [mailto:pavel@ucw.cz]
> Sent: Tuesday, July 09, 2013 1:49 PM
> To: Waskiewicz Jr, Peter P
> Cc: Ronciak, John; Kirsher, Jeffrey T; Brandeburg, Jesse; Allan, Bruce
> W; Wyborny, Carolyn; Skidmore, Donald C; Rose, Gregory V; Duyck,
> Alexander H; Dave, Tushar N; e1000-devel@lists.sourceforge.net;
> netdev@vger.kernel.org
> Subject: Re: e1000e on thinkpad x60: interrupt problem
> 
> On Tue 2013-07-09 17:15:48, Waskiewicz Jr, Peter P wrote:
> > On Tue, 2013-07-09 at 19:02 +0200, Pavel Machek wrote:
> > > Hi!
> > >
> > > > Nothing appears to be wrong.  If the system is seeing ping packets
> > > >at all means that device is generating interrupts and that they are
> > > >being processed.  If you are looking at performance then sharing
> > >
> > > No, that's not true. There is other interrupt load, and e1000e has
> > > big enough buffers; that means that packets eventually get
> > > processed. I strongly suspect e1000e generates little or no
> > > interrupts and packets only get processed when other devices on
> > > shared interrupt line generate interrupt.
> >
> > If the interrupt is shared, e1000e checks if it's the hardware that
> > generated it before processing packets.  Consuming an interrupt that
> > isn't meant for this device will throw major warnings in the kernel
> > about bad interrupt routing, etc.  Here's the code from the interrupt
> > handler (note the last part of the pasted code):
> 
> Yeah, of course you need to ask e1000e if it generated the interrupt.
> That part works. The part that actually generates the interrupt does
> not. Take a look at original mail...
> 
> packet comes
> e1000e sets E1000_ICR_INT_ASSERTED bit
> e1000e tries to generate an interrupt and fails
> 50msec passes
> AHCI generates interrupt
> all the handlers are called
>     AHCI processes its interrupt, handles disk read
>     e1000_intr notices E1000_ICR_INT_ASSERTED bit, delivers the packet.
> 
> Network still works, only slowly. Ping goes lower when I use the disk.
> That matches what I see.
> 
> Do you have other explanation?
> 									Pavel
> 
> > /**
> >  * e1000_intr - Interrupt Handler
> >  * @irq: interrupt number
> >  * @data: pointer to a network interface device structure
> >  **/
> > static irqreturn_t e1000_intr(int __always_unused irq, void *data)
> > {
> >         struct net_device *netdev = data;
> >         struct e1000_adapter *adapter = netdev_priv(netdev);
> >         struct e1000_hw *hw = &adapter->hw;
> >         u32 rctl, icr = er32(ICR);
> >
> >         if (!icr || test_bit(__E1000_DOWN, &adapter->state))
> >                 return IRQ_NONE;  /* Not our interrupt */
> >
> >         /* IMS will not auto-mask if INT_ASSERTED is not set, and if it is
> >          * not set, then the adapter didn't send an interrupt
> >          */
> >         if (!(icr & E1000_ICR_INT_ASSERTED))
> >                 return IRQ_NONE;
> >
> > Cheers,
> > -PJ
> 
> --
> (english) http://www.livejournal.com/~pavelmachek
> (cesky, pictures)
> http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: e1000e on thinkpad x60: interrupt problem
  2013-07-09 20:48       ` Pavel Machek
  2013-07-09 20:59         ` Ronciak, John
@ 2013-07-09 21:40         ` Jesse Brandeburg
  2013-07-10  3:43           ` [E1000-devel] " Fujinaka, Todd
  2013-07-10 11:37           ` Pavel Machek
  1 sibling, 2 replies; 9+ messages in thread
From: Jesse Brandeburg @ 2013-07-09 21:40 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Waskiewicz Jr, Peter P, Ronciak, John, Kirsher, Jeffrey T, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Duyck, Alexander H, Dave, Tushar N, e1000-devel, netdev,
	jesse.brandeburg

On Tue, 9 Jul 2013 22:48:54 +0200
Pavel Machek <pavel@ucw.cz> wrote:

> Yeah, of course you need to ask e1000e if it generated the
> interrupt. That part works. The part that actually generates the
> interrupt does not. Take a look at original mail...
> 
> packet comes
> e1000e sets E1000_ICR_INT_ASSERTED bit
> e1000e tries to generate an interrupt and fails
> 50msec passes

^^ that's the ASPM timeout length.

> AHCI generates interrupt
> all the handlers are called
>     AHCI processes its interrupt, handles disk read
>     e1000_intr notices E1000_ICR_INT_ASSERTED bit, delivers the packet.
> 
> Network still works, only slowly. Ping goes lower when I use the
> disk. That matches what I see.
> 
> Do you have other explanation?

Regardless of what others are saying, I believe you have an issue with
ASPM being enabled.  All the discussion about shared interrupts is
just a distraction.  This issue would still occur (and just be worse)
without a shared interrupt.

You already mentioned that a kernel hack to disable ASPM fixes it, but
you can just boot with different options to turn off ASPM.

pcie_aspm=off

There are known issues with ASPM on this part, and it definitely needs
to be off.  If your BIOS has the option to turn it off, that is the
best way to disable it; the second choice is to turn it off using the
kernel option.
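
To confirm that the option actually reached the running kernel, the
boot command line can be inspected. A minimal Python sketch; the helper
name and the example command line are made up, and on a real system the
string would be read from /proc/cmdline:

```python
def pcie_aspm_setting(cmdline):
    """Return the value given to pcie_aspm= on a kernel command line.

    Returns None when the parameter is absent. If it appears more than
    once, this sketch simply reports the last one (an assumption made
    here, not a statement about how the kernel resolves duplicates).
    """
    value = None
    for token in cmdline.split():
        if token.startswith("pcie_aspm="):
            value = token.split("=", 1)[1]
    return value

# Made-up command line for illustration.
example = "BOOT_IMAGE=/vmlinuz root=/dev/sda1 ro pcie_aspm=off"
print(pcie_aspm_setting(example))
```

Seeing the parameter on the command line only shows that it was
passed; whether ASPM ends up disabled on the link is a separate check.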

Hope this helps,
Jesse


* RE: [E1000-devel] e1000e on thinkpad x60: interrupt problem
  2013-07-09 21:40         ` Jesse Brandeburg
@ 2013-07-10  3:43           ` Fujinaka, Todd
  2013-07-10 11:37           ` Pavel Machek
  1 sibling, 0 replies; 9+ messages in thread
From: Fujinaka, Todd @ 2013-07-10  3:43 UTC (permalink / raw)
  To: Brandeburg, Jesse, Pavel Machek
  Cc: e1000-devel, Allan, Bruce W, Brandeburg, Jesse, Ronciak, John, netdev

The latest kernel should turn off ASPM as well, but you should be able to check by looking at lspci -vvv. I think LnkCtl should say "ASPM disabled."
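
A rough Python sketch of that check, for anyone who wants to script it. It assumes lspci -vvv prints a line of the form "LnkCtl: ASPM <state>;", which is what I remember but which may differ between pciutils versions; the two sample lines are illustrative and were not captured from the X60 in this thread, and aspm_state is a made-up helper name.

```python
import re

def aspm_state(lspci_vvv_text):
    """Return the ASPM field of the first LnkCtl line, or None if absent."""
    m = re.search(r"LnkCtl:\s*ASPM ([^;]+);", lspci_vvv_text)
    return m.group(1).strip() if m else None

# Illustrative LnkCtl lines only (assumed format, not real captures).
sample_on = "\t\tLnkCtl:\tASPM L0s L1 Enabled; RCB 64 bytes Disabled- CommClk+"
sample_off = "\t\tLnkCtl:\tASPM Disabled; RCB 64 bytes Disabled- CommClk+"

print(aspm_state(sample_on))
print(aspm_state(sample_off))
```

On a real machine the input would be the output of lspci -vvv -s 02:00.0, run as root so that the capability list is readable.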

Sorry for top-posting.

Todd Fujinaka
Software Application Engineer
Networking Division (ND)
Intel Corporation
todd.fujinaka@intel.com
(503) 712-4565





* Re: e1000e on thinkpad x60: interrupt problem
  2013-07-09 21:40         ` Jesse Brandeburg
  2013-07-10  3:43           ` [E1000-devel] " Fujinaka, Todd
@ 2013-07-10 11:37           ` Pavel Machek
  1 sibling, 0 replies; 9+ messages in thread
From: Pavel Machek @ 2013-07-10 11:37 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Waskiewicz Jr, Peter P, Ronciak, John, Kirsher, Jeffrey T, Allan,
	Bruce W, Wyborny, Carolyn, Skidmore, Donald C, Rose, Gregory V,
	Duyck, Alexander H, Dave, Tushar N, e1000-devel, netdev

Hi!

> > Yeah, of course you need to ask e1000e if it generated the
> > interrupt. That part works. The part that actually generates the
> > interrupt does not. Take a look at original mail...
> > 
> > packet comes
> > e1000e sets E1000_ICR_INT_ASSERTED bit
> > e1000e tries to generate an interrupt and fails
> > 50msec passes
> 
> ^^ thats the ASPM timeout length.
> 
> > AHCI generates interrupt
> > all the handlers are called
> >     AHCI processes its interrupt, handles disk read
> >     e1000_intr notices E1000_ICR_INT_ASSERTED bit, delivers the packet.
> > 
> > Network still works, only slowly. Ping goes lower when I use the
> > disk. That matches what I see.
> > 
> > Do you have other explanation?
> 
> Regardless of what others are saying I believe you have an issue with
> ASPM being enabled.  All the discussion about shared interrupts, is
> just a distraction.  This issue would still occur (and just be worse)
> without a shared interrupt.

Agreed.

> You already mentioned that a kernel hack to disable ASPM fixes it, but
> you can just boot with different options to turn off ASPM.
> 
> pcie_aspm=off

Are you sure? AFAICT Linux will not turn off ASPM if ACPI says it is
not supported... hence the hack.

# From: Robert Hancock <hancockrwd@gmail.com>
# To: Pavel Machek <pavel@ucw.cz>
# CC: Greg KH <greg@kroah.com>, kernel list
# <linux-kernel@vger.kernel.org>,
#         joe.lawrence@stratus.com, myron.stowe@redhat.com,
#         bhelgaas@google.com
# Subject: Re: /sys/module/pcie_aspm/parameters/policy not writable?
# ...
# > pavel@amd:~$ dmesg | grep -i aspm
# > ACPI FADT declares the system doesn't support PCIe ASPM, so disable
# it
# 
# IIRC, this message is somewhat misleading. When that FADT flag is set
# by the BIOS, the kernel doesn't so much disable ASPM as disable the
# kernel's control over ASPM. I believe this was to match Windows
# behavior.

It looks like ASPM needs to be off, but the BIOS enables ASPM and tells
the kernel it is not supported... and that means the kernel will not
disable it :-(.

I guess there's no way to disable ASPM for a single device from the
driver?
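
For reference, the per-device setting in question is the ASPM Control
field, bits 1:0 of the PCIe Link Control register. The Python sketch
below only shows the bit arithmetic; whether a driver is allowed to
write those bits when the BIOS withholds ASPM control is exactly the
open question. The constant names follow Linux's pci_regs.h as far as I
remember them, and the register value is made up.

```python
# Constant names as I recall them from Linux's pci_regs.h (treat as assumptions).
PCI_EXP_LNKCTL_ASPM_L0S = 0x0001   # L0s enable
PCI_EXP_LNKCTL_ASPM_L1 = 0x0002    # L1 enable
PCI_EXP_LNKCTL_ASPMC = 0x0003      # whole ASPM Control field, bits 1:0

def decode_aspm(lnkctl):
    """Describe the ASPM Control field of a 16-bit Link Control value."""
    names = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}
    return names[lnkctl & PCI_EXP_LNKCTL_ASPMC]

def clear_aspm(lnkctl):
    """Return the Link Control value with both ASPM enable bits cleared."""
    return lnkctl & ~PCI_EXP_LNKCTL_ASPMC & 0xFFFF

lnkctl = 0x0043  # made-up register value with L0s and L1 both enabled
print(decode_aspm(lnkctl), "->", decode_aspm(clear_aspm(lnkctl)))
print(hex(clear_aspm(lnkctl)))
```

All other bits of the register are left untouched by the mask.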

Thanks,
									Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


