From: "Koehrer Mathias (ETAS/ESW5)" <mathias.koehrer@etas.com>
To: Julia Cartwright <julia@ni.com>
Cc: "Williams, Mitch A" <mitch.a.williams@intel.com>,
	"Kirsher, Jeffrey T" <jeffrey.t.kirsher@intel.com>,
	"linux-rt-users@vger.kernel.org" <linux-rt-users@vger.kernel.org>,
	Sebastian Andrzej Siewior <sebastian.siewior@linutronix.de>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>, Greg <gvrose8192@gmail.com>,
	"Matthew Garrett" <mjg59@coreos.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	"Bjorn Helgaas" <helgaas@kernel.org>
Subject: RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
Date: Mon, 17 Oct 2016 15:00:43 +0000
Message-ID: <82dcd5bb210f4f82af1e88313c3ec742@FE-MBX1012.de.bosch.com>
In-Reply-To: <20161014195536.GB27124@jcartwri.amer.corp.natinst.com>

Hi Julia!
> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> > > anything RT specific about what you are seeing, but it might be nice
> > > to get confirmation.  Also, bisection would probably be easier if you
> > > confirm on a vanilla kernel.
> > >
> > > I find it unlikely that it's a kernel config option that changed
> > > which regressed you, but instead was a code change to a driver.
> > > Which driver is now the question, and the surface area is still big
> > > (processor mapping attributes for this region, PCI root complex
> > > configuration, PCI bridge configuration, igb driver itself, etc.).
> > >
> > > Big enough that I'd recommend a bisection.  It looks like a
> > > bisection between 3.18 and 4.8 would take you about 18 tries to
> > > narrow down, assuming all goes well.
> > >
> >
> > I have now repeated my tests using the vanilla kernel.
> > There I got the very same issue.
> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
> 
> Great, thanks for confirming!  That helps narrow things down quite a bit.
> 
> > Here is my exact (reproducible) test description:
> > I applied the following patch to the kernel to get the igb trace.
> > This patch instruments the igb_rd32() function to measure the call to
> > readl() which is used to access registers of the igb NIC.
> 
> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
> which is an Atom-based board with dual I210s, however I didn't see much
> difference.
> 
> However, it's a fairly simple board, with a much simpler PCI topology than your
> workstation.  I'll see if I can find some other hardware to test on.
> 
> [..]
> > This means, that I think that some other stuff in kernel 4.1 has
> > changed, which has impact on the igb accesses.
> >
> > Any idea what component could cause this kind of issue?
> 
> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
> between 4.0 and 4.1, so you're well on your way.
> 
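
As a rough sanity check on the bisection effort discussed above, the sketch below computes the worst-case number of kernels a binary search has to build and test for a given number of candidate commits. The function names are illustrative only, and git's own step estimate can differ slightly because the history is a graph rather than a straight line.

```python
def bisect_steps(num_commits: int) -> int:
    """Worst-case number of test builds for a binary search over num_commits."""
    if num_commits < 1:
        raise ValueError("need at least one candidate commit")
    # ceil(log2(n)) using integer arithmetic only
    return (num_commits - 1).bit_length()


def max_commits_for(steps: int) -> int:
    """Largest candidate range that a given number of steps can cover."""
    return 2 ** steps


if __name__ == "__main__":
    # 18 tries are enough for any range of up to 2**18 candidate commits.
    print(max_commits_for(18))
    print(bisect_steps(max_commits_for(18)))
```

The real size of a range can be obtained with `git rev-list --count <good>..<bad>` and passed to `bisect_steps()`; a range limited to a single release, such as 4.0 to 4.1, needs correspondingly fewer steps.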

OK - done.
And finally I was successful!
The following git commit is the one that is causing the trouble!
(The full commit is in the attachment).
+++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
commit 387d37577fdd05e9472c20885464c2a53b3c945f
Author: Matthew Garrett <mjg59@coreos.com>
Date:   Tue Apr 7 11:07:00 2015 -0700

    PCI: Don't clear ASPM bits when the FADT declares it's unsupported

    Communications with a hardware vendor confirm that the expected behaviour
    on systems that set the FADT ASPM disable bit but which still grant full
    PCIe control is for the OS to leave any BIOS configuration intact and
    refuse to touch the ASPM bits.  This mimics the behaviour of Windows.

    Signed-off-by: Matthew Garrett <mjg59@coreos.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
+++++++++++++++++++++ HEADER +++++++++++++++++++++++++++

The only files modified by this commit are:
drivers/acpi/pci_root.c
drivers/pci/pcie/aspm.c
include/linux/pci-aspm.h

This is all generic PCIe code - however, I do not really understand what
the commit actually changes...
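
Going only by the commit message quoted above, the behavioural change can be modelled roughly as follows. This is a toy model, not the kernel code: the function and parameter names are made up for illustration, and the only hardware fact used is that the ASPM Control field occupies the two low bits of the PCIe Link Control register.

```python
ASPM_CONTROL_MASK = 0x3  # bits 1:0 of the PCIe Link Control register


def lnkctl_after_boot(bios_lnkctl: int, fadt_no_aspm: bool,
                      clear_on_fadt: bool) -> int:
    """Toy model of the Link Control value the OS leaves behind.

    bios_lnkctl   -- value programmed by the BIOS before the OS starts
    fadt_no_aspm  -- True if the FADT declares ASPM unsupported
    clear_on_fadt -- True models the old behaviour (clear the ASPM bits),
                     False models the behaviour described in the commit
                     message (leave the BIOS configuration intact)
    """
    if fadt_no_aspm and clear_on_fadt:
        return bios_lnkctl & ~ASPM_CONTROL_MASK
    return bios_lnkctl


if __name__ == "__main__":
    bios_value = 0x43  # hypothetical: BIOS enabled both L0s and L1
    print(hex(lnkctl_after_boot(bios_value, True, True)))   # old behaviour
    print(hex(lnkctl_after_boot(bios_value, True, False)))  # new behaviour
```

If this model holds, a link that the BIOS configured with ASPM enabled stays that way from 4.1 on whenever the FADT flag is set, whereas earlier kernels would have switched ASPM off on that link. Whether this applies to the system discussed here would still need to be checked against the actual register values.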

In my setup I am using a dual-port igb Ethernet adapter.
This adapter has an onboard PCIe switch, and it might be that the
configuration of this PCIe switch on the Intel board is causing the trouble.

Please see also the output of "lspci -v" in the attachment.
The relevant PCI address of the NIC is 04:00.0 / 04:00.1
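
To see which ASPM state is actually configured on a device, the Link Control register can be decoded from its PCI configuration space. The sketch below walks the standard capability list to find the PCI Express capability and reads the ASPM Control bits. The helper names are illustrative; the register offsets follow the PCI and PCIe specifications.

```python
import struct
from typing import Optional

PCI_STATUS = 0x06            # Status register offset
PCI_STATUS_CAP_LIST = 0x10   # Status bit: capability list present
PCI_CAPABILITY_LIST = 0x34   # Offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10        # Capability ID of the PCI Express capability
PCI_EXP_LNKCTL = 0x10        # Link Control offset inside that capability

ASPM_STATES = {0: "disabled", 1: "L0s", 2: "L1", 3: "L0s and L1"}


def find_pcie_cap(cfg: bytes) -> Optional[int]:
    """Return the offset of the PCI Express capability, or None."""
    if len(cfg) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    ptr = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guards against a looping capability list
    while ptr and ptr not in seen and ptr + 1 < len(cfg):
        seen.add(ptr)
        if cfg[ptr] == PCI_CAP_ID_EXP:
            return ptr
        ptr = cfg[ptr + 1] & 0xFC
    return None


def aspm_state(cfg: bytes) -> Optional[str]:
    """Decode the ASPM Control field (Link Control bits 1:0)."""
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_LNKCTL + 2 > len(cfg):
        return None
    (lnkctl,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_LNKCTL)
    return ASPM_STATES[lnkctl & 0x3]
```

On Linux the configuration space is exposed as a binary sysfs file, for example `/sys/bus/pci/devices/0000:04:00.0/config` for the first NIC port (the `0000` domain is an assumption here). Reading it as root returns the full space; unprivileged reads are truncated to the first 64 bytes, in which case `aspm_state()` returns `None`. The same information should appear in the `LnkCtl:` line of `lspci -vv` run as root, and the upstream ports of the PCIe switch are worth checking as well.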

Any feedback on this is welcome!

Thanks

Mathias

[-- Attachment #2: 387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz --]
[-- Type: application/x-gzip, Size: 1694 bytes --]

[-- Attachment #3: lspci.gz --]
[-- Type: application/x-gzip, Size: 2038 bytes --]

