From mboxrd@z Thu Jan  1 00:00:00 1970
From: Julia Cartwright
Subject: Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
Date: Tue, 4 Oct 2016 14:34:45 -0500
Message-ID: <20161004193445.GF10625@jcartwri.amer.corp.natinst.com>
References: <20160922151205.m3cch6re77tox3aw@linutronix.de>
 <20160923123224.odybv2uos6tot6it@linutronix.de>
 <20160923144140.5tkzeymamrb5qnsv@linutronix.de>
 <20160928194519.GA32423@jcartwri.amer.corp.natinst.com>
 <487032ca81f84e70bdacc39a024eff5e@FE-MBX1012.de.bosch.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Sebastian Andrzej Siewior,
 "linux-rt-users@vger.kernel.org"
To: "Koehrer Mathias (ETAS/ESW5)"
Return-path:
Received: from skprod2.natinst.com ([130.164.80.23]:60450 "EHLO ni.com"
 rhost-flags-OK-OK-OK-FAIL) by vger.kernel.org with ESMTP
 id S1751479AbcJDTe7 (ORCPT ); Tue, 4 Oct 2016 15:34:59 -0400
In-Reply-To: <487032ca81f84e70bdacc39a024eff5e@FE-MBX1012.de.bosch.com>
Content-Disposition: inline
Sender: linux-rt-users-owner@vger.kernel.org
List-ID:

On Tue, Oct 04, 2016 at 02:33:08PM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Julia,

Hey Mathias-

> > Which, looks to me to be the normal "forced primary" interrupt handling path, which
> > simply wakes the created irqthread.
> >
> > However, what isn't clear from the data is _which_ irqthread(s) is being woken up.
> > Presumably, due to the prior igb traces, it's one of the igb interrupts, but that would
> > be nice to confirm using the sched_wakeup event or other means.
> >
[..]
> In the meanwhile I have detected another finding which might be relevant:
>
> With the 3.18 kernel the igb driver comes with two interrupts per NIC (e.g. eth2 and eth2-TxRx0)
> with the 4.6. kernel the igb driver comes with 9 (!) interrupts per NIC:
> eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.
>
> As I have used initially the same kernel configuration from 3.18 also
> for the 4.6. kernel I wonder where this comes from and if there is any
> kernel option I may use to disable these many interrupts and to reduce
> it to 2 again.

If it's all of these interrupts that are firing and being handled at the
same time, that can account for the latencies you were seeing.  As I
suggested before, having a trace with the sched_wakeup event enabled can
help confirm that it's these interrupts causing problems.

If it is true, then the question is: why is the device triggering all of
these interrupts all at once?  Is that expected?  These are questions
for netdev folks, I think.

   Julia
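
Once a trace with the sched_wakeup event has been captured, something
along the lines of the sketch below could help check whether several
irq threads are woken within a short window.  This is only an
illustration under assumptions: the regular expression assumes the
usual ftrace text layout ("<timestamp>: sched_wakeup: comm=... pid=...
prio=... target_cpu=..."), which may differ between kernel versions;
the helper names, the 100 us window and the SAMPLE lines are all made
up for the example and are not taken from the trace discussed in this
thread.

```python
import re
from collections import Counter

# Assumed ftrace text layout of a sched_wakeup event (may vary by kernel):
#   <idle>-0 [002] dN.h2.. 100.000010: sched_wakeup: comm=irq/48-eth2-TxR pid=1021 prio=49 target_cpu=002
WAKEUP_RE = re.compile(
    r"(?P<ts>\d+\.\d+): sched_wakeup: comm=(?P<comm>.+?) pid=(?P<pid>\d+) "
)

# Made-up sample lines, for illustration only (not real trace data).
SAMPLE = """\
          <idle>-0     [002] dN.h2..   100.000010: sched_wakeup: comm=irq/48-eth2-TxR pid=1021 prio=49 target_cpu=002
          <idle>-0     [002] dN.h2..   100.000015: sched_wakeup: comm=irq/49-eth2-TxR pid=1022 prio=49 target_cpu=002
      cyclictest-2000  [001] d...2..   100.500000: sched_wakeup: comm=ktimersoftd/1 pid=20 prio=98 target_cpu=001
          <idle>-0     [002] dN.h2..   101.000000: sched_wakeup: comm=irq/48-eth2-TxR pid=1021 prio=49 target_cpu=002
"""


def irq_thread_wakeups(trace_text):
    """Return (timestamp, comm) for every wakeup of a thread named irq/*."""
    wakeups = []
    for line in trace_text.splitlines():
        m = WAKEUP_RE.search(line)
        if m and m.group("comm").startswith("irq/"):
            wakeups.append((float(m.group("ts")), m.group("comm")))
    return wakeups


def wakeup_bursts(wakeups, window=0.0001):
    """Group wakeups that follow the previous one within `window` seconds."""
    bursts = []
    for ts, comm in sorted(wakeups):
        if bursts and ts - bursts[-1][-1][0] <= window:
            bursts[-1].append((ts, comm))
        else:
            bursts.append([(ts, comm)])
    return bursts


if __name__ == "__main__":
    wakeups = irq_thread_wakeups(SAMPLE)
    # Wakeup count per irq thread name.
    print(Counter(comm for _, comm in wakeups))
    # Bursts in which more than one irq thread wakeup happened close together.
    for burst in wakeup_bursts(wakeups):
        if len(burst) > 1:
            print(burst)
```

Note that the comm field is limited in length, so threads such as
eth2-TxRx-0 and eth2-TxRx-1 may only be distinguishable by the IRQ
number or the pid.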