* Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-09-22 12:44 Koehrer Mathias (ETAS/ESW5)
  2016-09-22 15:12 ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-22 12:44 UTC (permalink / raw)
  To: linux-rt-users

Hi all,

I have noticed a strange effect: the igb driver for Intel's Ethernet adapters causes huge latencies in cyclictest on
a Core i7 PC (64-bit) running kernel 4.6.7-rt13.

I run the cyclictest with the following options:
# cyclictest -a -i 100 -d 10 -m -n -t -p 80

The max latencies are really good, all below 5 microseconds (on an idle system, after 1 minute of cyclictest run time).
Of course, the 2 minutes of cyclictest run time are only a rough first estimate.
In this first test no igb-based NIC is configured yet.


The PC also contains a 2-port Intel i350 Ethernet NIC that uses the igb driver.
As soon as I configure one of the i350 ports
# ifconfig eth2 up 192.168.100.100
cyclictest immediately and reproducibly shows significantly larger max latency values (40 microseconds, under the same conditions).


I did the very same test with kernel version 3.18.27-rt27.
With that version I did not see anything like that.

Also, only the igb driver seems to cause the trouble. I also have an e1000e-based NIC in this PC, and
using that driver does not add any significant latency.

Any idea on this?

Thanks for any feedback

Best regards

Mathias



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-22 12:44 Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest Koehrer Mathias (ETAS/ESW5)
@ 2016-09-22 15:12 ` Sebastian Andrzej Siewior
  2016-09-23  6:38   ` AW: " Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 53+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-22 15:12 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2016-09-22 12:44:00 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> I run the cyclictest with the following options:
> # cyclictest -a -i 100 -d 10 -m -n -t -p 80

there is -S. And then 100 might be a little tight.

> Of course the 2 minutes run-time of cyclictest is only a rough first estimate.

and with no load…

> Once I configure one of the i350 ports
> # ifconfig eth2 up 192.168.100.100
> the cyclictest shows directly and reproducibly significant larger max latency values (40 microseconds, using the same conditions).
> 
> 
> I did the very same test with kernel version 3.18.27-rt27.
> With that version I did not see anything like that.
> 
> Also, only the igb driver seems to cause the trouble. I have also an e1000e based NIC in this PC and 
> the usage of this driver does not add any significant latency.
> 
> Any idea on this?

Does this also happen if you have the NIC up and you plug in / out the
cable? There are two things that come to mind:
  https://lkml.kernel.org/r/1445465268-10347-1-git-send-email-jonathan.david@ni.com
  https://lkml.kernel.org/r/1445886895-3692-1-git-send-email-joshc@ni.com

> Thanks for any feedback
> 
> Best regards
> 
> Mathias

Sebastian


* AW: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-22 15:12 ` Sebastian Andrzej Siewior
@ 2016-09-23  6:38   ` Koehrer Mathias (ETAS/ESW5)
  2016-09-23 11:40     ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-23  6:38 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

thanks for the feedback.
> > I run the cyclictest with the following options:
> > # cyclictest -a -i 100 -d 10 -m -n -t -p 80
> 
> there is -S. And then 100 might be a little tight.
> 
> > Of course the 2 minutes run-time of cyclictest is only a rough first estimate.
> 
> and with no load…
> 
> > Once I configure one of the i350 ports # ifconfig eth2 up
> > 192.168.100.100 the cyclictest shows directly and reproducibly
> > significant larger max latency values (40 microseconds, using the same
> conditions).
> >
> >
> > I did the very same test with kernel version 3.18.27-rt27.
> > With that version I did not see anything like that.
> >
> > Also, only the igb driver seems to cause the trouble. I have also an
> > e1000e based NIC in this PC and the usage of this driver does not add any
> significant latency.
> >
> > Any idea on this?
> 
> Does this also happen if you have the NIC up and you plug in / out the cable? There
> are two things that come to mind:
>   https://lkml.kernel.org/r/1445465268-10347-1-git-send-email-
> jonathan.david@ni.com
>   https://lkml.kernel.org/r/1445886895-3692-1-git-send-email-joshc@ni.com

This happens even if I have done "ifconfig up" on the NIC without a cable
plugged in.
It also happens if a cable is plugged in and the link is up but no traffic is running
over this NIC port.
It looks as if merely having the NIC port configured causes the additional latency,
no matter whether traffic is flowing over this NIC and no matter whether the link is up.

I did the same test with the kernel/rt_preempt patch versions
4.1.33-rt37 and 4.4.19-rt27; they show the very same behavior.
In contrast, version 3.18.27-rt27 runs stably!

As mentioned before, the "igb" driver is causing the issue. The "e1000e" driver works 
fine.

Best regards

Mathias



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23  6:38   ` AW: " Koehrer Mathias (ETAS/ESW5)
@ 2016-09-23 11:40     ` Koehrer Mathias (ETAS/ESW5)
  2016-09-23 12:32       ` Sebastian Andrzej Siewior
  2016-09-26 11:48       ` Alexander Stein
  0 siblings, 2 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-23 11:40 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

> thanks for the feedback.
> > > I run the cyclictest with the following options:
> > > # cyclictest -a -i 100 -d 10 -m -n -t -p 80
> >
> > there is -S. And then 100 might be a little tight.
> >
> > > Of course the 2 minutes run-time of cyclictest is only a rough first estimate.
> >
> > and with no load…
> >
> > > Once I configure one of the i350 ports # ifconfig eth2 up
> > > 192.168.100.100 the cyclictest shows directly and reproducibly
> > > significant larger max latency values (40 microseconds, using the
> > > same
> > conditions).
> > >
> > >
> > > I did the very same test with kernel version 3.18.27-rt27.
> > > With that version I did not see anything like that.
> > >
> > > Also, only the igb driver seems to cause the trouble. I have also an
> > > e1000e based NIC in this PC and the usage of this driver does not
> > > add any
> > significant latency.
> > >
> > > Any idea on this?
> >
> > Does this also happen if you have the NIC up and you plug in / out the
> > cable? There are two things that come to mind:
> >   https://lkml.kernel.org/r/1445465268-10347-1-git-send-email-
> > jonathan.david@ni.com
> >
> > https://lkml.kernel.org/r/1445886895-3692-1-git-send-email-joshc@ni.co
> > m
> 
> This happens even if I have done "ifconfig up" on the NIC without having a cable
> plugged in.
> Also, it happens if I have a cable plugged in and the link is up but no traffic is running
> via this NIC port.
> It looks as if solely the configured NIC port is causing the additional latency, no
> matter if traffic is flowing via this NIC or not and no matter if the link is up or not.
> 
> I did the same test with the kernel/rt_preempt patch versions
> 4.1.33-rt37 and 4.4.19-rt27, they show the very same behavior.
> In opposite to that, the version 3.18.27-rt27 is working stable!
> 
> As mentioned before, the "igb" driver is causing the issue. The "e1000e" driver works
> fine.
> 
I did some further analysis.
The code that is causing the long latencies seems to be the
function "igb_watchdog_task" within igb_main.c (Line: 4386).
This function is called periodically.
When I put a return at the beginning of this function, the additional latency is not seen.
In particular, that function calls "igb_has_link", which seems to be one candidate
causing the additional latency.
Do you have any clue how this code can be executed properly without causing the
additional latencies?
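
For reference, the diagnostic change was essentially just an early return at the top of the
watchdog worker (a rough sketch, not the exact patch I applied):

  /* drivers/net/ethernet/intel/igb/igb_main.c -- diagnostic hack only */
  static void igb_watchdog_task(struct work_struct *work)
  {
  	return;		/* added: skip the whole body for the test */
  	/* ... original body: igb_has_link(), stats update, timer re-arm ... */
  }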

Regards

Mathias



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23 11:40     ` Koehrer Mathias (ETAS/ESW5)
@ 2016-09-23 12:32       ` Sebastian Andrzej Siewior
  2016-09-23 13:23         ` Koehrer Mathias (ETAS/ESW5)
  2016-09-26 11:48       ` Alexander Stein
  1 sibling, 1 reply; 53+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-23 12:32 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2016-09-23 11:40:46 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Sebastian,
Hi Mathias,

> The code that is causing the long latencies seems to be the 
> function "igb_watchdog_task" within igb_main.c (Line: 4386). 
> This function will be called periodically.
> When I do a return at the beginning of this function the additional latency is not seen.
> In particular that function calls "igb_has_link" which seems to be one candidate that is
> causing additional latency.
> Do you have any clue how this code can be executed properly without causing the
> additional latencies?

Nope. The watchdog function is invoked via schedule_work() so it should
not be blocking anything. There is no preempt disable or anything like
that. Could you narrow it down further? igb_has_link() also seems
small and simple.
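
(For reference, the periodic path is, paraphrased from igb_main.c: a timer callback that only
schedules the work item, so the watchdog task itself runs from a kworker:)

  static void igb_watchdog(unsigned long data)
  {
  	struct igb_adapter *adapter = (struct igb_adapter *)data;

  	/* Do the rest outside of interrupt context */
  	schedule_work(&adapter->watchdog_task);
  }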

> Regards
> 
> Mathias
> 
Sebastian


* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23 12:32       ` Sebastian Andrzej Siewior
@ 2016-09-23 13:23         ` Koehrer Mathias (ETAS/ESW5)
  2016-09-23 14:41           ` Sebastian Andrzej Siewior
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-23 13:23 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

Hi Sebastian,

> > The code that is causing the long latencies seems to be the function
> > "igb_watchdog_task" within igb_main.c (Line: 4386).
> > This function will be called periodically.
> > When I do a return at the beginning of this function the additional latency is not
> seen.
> > In particular that function calls "igb_has_link" which seems to be one
> > candidate that is causing additional latency.
> > Do you have any clue how this code can be executed properly without
> > causing the additional latencies?
> 
> Nope. The watchdog function is invoked via schedule_work() so it should not be
> blocking anything. There is no preempt disable or anything like that. Could you
> narrow it down further? The igb_has_link() seems also small and simple.
>
OK, I tried to dig deeper into this...
The critical path seems to be
igb_has_link() calling hw->mac.ops.check_for_link (which is igb_check_for_link_82575())
calling igb_check_for_copper_link() calling igb_phy_has_link().

Within igb_phy_has_link() the line with the code
   ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
seems to be the candidate that is causing the trouble.
And this function pointer points to igb_read_phy_reg_82580().
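
For reference, that function looks roughly like this (paraphrased, so details may differ;
the relevant part is the acquire/release callbacks around the actual MDIC register read):

  static s32 igb_read_phy_reg_82580(struct e1000_hw *hw, u32 offset, u16 *data)
  {
  	s32 ret_val;

  	ret_val = hw->phy.ops.acquire(hw);	/* SW semaphore acquire callback */
  	if (ret_val)
  		return ret_val;

  	ret_val = igb_read_phy_reg_mdic(hw, offset, data);	/* MMIO poll of the MDIC register */

  	hw->phy.ops.release(hw);
  	return ret_val;
  }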

Does this give you any idea what goes wrong?

Thanks for any hints!

Regards

Mathias





* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23 13:23         ` Koehrer Mathias (ETAS/ESW5)
@ 2016-09-23 14:41           ` Sebastian Andrzej Siewior
  2016-09-26 11:12             ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 53+ messages in thread
From: Sebastian Andrzej Siewior @ 2016-09-23 14:41 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: linux-rt-users

On 2016-09-23 13:23:22 [+0000], Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Sebastian,
Hi Mathias,

> The critical stuff seems to be
> igb_has_link() calling hw->mac_ops.check_for_link (which is igb_check_for_link_82575() )
> calling igb_check_for_copper_link() calling igb_phy_has_link.
> 
> Within igb_phy_has_link() the line with the code
>    ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status);
> seems to be the candidate that causing the trouble.
> And this function pointer points to igb_read_phy_reg_82580().
> 
> Does this give you any idea what goes wrong?

Nope. There is nothing that might prevent preemption (unless I missed
something). However, if there are a lot of posted writes to
the HW, the read might force them to occur *now*, which will stall the
CPU until that happens.
Try to comment the function(s) out to see if the spikes vanish.
That igb_read_phy_reg_82580() function also invokes another one for the
acquire callback.
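
To illustrate the posted-write point (a generic sketch, not igb code; the register offsets
are made up):

  #include <linux/io.h>

  static u32 mmio_flush_example(void __iomem *regs)
  {
  	writel(0x1, regs + 0x00);	/* posted: returns before the device sees it */
  	writel(0x2, regs + 0x04);	/* posted as well */
  	return readl(regs + 0x08);	/* non-posted: the CPU stalls here until the
  					 * earlier writes and the read itself have
  					 * completed on the bus */
  }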

> Thanks for any hints!
> 
> Regards
> 
> Mathias

Sebastian


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23 14:41           ` Sebastian Andrzej Siewior
@ 2016-09-26 11:12             ` Koehrer Mathias (ETAS/ESW5)
  2016-09-28 19:45               ` Julia Cartwright
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-26 11:12 UTC (permalink / raw)
  To: Sebastian Andrzej Siewior; +Cc: linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 1892 bytes --]

Hi Sebastian,
> > The critical stuff seems to be
> > igb_has_link() calling hw->mac_ops.check_for_link (which is
> > igb_check_for_link_82575() ) calling igb_check_for_copper_link() calling
> igb_phy_has_link.
> >
> > Within igb_phy_has_link() the line with the code
> >    ret_val = hw->phy.ops.read_reg(hw, PHY_STATUS, &phy_status); seems
> > to be the candidate that causing the trouble.
> > And this function pointer points to igb_read_phy_reg_82580().
> >
> > Does this give you any idea what goes wrong?
> 
> Nope. There is nothing that might prevents preemption (unless I missed something).
> However if you there are a lot of posted writes writes to the HW, the read might
> force them to occur *now* which will stall the CPU until it happens.
> Try to comment the function(s) out to see if that spikes vanishes.
> That igb_read_phy_reg_82580() function also invokes another one for the acquire
> callback. 

I tried hard to identify a single function that is causing the spikes.
However, I had no success; I could not pin it down to a single function.
What I did instead was to use the kernel tracer to investigate the issue.
I now run cyclictest with only one thread, using the following command:

# cyclictest -a -i 100 -m -n -p 80 -t 1  -b 25 -f -O latency-format

Please see the attached fragment of the trace output.
It contains the last couple of cycles from the cyclictest before it stopped because
of the latency being larger than 25us.

I repeated that test multiple times,
and I always noticed igb-related functions just before the final cycle of cyclictest.
To me it looks as if the calls that are in the trace from line 3621 to 3641 always occur
before the latency exceeds the threshold.

Could you please have a look at the attached trace file?
Do you have any idea what might cause the issue here?

Thanks

Mathias


[-- Attachment #2: trace-extract.gz --]
[-- Type: application/x-gzip, Size: 20377 bytes --]


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-23 11:40     ` Koehrer Mathias (ETAS/ESW5)
  2016-09-23 12:32       ` Sebastian Andrzej Siewior
@ 2016-09-26 11:48       ` Alexander Stein
  2016-09-27  6:29         ` Koehrer Mathias (ETAS/ESW5)
  1 sibling, 1 reply; 53+ messages in thread
From: Alexander Stein @ 2016-09-26 11:48 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: Sebastian Andrzej Siewior, linux-rt-users

On Friday 23 September 2016 11:40:46, Koehrer Mathias wrote:
> Hi Sebastian,
> 
> 
> > thanks for the feedback.
> > 
> > > > I run the cyclictest with the following options:
> > > > # cyclictest -a -i 100 -d 10 -m -n -t -p 80
> > >
> > >
> > >
> > > there is -S. And then 100 might be a little tight.
> > >
> > >
> > >
> > > > Of course the 2 minutes run-time of cyclictest is only a rough first
> > > > estimate.
> >
> > >
> > >
> > > and with no load…
> > >
> > >
> > >
> > > > Once I configure one of the i350 ports # ifconfig eth2 up
> > > > 192.168.100.100 the cyclictest shows directly and reproducibly
> > > > significant larger max latency values (40 microseconds, using the
> > > > same
> > > 
> > > conditions).
> > > 
> > > >
> > > >
> > > >
> > > > I did the very same test with kernel version 3.18.27-rt27.
> > > > With that version I did not see anything like that.
> > > >
> > > >
> > > >
> > > > Also, only the igb driver seems to cause the trouble. I have also an
> > > > e1000e based NIC in this PC and the usage of this driver does not
> > > > add any
> > > 
> > > significant latency.
> > > 
> > > >
> > > >
> > > > Any idea on this?
> > >
> > >
> > >
> > > Does this also happen if you have the NIC up and you plug in / out the
> > > cable? There are two things that come to mind:
> > > 
> > >   https://lkml.kernel.org/r/1445465268-10347-1-git-send-email-
> > > jonathan.david@ni.com
> > >
> > >
> > >
> > > https://lkml.kernel.org/r/1445886895-3692-1-git-send-email-joshc@ni.co
> > > m
> > 
> > 
> > This happens even if I have done "ifconfig up" on the NIC without having a
> > cable plugged in.
> > Also, it happens if I have a cable plugged in and the link is up but no
> > traffic is running via this NIC port.
> > It looks as if solely the configured NIC port is causing the additional
> > latency, no matter if traffic is flowing via this NIC or not and no
> > matter if the link is up or not.
> > I did the same test with the kernel/rt_preempt patch versions
> > 4.1.33-rt37 and 4.4.19-rt27, they show the very same behavior.
> > In opposite to that, the version 3.18.27-rt27 is working stable!
> > 
> > As mentioned before, the "igb" driver is causing the issue. The "e1000e"
> > driver works fine.
> > 
> 
> I did some further analysis.
> The code that is causing the long latencies seems to be the 
> function "igb_watchdog_task" within igb_main.c (Line: 4386). 
> This function will be called periodically.
> When I do a return at the beginning of this function the additional latency
> is not seen. In particular that function calls "igb_has_link" which seems
> to be one candidate that is causing additional latency.
> Do you have any clue how this code can be executed properly without causing
> the additional latencies?

IMHO something in igb_watchdog_task causes the latency independently of the
actual link. At first glance I would suspect igb_update_stats (called with the
spinlock held) as it seems to do a lot of reads. Maybe this stalls somehow.
Does the latency still occur if you comment out that spinlock and the call to
igb_update_stats?
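
Something along these lines is what I have in mind (an untested sketch; the exact lines in
igb_watchdog_task may look slightly different in your tree):

  	/* in igb_watchdog_task(), just for the experiment: */
  #if 0
  	spin_lock(&adapter->stats64_lock);
  	igb_update_stats(adapter, &adapter->stats64);
  	spin_unlock(&adapter->stats64_lock);
  #endif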

Best regards,
Alexander



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-26 11:48       ` Alexander Stein
@ 2016-09-27  6:29         ` Koehrer Mathias (ETAS/ESW5)
  2016-09-27  7:56           ` Mathias Koehrer
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-09-27  6:29 UTC (permalink / raw)
  To: Alexander Stein; +Cc: Sebastian Andrzej Siewior, linux-rt-users

Hi Alexander,

thanks for the proposal. 
> 
> IMHO something in igb_watchdog_task causes the latency independently from
> actual link. At first glance I would suspect igb_update_stats (called with spinlock
> held) as it seems to do a lot of reads. Maybe this stalls somehow.
> Does the latency still occur if you comment that spinlock and call to
> igb_update_stats out?
 
I tried that; however, the result is the very same. I see the very same latency spikes as before.
I went even further: within the function igb_watchdog_task I removed everything but the call to igb_has_link()
and the final timer reset.
However, even that led to the same result.
So I am back at the findings from one of my previous postings in this thread:
the call to igb_has_link might be a reason for the latency jitter, but I cannot really break it down to a
single piece of code that is responsible here.


Best regards

Mathias


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-27  6:29         ` Koehrer Mathias (ETAS/ESW5)
@ 2016-09-27  7:56           ` Mathias Koehrer
  0 siblings, 0 replies; 53+ messages in thread
From: Mathias Koehrer @ 2016-09-27  7:56 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Alexander Stein
  Cc: Sebastian Andrzej Siewior, linux-rt-users

Hi,

> > IMHO something in igb_watchdog_task causes the latency independently
> > from actual link. At first glance I would suspect igb_update_stats
> > (called with spinlock
> > held) as it seems to do a lot of reads. Maybe this stalls somehow.
> > Does the latency still occur if you comment that spinlock and call to
> > igb_update_stats out?
> 
> I tried that, however the result is the very same. I have the very same latency spikes
> as before.
> I went even further. Within the function igb_watchdog_task I removed everything but
> the call to igb_has_link() and the final timer reset.
> However even that led to the same result.
> And there I am again at the findings in one of my previous postings within this
> thread.
> The call to igb_has_link might be a reason for the latency jitters, however I cannot
> really break it down into a single piece of code that is relevant here.
I tried something completely different:
I used the Intel Linux igb driver from the Intel website
(version 5.3.5.3) and tried this driver on the rt_preempt kernels 3.18.27-rt27 as well as
on 4.6.7-rt13.
Here, too, I saw the same effect: with 3.18.27 cyclictest shows really good results,
but with 4.6.7 the cyclictest results show spikes as soon as an igb-based NIC is enabled.
To me this looks as if something is running differently under the hood, and the
igb driver just makes this strange effect visible.
Any ideas on this new finding?

Regards

Mathias



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-26 11:12             ` Koehrer Mathias (ETAS/ESW5)
@ 2016-09-28 19:45               ` Julia Cartwright
  2016-10-04 14:33                 ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 53+ messages in thread
From: Julia Cartwright @ 2016-09-28 19:45 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: Sebastian Andrzej Siewior, linux-rt-users

On Mon, Sep 26, 2016 at 11:12:19AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
[..]
> > Nope. There is nothing that might prevents preemption (unless I missed something).
> > However if you there are a lot of posted writes writes to the HW, the read might
> > force them to occur *now* which will stall the CPU until it happens.
> > Try to comment the function(s) out to see if that spikes vanishes.
> > That igb_read_phy_reg_82580() function also invokes another one for the acquire
> > callback. 
> 
> I tried hard to identify a single function that is causing the spikes. 
> However I had no success. I could not identify a single function.
> What I did now instead was to use the kernel tracer to identify the issue.
> I run now cyclictest with only one thread by using the following command:
> 
> # cyclictest -a -i 100 -m -n -p 80 -t 1  -b 25 -f -O latency-format
> 
> Please see the attached fragment of the trace output.
> It contains the last couple of cycles from the cyclictest before it stopped because
> of the latency being larger than 25us.
> 
> I repeated that test a multiple times.
> And I noticed always igb relevant functions just before the final cycle in the cycletest.
> And for me it looks as if the calls that are in the trace from line 3621 to 3641 occur
> always before the latency exceeds the threshold.
> 
> Can you please have a look inside the attached trace file.
> Do you have any idea what might cause the issue here?

The trace data _seems_ to show that CPU0 is stuck with interrupts
disabled/handling hardirqs from 3702725us (presumably the hrtimer wakeup
for the cyclictest thread), to 3702765us when cyclictest actually gets
scheduled in, which is about 40us.

Right after the initial hrtimer wakeup event, there appear to be
eight or so of these similar-looking traces:

  kworker/-6071    0dN..1.. 3702729us : do_IRQ <-ret_from_intr
  kworker/-6071    0dN..1.. 3702729us : irq_enter <-do_IRQ
  kworker/-6071    0dN..1.. 3702729us : rcu_irq_enter <-irq_enter
  kworker/-6071    0dN..1.. 3702729us : preempt_count_add <-irq_enter
  kworker/-6071    0dN.h1.. 3702729us : exit_idle <-do_IRQ
  kworker/-6071    0dN.h1.. 3702729us : handle_irq <-do_IRQ
  kworker/-6071    0dN.h1.. 3702729us : handle_edge_irq <-handle_irq
  kworker/-6071    0dN.h1.. 3702729us : _raw_spin_lock <-handle_edge_irq
  kworker/-6071    0dN.h1.. 3702729us : preempt_count_add <-_raw_spin_lock
  kworker/-6071    0dN.h2.. 3702730us : irq_may_run <-handle_edge_irq
  kworker/-6071    0dN.h2.. 3702730us : irq_chip_ack_parent <-handle_edge_irq
  kworker/-6071    0dN.h2.. 3702730us : apic_ack_edge <-irq_chip_ack_parent
  kworker/-6071    0dN.h2.. 3702730us : __irq_complete_move <-apic_ack_edge
  kworker/-6071    0dN.h2.. 3702730us : irq_move_irq <-apic_ack_edge
  kworker/-6071    0dN.h2.. 3702730us : handle_irq_event <-handle_edge_irq
  kworker/-6071    0dN.h2.. 3702730us : preempt_count_sub <-handle_irq_event
  kworker/-6071    0dN.h1.. 3702730us : handle_irq_event_percpu <-handle_irq_event
  kworker/-6071    0dN.h1.. 3702731us : irq_default_primary_handler <-handle_irq_event_percpu
  kworker/-6071    0dN.h1.. 3702731us : __irq_wake_thread <-handle_irq_event_percpu
  kworker/-6071    0dN.h1.. 3702731us : wake_up_process <-__irq_wake_thread
  kworker/-6071    0dN.h1.. 3702731us : try_to_wake_up <-wake_up_process
  kworker/-6071    0dN.h1.. 3702731us : _raw_spin_lock_irqsave <-try_to_wake_up
  kworker/-6071    0dN.h1.. 3702731us : preempt_count_add <-_raw_spin_lock_irqsave
  kworker/-6071    0dN.h2.. 3702731us : _raw_spin_lock <-try_to_wake_up
  kworker/-6071    0dN.h2.. 3702731us : preempt_count_add <-_raw_spin_lock
  kworker/-6071    0dN.h3.. 3702731us : ttwu_do_activate.constprop.104 <-try_to_wake_up
  kworker/-6071    0dN.h3.. 3702731us : activate_task <-ttwu_do_activate.constprop.104
  kworker/-6071    0dN.h3.. 3702731us : update_rq_clock.part.91 <-activate_task
  kworker/-6071    0dN.h3.. 3702732us : enqueue_task_rt <-activate_task
  kworker/-6071    0dN.h3.. 3702732us : dequeue_rt_stack <-enqueue_task_rt
  kworker/-6071    0dN.h3.. 3702732us : dequeue_top_rt_rq <-dequeue_rt_stack
  kworker/-6071    0dN.h3.. 3702732us : update_rt_migration <-enqueue_task_rt
  kworker/-6071    0dN.h3.. 3702732us : _raw_spin_lock <-enqueue_task_rt
  kworker/-6071    0dN.h3.. 3702732us : preempt_count_add <-_raw_spin_lock
  kworker/-6071    0dN.h4.. 3702732us : preempt_count_sub <-enqueue_task_rt
  kworker/-6071    0dN.h3.. 3702732us : enqueue_top_rt_rq <-enqueue_task_rt
  kworker/-6071    0dN.h3.. 3702732us : ttwu_do_wakeup <-ttwu_do_activate.constprop.104
  kworker/-6071    0dN.h3.. 3702732us : check_preempt_curr <-ttwu_do_wakeup
  kworker/-6071    0dN.h3.. 3702732us : resched_curr <-check_preempt_curr
  kworker/-6071    0dN.h3.. 3702732us : task_woken_rt <-ttwu_do_wakeup
  kworker/-6071    0dN.h3.. 3702733us : preempt_count_sub <-try_to_wake_up
  kworker/-6071    0dN.h2.. 3702733us : _raw_spin_unlock_irqrestore <-try_to_wake_up
  kworker/-6071    0dN.h2.. 3702733us : preempt_count_sub <-_raw_spin_unlock_irqrestore
  kworker/-6071    0dN.h1.. 3702733us : note_interrupt <-handle_irq_event_percpu
  kworker/-6071    0dN.h1.. 3702733us : _raw_spin_lock <-handle_irq_event
  kworker/-6071    0dN.h1.. 3702733us : preempt_count_add <-_raw_spin_lock
  kworker/-6071    0dN.h2.. 3702733us : preempt_count_sub <-handle_edge_irq
  kworker/-6071    0dN.h1.. 3702733us : irq_exit <-do_IRQ
  kworker/-6071    0dN.h1.. 3702733us : preempt_count_sub <-irq_exit
  kworker/-6071    0dN..1.. 3702733us : rcu_irq_exit <-irq_exit

Which looks to me to be the normal "forced primary" interrupt handling
path, which simply wakes the created irqthread.
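
(For reference, with forced interrupt threading the hardirq handler is essentially the default
primary handler from kernel/irq/manage.c, paraphrased here; it does nothing but request that
the irq thread be woken:)

  static irqreturn_t irq_default_primary_handler(int irq, void *dev_id)
  {
  	return IRQ_WAKE_THREAD;
  }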

However, what isn't clear from the data is _which_ irqthread(s) is being
woken up.  Presumably, due to the prior igb traces, it's one of the igb
interrupts, but it would be nice to confirm that using the sched_wakeup
event or other means.

Similar to the PCI write-buffering cases, we've also observed that when
the PCI interconnect is bogged down with traffic from other masters,
it's possible that a read from the CPU can be stalled (in some cases,
for quite a while, but it depends on the PCI topology, switches used,
their configurations, etc).

So, one plausible narrative here: it's conceivable that the
rd32(E1000_RXSTMPH) in igb_ptp_rx_hang() is "stuck" in the bus somewhere
presumably due to load from other masters (the trace seems to indicate
it might be as much as 20us), with CPU execution stalled awaiting its
completion.  Meanwhile, the CPU is encountering interrupts from other
sources (local APIC, etc).  Once the read "completes", the CPU is
finally able to service all of the interrupts that have piled up, which
is why we see in the traces these 9 wakeups happening in a row.
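
(For context, rd32() in igb boils down to a plain MMIO read of the mapped BAR, i.e. a
non-posted read the CPU has to wait on -- roughly, and paraphrased, it amounts to:)

  /* illustrative only; the real helper in igb has extra checks */
  static inline u32 igb_rd32_example(struct e1000_hw *hw, u32 reg)
  {
  	return readl(hw->hw_addr + reg);	/* uncached MMIO read: the CPU
  						 * stalls until the device answers */
  }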

The question is: how can we confirm/refute this, or are there other,
more plausible scenarios it's possible to run into?

   Julia


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-09-28 19:45               ` Julia Cartwright
@ 2016-10-04 14:33                 ` Koehrer Mathias (ETAS/ESW5)
  2016-10-04 19:34                   ` Julia Cartwright
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-04 14:33 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Sebastian Andrzej Siewior, linux-rt-users

Hi Julia,

> 
> Which, looks to me to be the normal "forced primary" interrupt handling path, which
> simply wakes the created irqthread.
> 
> However, what isn't clear from the data is _which_ irqthread(s) is being woken up.
> Presumably, due to the prior igb traces, it's one of the igb interrupts, but that would
> be nice to confirm using the sched_wakeup event or other means.
> 
> Similar to the PCI write-buffering cases, we've also observed that when the PCI
> interconnect is bogged down with traffic from other masters, it's possible that a read
> from the CPU can be stalled (in some cases, for quite awhile, but it depends on the
> PCI topology, switches used, their configurations, etc).
> 
> So, one plausible narrative here: it's conceivable that the
> rd32(E1000_RXSTMPH) in igb_ptp_rx_hang() is "stuck" in the bus somewhere
> presumably due to load from other masters (the trace seems to indicate it might be
> as much as 20us), with CPU execution stalled awaiting it's completion.  Meanwhile,
> the CPU is encountering interrupts from other sources (local APIC, etc).  Once the
> read "completes", the CPU is finally able to service all of the interrupts that have
> piled up, which is why we see in the traces these 9 wakeups happening in a row.
> 
> The question is: how can we confirm/refute this, or are there other, more plausible
> scenarios it's possible to run into?
Thanks for the proposal. Unfortunately I have no idea how to confirm or refute this.

In the meantime I have made another finding which might be relevant:
With the 3.18 kernel the igb driver comes with two interrupts per NIC (e.g. eth2 and eth2-TxRx0);
with the 4.6 kernel the igb driver comes with 9 (!) interrupts per NIC:
eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.
As I initially used the same kernel configuration from 3.18 also for the 4.6 kernel, I wonder
where this comes from and whether there is any kernel option I can use to disable these many
interrupts and reduce the count to 2 again.

Any idea on this is welcome.

Regards

Mathias



* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-04 14:33                 ` Koehrer Mathias (ETAS/ESW5)
@ 2016-10-04 19:34                   ` Julia Cartwright
  2016-10-05  7:02                     ` Koehrer Mathias (ETAS/ESW5)
  0 siblings, 1 reply; 53+ messages in thread
From: Julia Cartwright @ 2016-10-04 19:34 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5); +Cc: Sebastian Andrzej Siewior, linux-rt-users

On Tue, Oct 04, 2016 at 02:33:08PM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Julia,

Hey Mathias-

> > Which, looks to me to be the normal "forced primary" interrupt handling path, which
> > simply wakes the created irqthread.
> > 
> > However, what isn't clear from the data is _which_ irqthread(s) is being woken up.
> > Presumably, due to the prior igb traces, it's one of the igb interrupts, but that would
> > be nice to confirm using the sched_wakeup event or other means.
> > 
[..]

> In the meanwhile I have detected another finding which might be relevant:
>
> With the 3.18 kernel the igb driver comes with two interrupts per NIC (e.g. eth2 and eth2-TxRx0)
> with the 4.6. kernel the igb driver comes with 9 (!) interrupts per NIC: 
> eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.
>
> As I have used initially the same kernel configuration from 3.18 also
> for the 4.6. kernel I wonder where this comes from and if there is any
> kernel option I may use to disable these many interrupts and to reduce
> it to 2 again.

If it's all of these interrupts that are firing and being handled at the
same time, that can account for the latencies you were seeing.  As I
suggested before, having a trace with the sched_wakeup event enabled can
help confirm that it's these interrupts causing problems.

If it is true, then the question is: why is the device triggering all of
these interrupts all at once?  Is that expected?  These are questions
for netdev folks, I think.

   Julia


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-04 19:34                   ` Julia Cartwright
@ 2016-10-05  7:02                     ` Koehrer Mathias (ETAS/ESW5)
  2016-10-05 15:59                         ` [Intel-wired-lan] " Julia Cartwright
  0 siblings, 1 reply; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-05  7:02 UTC (permalink / raw)
  To: Julia Cartwright; +Cc: Sebastian Andrzej Siewior, linux-rt-users

[-- Attachment #1: Type: text/plain, Size: 2817 bytes --]

Hi Julia,

> > In the meanwhile I have detected another finding which might be relevant:
> >
> > With the 3.18 kernel the igb driver comes with two interrupts per NIC
> > (e.g. eth2 and eth2-TxRx0) with the 4.6. kernel the igb driver comes with 9 (!)
> interrupts per NIC:
> > eth2, and eth2-TxRx-0, eth2-TxRx-1, ... , eth2-TxRx-7.
> >
> > As I have used initially the same kernel configuration from 3.18 also
> > for the 4.6. kernel I wonder where this comes from and if there is any
> > kernel option I may use to disable these many interrupts and to reduce
> > it to 2 again.
> 
> If it's all of these interrupts that are firing and being handled at the same time, that
> can account for the latencies you were seeing.  As I suggested before, having a
> trace with the sched_wakeup event enabled can help confirm that it's these
> interrupts causing problems.
> 
> If it is true, then the question is: why is the device triggering all of these interrupts all
> at once?  Is that expected?  These are questions for netdev folks, I think.
> 
>    Julia

OK - I ran the cyclictest again. This time I used the -C option:
# cyclictest -a -i 100 -m -n -p 80 -t 1 -b 21 -C

And the last output lines of the trace are:
cyclicte-5887    0d...2.. 1504647266us!: sched_switch: prev_comm=cyclictest prev_pid=5887 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1504647372us : sched_wakeup: comm=cyclictest pid=5887 prio=19 target_cpu=000
kworker/-5       0dN.h3.. 1504647374us : sched_wakeup: comm=irq/54-eth2-TxR pid=5883 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647375us : sched_wakeup: comm=irq/53-eth2-TxR pid=5882 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647377us : sched_wakeup: comm=irq/52-eth2-TxR pid=5881 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647378us : sched_wakeup: comm=irq/51-eth2-TxR pid=5880 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647380us : sched_wakeup: comm=irq/50-eth2-TxR pid=5879 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647381us : sched_wakeup: comm=irq/49-eth2-TxR pid=5878 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647382us : sched_wakeup: comm=irq/48-eth2-TxR pid=5877 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1504647383us : sched_wakeup: comm=irq/47-eth2-TxR pid=5876 prio=49 target_cpu=000
kworker/-5       0d...2.. 1504647384us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=5887 next_prio=19
cyclicte-5887    0.....11 1504647389us : tracing_mark_write: hit latency threshold (28 > 21)

The attached trace-extract.gz shows some more lines.
It actually looks to me as if the many igb irq threads are causing the issue.

Regards

Mathias


[-- Attachment #2: trace-extract.gz --]
[-- Type: application/x-gzip, Size: 1191 bytes --]


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-05  7:02                     ` Koehrer Mathias (ETAS/ESW5)
@ 2016-10-05 15:59                         ` Julia Cartwright
  0 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-05 15:59 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Jeff Kirsher
  Cc: Sebastian Andrzej Siewior, linux-rt-users, intel-wired-lan, netdev

On Wed, Oct 05, 2016 at 07:02:21AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Julia,
> 
> > > In the meanwhile I have detected another finding which might be relevant:
> > >
> > > With the 3.18 kernel the igb driver comes with two interrupts per
> > > NIC (e.g. eth2 and eth2-TxRx0) with the 4.6. kernel the igb driver
> > > comes with 9 (!) interrupts per NIC: eth2, and eth2-TxRx-0,
> > > eth2-TxRx-1, ... , eth2-TxRx-7.
> > >
> > > As I have used initially the same kernel configuration from 3.18 also
> > > for the 4.6. kernel I wonder where this comes from and if there is any
> > > kernel option I may use to disable these many interrupts and to reduce
> > > it to 2 again.
> > 
> > If it's all of these interrupts that are firing and being handled at the same time, that
> > can account for the latencies you were seeing.  As I suggested before, having a
> > trace with the sched_wakeup event enabled can help confirm that it's these
> > interrupts causing problems.
> > 
> > If it is true, then the question is: why is the device triggering all of these interrupts all
> > at once?  Is that expected?  These are questions for netdev folks, I think.
> > 
> >    Julia
> 
> OK - I ran again the cyclictest. This time I used the -C option:
> # cyclictest -a -i 100 -m -n -p 80 -t 1 -b 21 -C
> 
> And the last output lines of the trace are:
> cyclicte-5887    0d...2.. 1504647266us!: sched_switch: prev_comm=cyclictest prev_pid=5887 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1504647372us : sched_wakeup: comm=cyclictest pid=5887 prio=19 target_cpu=000
> kworker/-5       0dN.h3.. 1504647374us : sched_wakeup: comm=irq/54-eth2-TxR pid=5883 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647375us : sched_wakeup: comm=irq/53-eth2-TxR pid=5882 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647377us : sched_wakeup: comm=irq/52-eth2-TxR pid=5881 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647378us : sched_wakeup: comm=irq/51-eth2-TxR pid=5880 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647380us : sched_wakeup: comm=irq/50-eth2-TxR pid=5879 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647381us : sched_wakeup: comm=irq/49-eth2-TxR pid=5878 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647382us : sched_wakeup: comm=irq/48-eth2-TxR pid=5877 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1504647383us : sched_wakeup: comm=irq/47-eth2-TxR pid=5876 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1504647384us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=5887 next_prio=19
> cyclicte-5887    0.....11 1504647389us : tracing_mark_write: hit latency threshold (28 > 21)
> 
> The attached trace-extract.gz shows some more lines.
> It actually looks to me as if the the many irq threads from igb are causing the issue.

Yes, I think so.

Although, to be clear, it isn't the fact that there exist 8 threads,
it's that the device is firing all 8 interrupts at the same time.  The
time spent in hardirq context just waking up all 8 of those threads (and
the cyclictest wakeup) is enough to cause your regression.

netdev/igb folks-

Under what conditions should it be expected that the i350 trigger all of
the TxRx interrupts simultaneously?  Any ideas here?

See the start of this thread here:

  http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-MBX1012.de.bosch.com

   Julia


* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-05 15:59                         ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-06  7:01                           ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-06  7:01 UTC (permalink / raw)
  To: Julia Cartwright, Jeff Kirsher, Greg
  Cc: Sebastian Andrzej Siewior, linux-rt-users, intel-wired-lan, netdev

[-- Attachment #1: Type: text/plain, Size: 7308 bytes --]

Hi all,
> 
> Although, to be clear, it isn't the fact that there exists 8 threads, it's that the device is
> firing all 8 interrupts at the same time.  The time spent in hardirq context just waking
> up all 8 of those threads (and the cyclictest wakeup) is enough to cause your
> regression.
> 
> netdev/igb folks-
> 
> Under what conditions should it be expected that the i350 trigger all of the TxRx
> interrupts simultaneously?  Any ideas here?
> 
> See the start of this thread here:
> 
>   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-
> MBX1012.de.bosch.com
> 
Greg recommended using "ethtool -L eth2 combined 1" to reduce the number of queues.
I tried that. Now I actually have only three irqs (eth2, eth2-rx-0, eth2-tx-0).
However, the issue remains the same.

I ran the cyclictest again:
# cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
(Note: When using 105us instead of 100us the long latencies seem to occur more often).

Here are the final lines of the kernel trace output:
  <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56 next_prio=98
ktimerso-46      3d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
ktimerso-24      1d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
ktimerso-79      6d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
ktimerso-35      2d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
  rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5 prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5 next_pid=68 next_prio=98
  rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7 prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7 next_pid=90 next_prio=98
ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt pid=8 prio=98 target_cpu=000
  rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4 prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4 next_pid=57 next_prio=98
ktimerso-4       0d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==> next_comm=rcu_preempt next_pid=8 next_prio=98
ktimerso-90      7d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120
ktimerso-68      5d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10 prio=120 target_cpu=000
ktimerso-57      4d...2.. 1344661652us : sched_switch: prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120
rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10 next_prio=120
 rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0 prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx- pid=6310 prio=49 target_cpu=000
kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx- pid=6309 prio=49 target_cpu=000
kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=6310 next_prio=49
irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=6309 next_prio=49
irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency threshold (39 > 23)

Just before the long latency, the irqs "48-eth2-tx" and "47-eth2-rx" are active.
Looking at the 4th line from the bottom, the timestamp for irq/47 is 1344662260us, and for the next line (kworker) it is 1344662300us.
Does this mean that irq/47 took 40us of irq processing? Or is this a misinterpretation?

For more lines of the trace please see the attached trace-extract.gz.

Thanks for any feedback.

Regards

Mathias







[-- Attachment #2: trace-extract.gz --]
[-- Type: application/x-gzip, Size: 1118 bytes --]


irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency threshold (39 > 23)

Just before the long latency, the irqs "48-eth2-tx" and "47-eth2-rx" are active.
Looking at the 4th line from the bottom, the timestamp for irq/47 is 1344662260us, while the next line (kworker) shows 1344662300us.
Does this mean that irq/47 took 40us of irq processing, or is this a misinterpretation?

For more lines of the trace please see the attached trace-extract.gz.

Thanks for any feedback.

Regards

Mathias






-------------- next part --------------
A non-text attachment was scrubbed...
Name: trace-extract.gz
Type: application/x-gzip
Size: 1118 bytes
Desc: trace-extract.gz
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20161006/b31356b0/attachment.bin>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-06  7:01                           ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-06 10:12                             ` Henri Roosen
  -1 siblings, 0 replies; 53+ messages in thread
From: Henri Roosen @ 2016-10-06 10:12 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5), Julia Cartwright, Jeff Kirsher, Greg
  Cc: Sebastian Andrzej Siewior, linux-rt-users, intel-wired-lan, netdev

On 10/06/2016 09:01 AM, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi all,

Hi Mathias,

>>
>> Although, to be clear, it isn't the fact that there exists 8 threads, it's that the device is
>> firing all 8 interrupts at the same time.  The time spent in hardirq context just waking
>> up all 8 of those threads (and the cyclictest wakeup) is enough to cause your
>> regression.
>>
>> netdev/igb folks-
>>
>> Under what conditions should it be expected that the i350 trigger all of the TxRx
>> interrupts simultaneously?  Any ideas here?
>>
>> See the start of this thread here:
>>
>>   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-
>> MBX1012.de.bosch.com
>>
> Greg recommended to use "ethtool -L eth2 combined 1" to reduce the number of queues.
> I tried that. Now, I have actually only three irqs (eth2, eth2-rx-0, eth2-tx-0).
> However the issue remains the same.
>
> I ran the cyclictest again:
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
> (Note: When using 105us instead of 100us the long latencies seem to occur more often).
>
> Here are the final lines of the kernel trace output:
>   <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56 next_prio=98
> ktimerso-46      3d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
> ktimerso-24      1d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
> ktimerso-79      6d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
> ktimerso-35      2d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
>   rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5 prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5 next_pid=68 next_prio=98
>   rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7 prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7 next_pid=90 next_prio=98
> ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt pid=8 prio=98 target_cpu=000
>   rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4 prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4 next_pid=57 next_prio=98
> ktimerso-4       0d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==> next_comm=rcu_preempt next_pid=8 next_prio=98
> ktimerso-90      7d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120
> ktimerso-68      5d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
> rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10 prio=120 target_cpu=000
> ktimerso-57      4d...2.. 1344661652us : sched_switch: prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120
> rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10 next_prio=120
>  rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0 prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx- pid=6310 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx- pid=6309 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=6310 next_prio=49
> irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=6309 next_prio=49
> irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency threshold (39 > 23)
>
> Just before the long latency,  the irqs "48-eth2-tx" and "48-eth2-rx" are active.
> When looking at the 4th line from the bottom, the time for irq/47 is 1344662260us, for the next line (kworker) it is 1344662300us.
> Does this mean that the irq/47 took 40us for irq processing? Or is this a misinterpretation?

That is a misinterpretation: at 1344662256us there is a sched_switch
event to pid 6310 (irq/48-eth2-tx-). It completes (prev_state=S) at
1344662259us when there is a sched_switch event to pid 6309
(irq/47-eth2-rx-). That one completes at 1344662260us when there is a
sched_switch event to pid 5. So the thread for irq/48 takes 3us and
irq/47 takes 1us.

If you'd blend in the irq events (irq_handler_entry, irq_handler_exit),
you'll see the timer-irq between 1344662260us and 1344662300us (likely
close to 1344662300us) waking up the cyclictest thread.
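
For example (assuming the usual tracefs mount under /sys/kernel/debug/tracing; the path may
differ on other setups), the irq events could be enabled in addition to the sched events
before re-running cyclictest:
# echo 1 > /sys/kernel/debug/tracing/events/irq/irq_handler_entry/enable
# echo 1 > /sys/kernel/debug/tracing/events/irq/irq_handler_exit/enable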

Regards,
     Henri

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-06 10:12                             ` Henri Roosen
  0 siblings, 0 replies; 53+ messages in thread
From: Henri Roosen @ 2016-10-06 10:12 UTC (permalink / raw)
  To: intel-wired-lan

On 10/06/2016 09:01 AM, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi all,

Hi Mathias,

>>
>> Although, to be clear, it isn't the fact that there exists 8 threads, it's that the device is
>> firing all 8 interrupts at the same time.  The time spent in hardirq context just waking
>> up all 8 of those threads (and the cyclictest wakeup) is enough to cause your
>> regression.
>>
>> netdev/igb folks-
>>
>> Under what conditions should it be expected that the i350 trigger all of the TxRx
>> interrupts simultaneously?  Any ideas here?
>>
>> See the start of this thread here:
>>
>>   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56 at FE-
>> MBX1012.de.bosch.com
>>
> Greg recommended to use "ethtool -L eth2 combined 1" to reduce the number of queues.
> I tried that. Now, I have actually only three irqs (eth2, eth2-rx-0, eth2-tx-0).
> However the issue remains the same.
>
> I ran the cyclictest again:
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
> (Note: When using 105us instead of 100us the long latencies seem to occur more often).
>
> Here are the final lines of the kernel trace output:
>   <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56 next_prio=98
> ktimerso-46      3d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==> next_comm=swapper/3 next_pid=0 next_prio=120
> ktimerso-24      1d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==> next_comm=swapper/1 next_pid=0 next_prio=120
> ktimerso-79      6d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==> next_comm=swapper/6 next_pid=0 next_prio=120
> ktimerso-35      2d...2.. 1344661650us : sched_switch: prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==> next_comm=swapper/2 next_pid=0 next_prio=120
>   rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5 prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5 next_pid=68 next_prio=98
>   rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7 prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7 next_pid=90 next_prio=98
> ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt pid=8 prio=98 target_cpu=000
>   rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4 prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4 next_pid=57 next_prio=98
> ktimerso-4       0d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==> next_comm=rcu_preempt next_pid=8 next_prio=98
> ktimerso-90      7d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==> next_comm=swapper/7 next_pid=0 next_prio=120
> ktimerso-68      5d...2.. 1344661651us : sched_switch: prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==> next_comm=swapper/5 next_pid=0 next_prio=120
> rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10 prio=120 target_cpu=000
> ktimerso-57      4d...2.. 1344661652us : sched_switch: prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==> next_comm=swapper/4 next_pid=0 next_prio=120
> rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10 next_prio=120
>  rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0 prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx- pid=6310 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx- pid=6309 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=6310 next_prio=49
> irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=6309 next_prio=49
> irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0 next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0 prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=6314 next_prio=19
> cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency threshold (39 > 23)
>
> Just before the long latency,  the irqs "48-eth2-tx" and "48-eth2-rx" are active.
> When looking at the 4th line from the bottom, the time for irq/47 is 1344662260us, for the next line (kworker) it is 1344662300us.
> Does this mean that the irq/47 took 40us for irq processing? Or is this a misinterpretation?

That is a misinterpretation: at 1344662256us there is a sched_switch
event to pid 6310 (irq/48-eth2-tx-). It completes (prev_state=S) at
1344662259us when there is a sched_switch event to pid 6309
(irq/47-eth2-rx-). That one completes at 1344662260us when there is a
sched_switch event to pid 5. So the thread for irq/48 takes 3us and
irq/47 takes 1us.

If you'd blend in the irq events (irq_handler_entry, irq_handler_exit),
you'll see the timer-irq between 1344662260us and 1344662300us (likely
close to 1344662300us) waking up the cyclictest thread.

Regards,
     Henri
--
>
> For more lines of the trace please see the attached trace-extract.gz.
>
> Thanks for any feedback.
>
> Regard
>
> Mahias
>
>
>
>
>
>



^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-06  7:01                           ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-06 17:58                             ` Williams, Mitch A
  -1 siblings, 0 replies; 53+ messages in thread
From: Williams, Mitch A @ 2016-10-06 17:58 UTC (permalink / raw)
  To: 'Koehrer Mathias (ETAS/ESW5)',
	Julia Cartwright, Kirsher, Jeffrey T, Greg
  Cc: netdev, intel-wired-lan, linux-rt-users, Sebastian Andrzej Siewior



> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces@lists.osuosl.org] On
> Behalf Of Koehrer Mathias (ETAS/ESW5)
> Sent: Thursday, October 06, 2016 12:02 AM
> To: Julia Cartwright <julia@ni.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Greg <gvrose8192@gmail.com>
> Cc: netdev@vger.kernel.org; intel-wired-lan@lists.osuosl.org; linux-rt-
> users@vger.kernel.org; Sebastian Andrzej Siewior
> <sebastian.siewior@linutronix.de>
> Subject: Re: [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb
> causes huge latencies in cyclictest
> 
> Hi all,
> >
> > Although, to be clear, it isn't the fact that there exists 8 threads, it's
> that the device is
> > firing all 8 interrupts at the same time.  The time spent in hardirq
> context just waking
> > up all 8 of those threads (and the cyclictest wakeup) is enough to cause
> your
> > regression.
> >
> > netdev/igb folks-
> >
> > Under what conditions should it be expected that the i350 trigger all of
> the TxRx
> > interrupts simultaneously?  Any ideas here?

I can answer that! I wrote that code.

We trigger the interrupts once a second because MSI and MSI-X interrupts are NOT guaranteed to be delivered. If this happens, the queues being serviced by this "lost" interrupt are completely stuck.

The device automatically masks each interrupt vector after it fires, expecting the ISR to re-enable the vector after processing is complete. If the interrupt is lost, the ISR doesn't run, so the vector ends up permanently masked. At this point, any queues associated with that vector are stuck. The only recovery is through the netdev watchdog, which initiates a reset.

During development of igb, we had several platforms with chipsets that routinely dropped MSI messages under stress. Things would be running fine and then, pow, all the traffic on a queue would stop. 

So, I added code to fire each vector once per second. Just unmasking the interrupt isn't enough - we need to trigger the ISR to get the queues cleaned up so the device can work again.

Is this workaround still needed? I don't know. Modern chipsets don't break a sweat handling gigabit-speed traffic, and they *probably* don't drop interrupts. But I'd still rather have that insurance.

You could try to remove the write to the EICS registers in the watchdog task to see if that takes care of your problem. But I wouldn't want to remove that code permanently, because we have seen lost interrupts in the past.

You also could try staggering the writes so that not all vectors fire each second. But then you'll potentially incur a much longer delay if an interrupt does get lost, which means you could trigger netdev watchdog events.
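
For illustration only, a rough sketch of how such staggering might look inside igb_watchdog_task (this is not existing driver code; the rotating index "eics_next_vector" is a hypothetical new member that would have to be added to struct igb_adapter and initialized to 0):

        /* Sketch: fire one MSI-X vector per watchdog run and rotate through
         * the vectors, instead of writing all eims_value bits to EICS at once.
         */
        if (adapter->flags & IGB_FLAG_HAS_MSIX) {
                struct igb_q_vector *q_vector =
                        adapter->q_vector[adapter->eics_next_vector];

                wr32(E1000_EICS, q_vector->eims_value);
                adapter->eics_next_vector =
                        (adapter->eics_next_vector + 1) % adapter->num_q_vectors;
        } else {
                wr32(E1000_ICS, E1000_ICS_RXDMT0);
        }

With num_q_vectors vectors this stretches the worst-case recovery after a lost interrupt from one watchdog period to num_q_vectors periods, which is exactly the longer-delay trade-off mentioned above.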

-Mitch



> >
> > See the start of this thread here:
> >
> >   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56@FE-
> > MBX1012.de.bosch.com
> >
> Greg recommended to use "ethtool -L eth2 combined 1" to reduce the number of
> queues.
> I tried that. Now, I have actually only three irqs (eth2, eth2-rx-0, eth2-
> tx-0).
> However the issue remains the same.
> 
> I ran the cyclictest again:
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
> (Note: When using 105us instead of 100us the long latencies seem to occur
> more often).
> 
> Here are the final lines of the kernel trace output:
>   <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4
> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56
> next_prio=98
> ktimerso-46      3d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==>
> next_comm=swapper/3 next_pid=0 next_prio=120
> ktimerso-24      1d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==>
> next_comm=swapper/1 next_pid=0 next_prio=120
> ktimerso-79      6d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==>
> next_comm=swapper/6 next_pid=0 next_prio=120
> ktimerso-35      2d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==>
> next_comm=swapper/2 next_pid=0 next_prio=120
>   rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5
> prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5
> next_pid=68 next_prio=98
>   rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7
> prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7
> next_pid=90 next_prio=98
> ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt
> pid=8 prio=98 target_cpu=000
>   rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4
> prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4
> next_pid=57 next_prio=98
> ktimerso-4       0d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==>
> next_comm=rcu_preempt next_pid=8 next_prio=98
> ktimerso-90      7d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==>
> next_comm=swapper/7 next_pid=0 next_prio=120
> ktimerso-68      5d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==>
> next_comm=swapper/5 next_pid=0 next_prio=120
> rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10
> prio=120 target_cpu=000
> ktimerso-57      4d...2.. 1344661652us : sched_switch:
> prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==>
> next_comm=swapper/4 next_pid=0 next_prio=120
> rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt
> prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10
> next_prio=120
>  rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0
> prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx-
> pid=6310 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx-
> pid=6309 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx-
> next_pid=6310 next_prio=49
> irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-
> eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-
> rx- next_pid=6309 next_prio=49
> irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-
> eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0
> next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency
> threshold (39 > 23)
> 
> Just before the long latency,  the irqs "48-eth2-tx" and "48-eth2-rx" are
> active.
> When looking at the 4th line from the bottom, the time for irq/47 is
> 1344662260us, for the next line (kworker) it is 1344662300us.
> Does this mean that the irq/47 took 40us for irq processing? Or is this a
> misinterpretation?
> 
> For more lines of the trace please see the attached trace-extract.gz.
> 
> Thanks for any feedback.
> 
> Regard
> 
> Mahias
> 
> 
> 
> 
> 

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-06 17:58                             ` Williams, Mitch A
  0 siblings, 0 replies; 53+ messages in thread
From: Williams, Mitch A @ 2016-10-06 17:58 UTC (permalink / raw)
  To: intel-wired-lan



> -----Original Message-----
> From: Intel-wired-lan [mailto:intel-wired-lan-bounces at lists.osuosl.org] On
> Behalf Of Koehrer Mathias (ETAS/ESW5)
> Sent: Thursday, October 06, 2016 12:02 AM
> To: Julia Cartwright <julia@ni.com>; Kirsher, Jeffrey T
> <jeffrey.t.kirsher@intel.com>; Greg <gvrose8192@gmail.com>
> Cc: netdev at vger.kernel.org; intel-wired-lan at lists.osuosl.org; linux-rt-
> users at vger.kernel.org; Sebastian Andrzej Siewior
> <sebastian.siewior@linutronix.de>
> Subject: Re: [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb
> causes huge latencies in cyclictest
> 
> Hi all,
> >
> > Although, to be clear, it isn't the fact that there exists 8 threads, it's
> that the device is
> > firing all 8 interrupts at the same time.  The time spent in hardirq
> context just waking
> > up all 8 of those threads (and the cyclictest wakeup) is enough to cause
> your
> > regression.
> >
> > netdev/igb folks-
> >
> > Under what conditions should it be expected that the i350 trigger all of
> the TxRx
> > interrupts simultaneously?  Any ideas here?

I can answer that! I wrote that code.

We trigger the interrupts once a second because MSI and MSI-X interrupts are NOT guaranteed to be delivered. If this happens, the queues being serviced by this "lost" interrupt are completely stuck.

The device automatically masks each interrupt vector after it fires, expecting the ISR to re-enable the vector after processing is complete. If the interrupt is lost, the ISR doesn't run, so the vector ends up permanently masked. At this point, any queues associated with that vector are stuck. The only recovery is through the netdev watchdog, which initiates a reset.

During development of igb, we had several platforms with chipsets that routinely dropped MSI messages under stress. Things would be running fine and then, pow, all the traffic on a queue would stop. 

So, I added code to fire each vector once per second. Just unmasking the interrupt isn't enough - we need to trigger the ISR to get the queues cleaned up so the device can work again.

Is this workaround still needed? I don't know. Modern chipsets don't break a sweat handling gigabit-speed traffic, and they *probably* don't drop interrupts. But I'd still rather have that insurance.

You could try to remove the write to the EICS registers in the watchdog task to see if that takes care of your problem. But I wouldn't want to remove that code permanently, because we have seen lost interrupts in the past.

You also could try staggering the writes so that not all vectors fire each second. But then you'll potentially incur a much longer delay if an interrupt does get lost, which means you could trigger netdev watchdog events.

-Mitch



> >
> > See the start of this thread here:
> >
> >   http://lkml.kernel.org/r/d648628329bc446fa63b5e19d4d3fb56 at FE-
> > MBX1012.de.bosch.com
> >
> Greg recommended to use "ethtool -L eth2 combined 1" to reduce the number of
> queues.
> I tried that. Now, I have actually only three irqs (eth2, eth2-rx-0, eth2-
> tx-0).
> However the issue remains the same.
> 
> I ran the cyclictest again:
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 23 -C
> (Note: When using 105us instead of 100us the long latencies seem to occur
> more often).
> 
> Here are the final lines of the kernel trace output:
>   <idle>-0       4d...2.. 1344661649us : sched_switch: prev_comm=swapper/4
> prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=rcuc/4 next_pid=56
> next_prio=98
> ktimerso-46      3d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/3 prev_pid=46 prev_prio=98 prev_state=S ==>
> next_comm=swapper/3 next_pid=0 next_prio=120
> ktimerso-24      1d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/1 prev_pid=24 prev_prio=98 prev_state=S ==>
> next_comm=swapper/1 next_pid=0 next_prio=120
> ktimerso-79      6d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/6 prev_pid=79 prev_prio=98 prev_state=S ==>
> next_comm=swapper/6 next_pid=0 next_prio=120
> ktimerso-35      2d...2.. 1344661650us : sched_switch:
> prev_comm=ktimersoftd/2 prev_pid=35 prev_prio=98 prev_state=S ==>
> next_comm=swapper/2 next_pid=0 next_prio=120
>   rcuc/5-67      5d...2.. 1344661650us : sched_switch: prev_comm=rcuc/5
> prev_pid=67 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/5
> next_pid=68 next_prio=98
>   rcuc/7-89      7d...2.. 1344661650us : sched_switch: prev_comm=rcuc/7
> prev_pid=89 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/7
> next_pid=90 next_prio=98
> ktimerso-4       0d...211 1344661650us : sched_wakeup: comm=rcu_preempt
> pid=8 prio=98 target_cpu=000
>   rcuc/4-56      4d...2.. 1344661651us : sched_switch: prev_comm=rcuc/4
> prev_pid=56 prev_prio=98 prev_state=S ==> next_comm=ktimersoftd/4
> next_pid=57 next_prio=98
> ktimerso-4       0d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/0 prev_pid=4 prev_prio=98 prev_state=S ==>
> next_comm=rcu_preempt next_pid=8 next_prio=98
> ktimerso-90      7d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/7 prev_pid=90 prev_prio=98 prev_state=S ==>
> next_comm=swapper/7 next_pid=0 next_prio=120
> ktimerso-68      5d...2.. 1344661651us : sched_switch:
> prev_comm=ktimersoftd/5 prev_pid=68 prev_prio=98 prev_state=S ==>
> next_comm=swapper/5 next_pid=0 next_prio=120
> rcu_pree-8       0d...3.. 1344661652us : sched_wakeup: comm=rcuop/0 pid=10
> prio=120 target_cpu=000
> ktimerso-57      4d...2.. 1344661652us : sched_switch:
> prev_comm=ktimersoftd/4 prev_pid=57 prev_prio=98 prev_state=S ==>
> next_comm=swapper/4 next_pid=0 next_prio=120
> rcu_pree-8       0d...2.. 1344661653us+: sched_switch: prev_comm=rcu_preempt
> prev_pid=8 prev_prio=98 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661741us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661742us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661743us : sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=rcuop/0 next_pid=10
> next_prio=120
>  rcuop/0-10      0d...2.. 1344661744us!: sched_switch: prev_comm=rcuop/0
> prev_pid=10 prev_prio=120 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661858us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661859us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661860us!: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344661966us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344661966us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344661967us+: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662052us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662053us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662054us!: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662168us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662168us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0d...2.. 1344662169us+: sched_switch: prev_comm=cyclictest
> prev_pid=6314 prev_prio=19 prev_state=S ==> next_comm=kworker/0:0 next_pid=5
> next_prio=120
> kworker/-5       0dN.h2.. 1344662255us : sched_wakeup: comm=irq/48-eth2-tx-
> pid=6310 prio=49 target_cpu=000
> kworker/-5       0dN.h3.. 1344662256us : sched_wakeup: comm=irq/47-eth2-rx-
> pid=6309 prio=49 target_cpu=000
> kworker/-5       0d...2.. 1344662256us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx-
> next_pid=6310 next_prio=49
> irq/48-e-6310    0d...2.. 1344662259us : sched_switch: prev_comm=irq/48-
> eth2-tx- prev_pid=6310 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-
> rx- next_pid=6309 next_prio=49
> irq/47-e-6309    0d...2.. 1344662260us+: sched_switch: prev_comm=irq/47-
> eth2-rx- prev_pid=6309 prev_prio=49 prev_state=S ==> next_comm=kworker/0:0
> next_pid=5 next_prio=120
> kworker/-5       0dN.h2.. 1344662300us : sched_wakeup: comm=cyclictest
> pid=6314 prio=19 target_cpu=000
> kworker/-5       0d...2.. 1344662300us : sched_switch: prev_comm=kworker/0:0
> prev_pid=5 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=6314 next_prio=19
> cyclicte-6314    0.....11 1344662306us : tracing_mark_write: hit latency
> threshold (39 > 23)
> 
> Just before the long latency,  the irqs "48-eth2-tx" and "48-eth2-rx" are
> active.
> When looking at the 4th line from the bottom, the time for irq/47 is
> 1344662260us, for the next line (kworker) it is 1344662300us.
> Does this mean that the irq/47 took 40us for irq processing? Or is this a
> misinterpretation?
> 
> For more lines of the trace please see the attached trace-extract.gz.
> 
> Thanks for any feedback.
> 
> Regard
> 
> Mahias
> 
> 
> 
> 
> 


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-06 17:58                             ` [Intel-wired-lan] " Williams, Mitch A
@ 2016-10-07  8:58                               ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-07  8:58 UTC (permalink / raw)
  To: Williams, Mitch A, Julia Cartwright, Kirsher, Jeffrey T, Greg
  Cc: netdev, intel-wired-lan, linux-rt-users, Sebastian Andrzej Siewior

Hi Mitch,
> > >
> > > Although, to be clear, it isn't the fact that there exists 8
> > > threads, it's
> > that the device is
> > > firing all 8 interrupts at the same time.  The time spent in hardirq
> > context just waking
> > > up all 8 of those threads (and the cyclictest wakeup) is enough to
> > > cause
> > your
> > > regression.
> > >
> > > netdev/igb folks-
> > >
> > > Under what conditions should it be expected that the i350 trigger
> > > all of
> > the TxRx
> > > interrupts simultaneously?  Any ideas here?
> 
> I can answer that! I wrote that code.
> 
> We trigger the interrupts once a second because MSI and MSI-X interrupts are NOT
> guaranteed to be delivered. If this happens, the queues being serviced by this "lost"
> interrupt are completely stuck.
> 
> The device automatically masks each interrupt vector after it fires, expecting the
> ISR to re-enable the vector after processing is complete. If the interrupt is lost, the
> ISR doesn't run, so the vector ends up permanently masked. At this point, any
> queues associated with that vector are stuck. The only recovery is through the
> netdev watchdog, which initiates a reset.
> 
> During development of igb, we had several platforms with chipsets that routinely
> dropped MSI messages under stress. Things would be running fine and then, pow,
> all the traffic on a queue would stop.
> 
> So, I added code to fire each vector once per second. Just unmasking the interrupt
> isn't enough - we need to trigger the ISR to get the queues cleaned up so the device
> can work again.
> 
> Is this workaround still needed? I don't know. Modern chipsets don't break a sweat
> handling gigabit-speed traffic, and they *probably* don't drop interrupts. But I'd still
> rather have that insurance.
> 
> You could try to remove the write to the EICS registers in the watchdog task to see if
> that takes care of your problem. But I wouldn't want to remove that code
> permanently, because we have seen lost interrupts in the past.
> 
> You also could try staggering the writes so that not all vectors fire each second. But
> then you'll potentially incur a much longer delay if an interrupt does get lost, which
> means you could trigger netdev watchdog events.
> 
> -Mitch

Thanks for the explanation. Now I have switched over to kernel 4.8-rt1.
This shows the very same effect.
I modified the in-kernel igb_main.c (function igb_watchdog_task) to comment out
the access to the EICS registers:

--- igb_main.c.orig     2016-10-07 10:43:37.855873754 +0200
+++ igb_main.c  2016-10-07 10:31:51.451346292 +0200
@@ -4613,6 +4613,7 @@ no_wait:
        }

        /* Cause software interrupt to ensure Rx ring is cleaned */
+#if 0
        if (adapter->flags & IGB_FLAG_HAS_MSIX) {
                u32 eics = 0;

@@ -4622,6 +4623,7 @@ no_wait:
        } else {
                wr32(E1000_ICS, E1000_ICS_RXDMT0);
        }
+#endif

        igb_spoof_check(adapter);
        igb_ptp_rx_hang(adapter);


The result is now slightly better; however, the max latency measured by cyclictest is still
much higher compared to the values of kernel 3.18.27-rt27.
I have now enabled all events; the execution of
# cyclictest -a -i 105 -m -n -p 80 -t 1  -b 30 -C
delivers the following trace values:


cyclicte-10062   0....... 10025813402us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e)
cyclicte-10062   0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
cyclicte-10062   0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388
cyclicte-10062   0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: Start context switch
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: End context switch
cyclicte-10062   0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
cyclicte-10062   0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0
kworker/-1388    0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
kworker/-1388    0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
kworker/-1388    0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns]
kworker/-1388    0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49
irq/48-e-10057   0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64
irq/48-e-10057   0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX]
irq/48-e-10057   0d...1.. 10025813475us : rcu_utilization: Start context switch
irq/48-e-10057   0d...1.. 10025813476us : rcu_utilization: End context switch
irq/48-e-10057   0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49
irq/47-e-10056   0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64
irq/47-e-10056   0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX]
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: Start context switch
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: End context switch
irq/47-e-10056   0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
kworker/-1388    0d..h... 10025813516us : local_timer_entry: vector=239
kworker/-1388    0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19]
kworker/-1388    0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18
kworker/-1388    0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387
kworker/-1388    0d..h1.. 10025813517us : sched_waking: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h2.. 10025813517us : sched_wakeup: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h... 10025813517us : hrtimer_expire_exit: hrtimer=ffff88040a017e18
kworker/-1388    0dN.h1.. 10025813518us : hrtimer_interrupt: cpu=0 offset=317613 curr=kworker/0:3[120] thread=<none>[-1]
kworker/-1388    0dN.h... 10025813518us : write_msr: 6e0, value 20ca631e99f8
kworker/-1388    0dN.h... 10025813518us : local_timer_exit: vector=239
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813518us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=38173 [ns] vruntime=101216326505 [ns]
kworker/-1388    0d...2.. 10025813519us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=10062 next_prio=19
kworker/-1388    0d...2.. 10025813519us : x86_fpu_regs_activated: x86/fpu: ffff88040c603ec0 fpregs_active: 1 fpstate_active: 1 counter: 70 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d...2.. 10025813519us : write_msr: c0000100, value 7ffff7400700
cyclicte-10062   0....... 10025813519us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813524us : sys_enter: NR 1 (5, 7ffff7400300, 1f, 7ffff77a5460, 2, 7ffff744c99a)
cyclicte-10062   0.....11 10025813526us : tracing_mark_write: hit latency threshold (32 > 30)
cyclicte-10062   0....... 10025813526us : sys_exit: NR 1 = 31
cyclicte-10062   0....... 10025813527us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460, 2, 7ffff744c99a)

Also, I notice that there is still roughly one irq every 2 seconds on irq/47 (eth2-rx-0) and irq/48 (eth2-tx-0).
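
For reference, this residual firing can be watched directly on the interrupt counters, e.g. with:
# watch -n 1 'grep eth2 /proc/interrupts'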

Any idea or feedback on this?

Thanks

Mathias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-07  8:58                               ` Koehrer Mathias
  0 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias @ 2016-10-07  8:58 UTC (permalink / raw)
  To: intel-wired-lan

Hi Mitch,
> > >
> > > Although, to be clear, it isn't the fact that there exists 8
> > > threads, it's
> > that the device is
> > > firing all 8 interrupts at the same time.  The time spent in hardirq
> > context just waking
> > > up all 8 of those threads (and the cyclictest wakeup) is enough to
> > > cause
> > your
> > > regression.
> > >
> > > netdev/igb folks-
> > >
> > > Under what conditions should it be expected that the i350 trigger
> > > all of
> > the TxRx
> > > interrupts simultaneously?  Any ideas here?
> 
> I can answer that! I wrote that code.
> 
> We trigger the interrupts once a second because MSI and MSI-X interrupts are NOT
> guaranteed to be delivered. If this happens, the queues being serviced by this "lost"
> interrupt are completely stuck.
> 
> The device automatically masks each interrupt vector after it fires, expecting the
> ISR to re-enable the vector after processing is complete. If the interrupt is lost, the
> ISR doesn't run, so the vector ends up permanently masked. At this point, any
> queues associated with that vector are stuck. The only recovery is through the
> netdev watchdog, which initiates a reset.
> 
> During development of igb, we had several platforms with chipsets that routinely
> dropped MSI messages under stress. Things would be running fine and then, pow,
> all the traffic on a queue would stop.
> 
> So, I added code to fire each vector once per second. Just unmasking the interrupt
> isn't enough - we need to trigger the ISR to get the queues cleaned up so the device
> can work again.
> 
> Is this workaround still needed? I don't know. Modern chipsets don't break a sweat
> handling gigabit-speed traffic, and they *probably* don't drop interrupts. But I'd still
> rather have that insurance.
> 
> You could try to remove the write to the EICS registers in the watchdog task to see if
> that takes care of your problem. But I wouldn't want to remove that code
> permanently, because we have seen lost interrupts in the past.
> 
> You also could try staggering the writes so that not all vectors fire each second. But
> then you'll potentially incur a much longer delay if an interrupt does get lost, which
> means you could trigger netdev watchdog events.
> 
> -Mitch

Thanks for the explanation. Now I have switched over to kernel 4.8-rt1.
This shows the very same effect.
I modified the in-kernel igb_main.c (function igb_watchdog_task) to comment out
the access to the EICS registers:

--- igb_main.c.orig     2016-10-07 10:43:37.855873754 +0200
+++ igb_main.c  2016-10-07 10:31:51.451346292 +0200
@@ -4613,6 +4613,7 @@ no_wait:
        }

        /* Cause software interrupt to ensure Rx ring is cleaned */
+#if 0
        if (adapter->flags & IGB_FLAG_HAS_MSIX) {
                u32 eics = 0;

@@ -4622,6 +4623,7 @@ no_wait:
        } else {
                wr32(E1000_ICS, E1000_ICS_RXDMT0);
        }
+#endif

        igb_spoof_check(adapter);
        igb_ptp_rx_hang(adapter);


The result is now slightly better; however, the max latency measured by cyclictest is still
much higher compared to the values of kernel 3.18.27-rt27.
I have now enabled all events; the execution of
# cyclictest -a -i 105 -m -n -p 80 -t 1  -b 30 -C
delivers the following trace values:


cyclicte-10062   0....... 10025813402us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e)
cyclicte-10062   0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
cyclicte-10062   0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388
cyclicte-10062   0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: Start context switch
cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: End context switch
cyclicte-10062   0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
cyclicte-10062   0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0
kworker/-1388    0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
kworker/-1388    0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0
kworker/-1388    0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
kworker/-1388    0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns]
kworker/-1388    0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49
irq/48-e-10057   0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX]
irq/48-e-10057   0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64
irq/48-e-10057   0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX]
irq/48-e-10057   0d...1.. 10025813475us : rcu_utilization: Start context switch
irq/48-e-10057   0d...1.. 10025813476us : rcu_utilization: End context switch
irq/48-e-10057   0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49
irq/47-e-10056   0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX]
irq/47-e-10056   0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64
irq/47-e-10056   0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX]
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: Start context switch
irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: End context switch
irq/47-e-10056   0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
kworker/-1388    0d..h... 10025813516us : local_timer_entry: vector=239
kworker/-1388    0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19]
kworker/-1388    0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18
kworker/-1388    0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387
kworker/-1388    0d..h1.. 10025813517us : sched_waking: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h2.. 10025813517us : sched_wakeup: comm=cyclictest pid=10062 prio=19 target_cpu=000
kworker/-1388    0dN.h... 10025813517us : hrtimer_expire_exit: hrtimer=ffff88040a017e18
kworker/-1388    0dN.h1.. 10025813518us : hrtimer_interrupt: cpu=0 offset=317613 curr=kworker/0:3[120] thread=<none>[-1]
kworker/-1388    0dN.h... 10025813518us : write_msr: 6e0, value 20ca631e99f8
kworker/-1388    0dN.h... 10025813518us : local_timer_exit: vector=239
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: Start context switch
kworker/-1388    0dN..1.. 10025813518us : rcu_utilization: End context switch
kworker/-1388    0dN..2.. 10025813518us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=38173 [ns] vruntime=101216326505 [ns]
kworker/-1388    0d...2.. 10025813519us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=10062 next_prio=19
kworker/-1388    0d...2.. 10025813519us : x86_fpu_regs_activated: x86/fpu: ffff88040c603ec0 fpregs_active: 1 fpstate_active: 1 counter: 70 xfeatures: 2 xcomp_bv: 0
kworker/-1388    0d...2.. 10025813519us : write_msr: c0000100, value 7ffff7400700
cyclicte-10062   0....... 10025813519us : sys_exit: NR 230 = 0
cyclicte-10062   0....... 10025813524us : sys_enter: NR 1 (5, 7ffff7400300, 1f, 7ffff77a5460, 2, 7ffff744c99a)
cyclicte-10062   0.....11 10025813526us : tracing_mark_write: hit latency threshold (32 > 30)
cyclicte-10062   0....... 10025813526us : sys_exit: NR 1 = 31
cyclicte-10062   0....... 10025813527us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460, 2, 7ffff744c99a)

Also, I notice that there is still roughly one irq every 2 seconds on irq/47 (eth2-rx-0) and irq/48 (eth2-tx-0).

Any idea or feedback on this?

Thanks

Mathias


^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-07  8:58                               ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-10 19:39                                 ` Julia Cartwright
  -1 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-10 19:39 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5)
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, Greg, netdev,
	intel-wired-lan, linux-rt-users, Sebastian Andrzej Siewior

Hello Mathias-

On Fri, Oct 07, 2016 at 08:58:08AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
[..]
> I modified the in-kernel igb_main.c (function igb_watchdog_task) to comment out
> the access to the EICS registers:
> 
> --- igb_main.c.orig     2016-10-07 10:43:37.855873754 +0200
> +++ igb_main.c  2016-10-07 10:31:51.451346292 +0200
> @@ -4613,6 +4613,7 @@ no_wait:
>         }
> 
>         /* Cause software interrupt to ensure Rx ring is cleaned */
> +#if 0
>         if (adapter->flags & IGB_FLAG_HAS_MSIX) {
>                 u32 eics = 0;
> 
> @@ -4622,6 +4623,7 @@ no_wait:
>         } else {
>                 wr32(E1000_ICS, E1000_ICS_RXDMT0);
>         }
> +#endif
> 
>         igb_spoof_check(adapter);
>         igb_ptp_rx_hang(adapter);
> 
> 
> The result is now slightly better; however, the max latency measured by
> cyclictest is still much higher than with kernel 3.18.27-rt27.
> I have now enabled all events; the execution of
> # cyclictest -a -i 105 -m -n -p 80 -t 1  -b 30 -C
> delivers the following trace values

There is something still fishy with these traces...

> cyclicte-10062   0....... 10025813402us : sys_exit: NR 230 = 0
> cyclicte-10062   0....... 10025813402us : sys_enter: NR 230 (1, 1, 7ffff73ff930, 0, 0, 2bd7e12e)
> cyclicte-10062   0....... 10025813402us : hrtimer_init: hrtimer=ffff88040a017e18 clockid=CLOCK_MONOTONIC mode=HRTIMER_MODE_ABS
> cyclicte-10062   0d...1.. 10025813403us : hrtimer_start: hrtimer=ffff88040a017e18 function=hrtimer_wakeup expires=10024735653388 softexpires=10024735653388

cyclictest thread sleeps waiting for wakeup at 10024735653388.

> cyclicte-10062   0d...1.. 10025813403us : write_msr: 6e0, value 20ca630b9aef
> cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: Start context switch
> cyclicte-10062   0d...1.. 10025813403us : rcu_utilization: End context switch
> cyclicte-10062   0d...2.. 10025813404us : sched_switch: prev_comm=cyclictest prev_pid=10062 prev_prio=19 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120
> cyclicte-10062   0d...2.. 10025813404us+: x86_fpu_regs_deactivated: x86/fpu: ffff88040c603ec0 fpregs_active: 0 fpstate_active: 1 counter: 69 xfeatures: 2 xcomp_bv: 0
> kworker/-1388    0d..h... 10025813468us : irq_handler_entry: irq=48 name=eth2-tx-0
> kworker/-1388    0d..h... 10025813468us : irq_handler_exit: irq=48 ret=handled
> kworker/-1388    0d..h1.. 10025813469us : sched_waking: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
> kworker/-1388    0dN.h2.. 10025813469us : sched_wakeup: comm=irq/48-eth2-tx- pid=10057 prio=49 target_cpu=000
> kworker/-1388    0dN.h1.. 10025813470us : irq_handler_entry: irq=47 name=eth2-rx-0
> kworker/-1388    0dN.h1.. 10025813470us : irq_handler_exit: irq=47 ret=handled
> kworker/-1388    0dN.h2.. 10025813471us : sched_waking: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000
> kworker/-1388    0dN.h3.. 10025813471us : sched_wakeup: comm=irq/47-eth2-rx- pid=10056 prio=49 target_cpu=000

So, kworker/0:3 was busy doing something, and we received both tx and rx
interrupts from eth2, waking up the relevant threads.

> kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: Start context switch
> kworker/-1388    0dN..1.. 10025813472us : rcu_utilization: End context switch
> kworker/-1388    0dN..2.. 10025813472us : sched_stat_runtime: comm=kworker/0:3 pid=1388 runtime=67566 [ns] vruntime=101216288332 [ns]
> kworker/-1388    0d...2.. 10025813472us : sched_switch: prev_comm=kworker/0:3 prev_pid=1388 prev_prio=120 prev_state=R+ ==> next_comm=irq/48-eth2-tx- next_pid=10057 next_prio=49
> irq/48-e-10057   0d....11 10025813473us : softirq_raise: vec=3 [action=NET_RX]
> irq/48-e-10057   0.....12 10025813474us : softirq_entry: vec=3 [action=NET_RX]
> irq/48-e-10057   0.....12 10025813475us : napi_poll: napi poll on napi struct ffff88040a582850 for device eth2 work 0 budget 64

Hmm, the irq/48-eth2-tx- thread is raising NET_RX?  That seems...wrong.

> irq/48-e-10057   0.....12 10025813475us : softirq_exit: vec=3 [action=NET_RX]
> irq/48-e-10057   0d...1.. 10025813475us : rcu_utilization: Start context switch
> irq/48-e-10057   0d...1.. 10025813476us : rcu_utilization: End context switch
> irq/48-e-10057   0d...2.. 10025813476us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=10057 prev_prio=49 prev_state=S ==> next_comm=irq/47-eth2-rx- next_pid=10056 next_prio=49
> irq/47-e-10056   0d....11 10025813477us : softirq_raise: vec=3 [action=NET_RX]
> irq/47-e-10056   0.....12 10025813477us : softirq_entry: vec=3 [action=NET_RX]
> irq/47-e-10056   0.....12 10025813478us : napi_poll: napi poll on napi struct ffff88040a580850 for device eth2 work 0 budget 64
> irq/47-e-10056   0.....12 10025813478us : softirq_exit: vec=3 [action=NET_RX]

Oh, then we do the same thing again?  Hrmph...why?

> irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: Start context switch
> irq/47-e-10056   0d...1.. 10025813479us : rcu_utilization: End context switch
> irq/47-e-10056   0d...2.. 10025813479us+: sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=10056 prev_prio=49 prev_state=S ==> next_comm=kworker/0:3 next_pid=1388 next_prio=120

Done handling the two interrupts.  Back to whatever kworker/0:3 was up
to...

> kworker/-1388    0d..h... 10025813516us : local_timer_entry: vector=239
> kworker/-1388    0d..h1.. 10025813516us : hrtimer_interrupt: cpu=0 offset=-28999 curr=kworker/0:3[120] thread=cyclictest[19]
> kworker/-1388    0d..h1.. 10025813517us : hrtimer_cancel: hrtimer=ffff88040a017e18
> kworker/-1388    0d..h... 10025813517us : hrtimer_expire_entry: hrtimer=ffff88040a017e18 function=hrtimer_wakeup now=10024735682387

Okay, we finally received our wakeup event.  We were expecting to be
woken up at 10024735653388ns, but were actually woken up at 10024735682387ns.

  10024735682387 - 10024735653388 = 28999ns

Our timer fired ~29us late!  But why...?

Sorry I don't have answers, just more questions.  I do wonder what
kworker/0:3 was up to at the time the timer interrupt should have fired.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-10 19:39                                 ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-13  6:15                                   ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-13  6:15 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, Greg, netdev,
	intel-wired-lan, linux-rt-users, Sebastian Andrzej Siewior

Hi Julia,

thanks for the detailed analysis!
> 
> [...] 
> Okay, we finally received our wakeup event.  We were expecting to be woken up at
> 10024735653388ns, but were actually woken up at 10024735682387ns.
> 
>   10024735682387 - 10024735653388 = 28999ns
> 
> Our timer fired ~29us late!  But why...?
> 
> Sorry I don't have answers, just more questions.  I do wonder what
> kworker/0:3 was up to at the time the timer interrupt should have fired.
> 
>    Julia
I have now instrumented the igb driver to generate trace points that allow me to identify in
more detail what's going on.
This delivered interesting results!

Here are some places where I added traces:
In file igb_ptp.c:
void igb_ptp_rx_hang(struct igb_adapter *adapter)
{
        struct e1000_hw *hw = &adapter->hw;
        unsigned long rx_event;
        u32 tsyncrxctl;
        trace_igb(700);
        tsyncrxctl = rd32(E1000_TSYNCRXCTL);
        trace_igb(701);

        /* Other hardware uses per-packet timestamps */
        if (hw->mac.type != e1000_82576)
                return;
...

In file igb_main.c:
static void igb_check_lvmmc(struct igb_adapter *adapter)
{
        struct e1000_hw *hw = &adapter->hw;
        u32 lvmmc;

        trace_igb(600);
        lvmmc = rd32(E1000_LVMMC);
        trace_igb(601);
        if (lvmmc) {
...
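
For reference, a tracepoint like the trace_igb() used above can be defined roughly as
follows (a sketch only - the header name, include path and event layout are assumptions,
not the definition actually used here):

#undef TRACE_SYSTEM
#define TRACE_SYSTEM igb

#if !defined(_IGB_TRACE_H) || defined(TRACE_HEADER_MULTI_READ)
#define _IGB_TRACE_H

#include <linux/tracepoint.h>

/* Minimal tracepoint carrying a single numeric marker, so that
 * trace_igb(700) shows up as "igb: val: 700" in the trace buffer. */
TRACE_EVENT(igb,
	TP_PROTO(u32 val),
	TP_ARGS(val),
	TP_STRUCT__entry(
		__field(u32, val)
	),
	TP_fast_assign(
		__entry->val = val;
	),
	TP_printk("val: %u", __entry->val)
);

#endif /* _IGB_TRACE_H */

/* this part must be outside the include guard */
#undef TRACE_INCLUDE_PATH
#define TRACE_INCLUDE_PATH .
#define TRACE_INCLUDE_FILE igb_trace
#include <trace/define_trace.h>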

When I now run my test, I get the following trace:
[...]
kworker/-607     0....... 107315621us+: igb: val: 700
kworker/-607     0d..h... 107315640us : irq_handler_entry: irq=47 name=eth2-rx-0
kworker/-607     0d..h... 107315640us : irq_handler_exit: irq=47 ret=handled
kworker/-607     0d..h1.. 107315640us : sched_waking: comm=irq/47-eth2-rx- pid=18009 prio=49 target_cpu=000
kworker/-607     0dN.h2.. 107315641us : sched_wakeup: comm=irq/47-eth2-rx- pid=18009 prio=49 target_cpu=000
kworker/-607     0dN.h1.. 107315643us : irq_handler_entry: irq=48 name=eth2-tx-0
kworker/-607     0dN.h1.. 107315643us : irq_handler_exit: irq=48 ret=handled
kworker/-607     0dN.h2.. 107315643us : sched_waking: comm=irq/48-eth2-tx- pid=18010 prio=49 target_cpu=000
kworker/-607     0dN.h3.. 107315644us : sched_wakeup: comm=irq/48-eth2-tx- pid=18010 prio=49 target_cpu=000
kworker/-607     0dN..1.. 107315644us : rcu_utilization: Start context switch
kworker/-607     0dN..1.. 107315644us : rcu_utilization: End context switch
kworker/-607     0dN..2.. 107315644us : sched_stat_runtime: comm=kworker/0:1 pid=607 runtime=88996 [ns] vruntime=49754678074 [ns]
kworker/-607     0d...2.. 107315645us : sched_switch: prev_comm=kworker/0:1 prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=irq/47-eth2-rx- next_pid=18009 next_prio=49
irq/47-e-18009   0d....11 107315646us : softirq_raise: vec=3 [action=NET_RX]
irq/47-e-18009   0.....12 107315646us : softirq_entry: vec=3 [action=NET_RX]
irq/47-e-18009   0.....12 107315647us : napi_poll: napi poll on napi struct ffff88040ae58c50 for device eth2 work 0 budget 64
irq/47-e-18009   0.....12 107315647us : softirq_exit: vec=3 [action=NET_RX]
irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: Start context switch
irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: End context switch
irq/47-e-18009   0d...2.. 107315648us : sched_switch: prev_comm=irq/47-eth2-rx- prev_pid=18009 prev_prio=49 prev_state=S ==> next_comm=irq/48-eth2-tx- next_pid=18010 next_prio=49
irq/48-e-18010   0d....11 107315649us : softirq_raise: vec=3 [action=NET_RX]
irq/48-e-18010   0.....12 107315649us : softirq_entry: vec=3 [action=NET_RX]
irq/48-e-18010   0.....12 107315650us : napi_poll: napi poll on napi struct ffff88040ae5f450 for device eth2 work 0 budget 64
irq/48-e-18010   0.....12 107315650us : softirq_exit: vec=3 [action=NET_RX]
irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: Start context switch
irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: End context switch
irq/48-e-18010   0d...2.. 107315651us : sched_switch: prev_comm=irq/48-eth2-tx- prev_pid=18010 prev_prio=49 prev_state=S ==> next_comm=kworker/0:1 next_pid=607 next_prio=120
kworker/-607     0....... 107315652us : igb: val: 701
kworker/-607     0....... 107315652us : igb: val: 106
kworker/-607     0....... 107315652us : igb: val: 107
kworker/-607     0....... 107315652us+: igb: val: 600
kworker/-607     0d..h... 107315689us : local_timer_entry: vector=239
kworker/-607     0d..h1.. 107315689us : hrtimer_interrupt: cpu=0 offset=-34521 curr=kworker/0:1[120] thread=cyclictest[19]
kworker/-607     0d..h1.. 107315689us : hrtimer_cancel: hrtimer=ffff8803d42efe18
kworker/-607     0d..h... 107315689us : hrtimer_expire_entry: hrtimer=ffff8803d42efe18 function=hrtimer_wakeup now=752735681960
kworker/-607     0d..h1.. 107315689us : sched_waking: comm=cyclictest pid=18015 prio=19 target_cpu=000
kworker/-607     0dN.h2.. 107315690us : sched_wakeup: comm=cyclictest pid=18015 prio=19 target_cpu=000
kworker/-607     0dN.h... 107315690us : hrtimer_expire_exit: hrtimer=ffff8803d42efe18
kworker/-607     0dN.h1.. 107315690us : hrtimer_interrupt: cpu=0 offset=318040 curr=kworker/0:1[120] thread=<none>[-1]
kworker/-607     0dN.h... 107315690us : write_msr: 6e0, value 28096cdb9ce
kworker/-607     0dN.h... 107315690us : local_timer_exit: vector=239
kworker/-607     0dN..1.. 107315690us : rcu_utilization: Start context switch
kworker/-607     0dN..1.. 107315691us : rcu_utilization: End context switch
kworker/-607     0dN..2.. 107315691us : sched_stat_runtime: comm=kworker/0:1 pid=607 runtime=38439 [ns] vruntime=49754716513 [ns]
kworker/-607     0d...2.. 107315691us : sched_switch: prev_comm=kworker/0:1 prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest next_pid=18015 next_prio=19
kworker/-607     0d...2.. 107315691us : x86_fpu_regs_activated: x86/fpu: ffff8803f7f55940 fpregs_active: 1 fpstate_active: 1 counter: 99 xfeatures: 2 xcomp_bv: 0
kworker/-607     0d...2.. 107315691us : write_msr: c0000100, value 7ffff7400700
cyclicte-18015   0....... 107315692us : sys_exit: NR 230 = 0
cyclicte-18015   0....... 107315697us : sys_enter: NR 1 (5, 7ffff7400300, 1f, 7ffff77a5460, 2, 7ffff744c99a)
cyclicte-18015   0.....11 107315698us : tracing_mark_write: hit latency threshold (37 > 33)
cyclicte-18015   0....... 107315699us : sys_exit: NR 1 = 31
cyclicte-18015   0....... 107315699us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460, 2, 7ffff744c99a)


Also very interesting is the trace that I get on an idle system - without cyclictest running.
When I just enable my igb tracepoints, I get the following result:
[...]
kworker/-607     0....... 585779012us+: igb: val: 700
kworker/-607     0....... 585779042us : igb: val: 701
kworker/-607     0....... 585779042us : igb: val: 106
kworker/-607     0....... 585779043us : igb: val: 107
kworker/-607     0....... 585779043us+: igb: val: 600
kworker/-607     0....... 585779080us : igb: val: 601

The time between my trace points 700 and 701 is about 30us,
and the time between my trace points 600 and 601 is as much as 37us!
The code in between is
     tsyncrxctl = rd32(E1000_TSYNCRXCTL);
and
     lvmmc = rd32(E1000_LVMMC);
respectively.

In both cases this is a single read from a register.
I have no idea why this single read could take that much time!
Is it possible that the igb hardware is in a state that delays the read access
and this is why the whole I/O system might be delayed?
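
One way to narrow this down further would be to time the read itself, independent of the
trace infrastructure - an illustrative sketch only, not code from the driver:

	/* Hypothetical instrumentation: measure the raw cost of one MMIO
	 * register read with ktime and log the result via the tracepoint. */
	ktime_t t0, t1;
	u32 tsyncrxctl;

	t0 = ktime_get();
	tsyncrxctl = rd32(E1000_TSYNCRXCTL);
	t1 = ktime_get();
	trace_igb((u32)ktime_to_ns(ktime_sub(t1, t0)));	/* ns spent in rd32() */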

I hope this provides more detail for understanding this issue.

Thanks for any feedback!

Regards

Mathias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-13  6:15                                   ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-13 10:57                                     ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-13 10:57 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5),
	Julia Cartwright, Williams, Mitch A, Kirsher, Jeffrey T
  Cc: linux-rt-users, Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

Hi all!
> 
> Hi Julia,
> 
> thanks for the detailed analysis!
> >
> > [...]
> > Okay, we finally received our wakeup event.  We were expecting to be
> > woken up at 10024735653388ns, but were actually woken up at
> 10024735682387ns.
> >
> >   10024735682387 - 10024735653388 = 28999ns
> >
> > Our timer fired ~29us late!  But why...?
> >
> > Sorry I don't have answers, just more questions.  I do wonder what
> > kworker/0:3 was up to at the time the timer interrupt should have fired.
> >
> >    Julia
> I have now instrumented the igb driver to generate trace points that allows me to
> identify in more details what's going on.
> This delivered interesting results!
> 
> Here are some places where I added traces:
> In file igb_ptp.c:
> void igb_ptp_rx_hang(struct igb_adapter *adapter) {
>         struct e1000_hw *hw = &adapter->hw;
>         unsigned long rx_event;
>         u32 tsyncrxctl;
>         trace_igb(700);
>         tsyncrxctl = rd32(E1000_TSYNCRXCTL);
>         trace_igb(701);
> 
>         /* Other hardware uses per-packet timestamps */
>         if (hw->mac.type != e1000_82576)
>                 return;
> ...
> 
> In file igb_main.c:
> static void igb_check_lvmmc(struct igb_adapter *adapter) {
>         struct e1000_hw *hw = &adapter->hw;
>         u32 lvmmc;
> 
>         trace_igb(600);
>         lvmmc = rd32(E1000_LVMMC);
>         trace_igb(601);
>         if (lvmmc) {
> ...
> 
> When I run now my test, I get the following trace:
> [...]
> kworker/-607     0....... 107315621us+: igb: val: 700
> kworker/-607     0d..h... 107315640us : irq_handler_entry: irq=47 name=eth2-rx-0
> kworker/-607     0d..h... 107315640us : irq_handler_exit: irq=47 ret=handled
> kworker/-607     0d..h1.. 107315640us : sched_waking: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607     0dN.h2.. 107315641us : sched_wakeup: comm=irq/47-eth2-rx-
> pid=18009 prio=49 target_cpu=000
> kworker/-607     0dN.h1.. 107315643us : irq_handler_entry: irq=48 name=eth2-tx-0
> kworker/-607     0dN.h1.. 107315643us : irq_handler_exit: irq=48 ret=handled
> kworker/-607     0dN.h2.. 107315643us : sched_waking: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607     0dN.h3.. 107315644us : sched_wakeup: comm=irq/48-eth2-tx-
> pid=18010 prio=49 target_cpu=000
> kworker/-607     0dN..1.. 107315644us : rcu_utilization: Start context switch
> kworker/-607     0dN..1.. 107315644us : rcu_utilization: End context switch
> kworker/-607     0dN..2.. 107315644us : sched_stat_runtime: comm=kworker/0:1
> pid=607 runtime=88996 [ns] vruntime=49754678074 [ns]
> kworker/-607     0d...2.. 107315645us : sched_switch: prev_comm=kworker/0:1
> prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=irq/47-eth2-rx-
> next_pid=18009 next_prio=49
> irq/47-e-18009   0d....11 107315646us : softirq_raise: vec=3 [action=NET_RX]
> irq/47-e-18009   0.....12 107315646us : softirq_entry: vec=3 [action=NET_RX]
> irq/47-e-18009   0.....12 107315647us : napi_poll: napi poll on napi struct
> ffff88040ae58c50 for device eth2 work 0 budget 64
> irq/47-e-18009   0.....12 107315647us : softirq_exit: vec=3 [action=NET_RX]
> irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: Start context switch
> irq/47-e-18009   0d...1.. 107315648us : rcu_utilization: End context switch
> irq/47-e-18009   0d...2.. 107315648us : sched_switch: prev_comm=irq/47-eth2-rx-
> prev_pid=18009 prev_prio=49 prev_state=S ==> next_comm=irq/48-eth2-tx-
> next_pid=18010 next_prio=49
> irq/48-e-18010   0d....11 107315649us : softirq_raise: vec=3 [action=NET_RX]
> irq/48-e-18010   0.....12 107315649us : softirq_entry: vec=3 [action=NET_RX]
> irq/48-e-18010   0.....12 107315650us : napi_poll: napi poll on napi struct
> ffff88040ae5f450 for device eth2 work 0 budget 64
> irq/48-e-18010   0.....12 107315650us : softirq_exit: vec=3 [action=NET_RX]
> irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: Start context switch
> irq/48-e-18010   0d...1.. 107315651us : rcu_utilization: End context switch
> irq/48-e-18010   0d...2.. 107315651us : sched_switch: prev_comm=irq/48-eth2-tx-
> prev_pid=18010 prev_prio=49 prev_state=S ==> next_comm=kworker/0:1
> next_pid=607 next_prio=120
> kworker/-607     0....... 107315652us : igb: val: 701
> kworker/-607     0....... 107315652us : igb: val: 106
> kworker/-607     0....... 107315652us : igb: val: 107
> kworker/-607     0....... 107315652us+: igb: val: 600
> kworker/-607     0d..h... 107315689us : local_timer_entry: vector=239
> kworker/-607     0d..h1.. 107315689us : hrtimer_interrupt: cpu=0 offset=-34521
> curr=kworker/0:1[120] thread=cyclictest[19]
> kworker/-607     0d..h1.. 107315689us : hrtimer_cancel: hrtimer=ffff8803d42efe18
> kworker/-607     0d..h... 107315689us : hrtimer_expire_entry:
> hrtimer=ffff8803d42efe18 function=hrtimer_wakeup now=752735681960
> kworker/-607     0d..h1.. 107315689us : sched_waking: comm=cyclictest pid=18015
> prio=19 target_cpu=000
> kworker/-607     0dN.h2.. 107315690us : sched_wakeup: comm=cyclictest
> pid=18015 prio=19 target_cpu=000
> kworker/-607     0dN.h... 107315690us : hrtimer_expire_exit:
> hrtimer=ffff8803d42efe18
> kworker/-607     0dN.h1.. 107315690us : hrtimer_interrupt: cpu=0 offset=318040
> curr=kworker/0:1[120] thread=<none>[-1]
> kworker/-607     0dN.h... 107315690us : write_msr: 6e0, value 28096cdb9ce
> kworker/-607     0dN.h... 107315690us : local_timer_exit: vector=239
> kworker/-607     0dN..1.. 107315690us : rcu_utilization: Start context switch
> kworker/-607     0dN..1.. 107315691us : rcu_utilization: End context switch
> kworker/-607     0dN..2.. 107315691us : sched_stat_runtime: comm=kworker/0:1
> pid=607 runtime=38439 [ns] vruntime=49754716513 [ns]
> kworker/-607     0d...2.. 107315691us : sched_switch: prev_comm=kworker/0:1
> prev_pid=607 prev_prio=120 prev_state=R+ ==> next_comm=cyclictest
> next_pid=18015 next_prio=19
> kworker/-607     0d...2.. 107315691us : x86_fpu_regs_activated: x86/fpu:
> ffff8803f7f55940 fpregs_active: 1 fpstate_active: 1 counter: 99 xfeatures: 2
> xcomp_bv: 0
> kworker/-607     0d...2.. 107315691us : write_msr: c0000100, value 7ffff7400700
> cyclicte-18015   0....... 107315692us : sys_exit: NR 230 = 0
> cyclicte-18015   0....... 107315697us : sys_enter: NR 1 (5, 7ffff7400300, 1f,
> 7ffff77a5460, 2, 7ffff744c99a)
> cyclicte-18015   0.....11 107315698us : tracing_mark_write: hit latency threshold (37
> > 33)
> cyclicte-18015   0....... 107315699us : sys_exit: NR 1 = 31
> cyclicte-18015   0....... 107315699us : sys_enter: NR 1 (4, 4076b0, 2, 7ffff77a5460,
> 2, 7ffff744c99a)
> 
> 
> Very interesting is also the trace that I get in an idle system - without cyclictest
> running.
> When I just enable my igb tracepoint I got the following result:
> [...]
> kworker/-607     0....... 585779012us+: igb: val: 700
> kworker/-607     0....... 585779042us : igb: val: 701
> kworker/-607     0....... 585779042us : igb: val: 106
> kworker/-607     0....... 585779043us : igb: val: 107
> kworker/-607     0....... 585779043us+: igb: val: 600
> kworker/-607     0....... 585779080us : igb: val: 601
> 
> The time between my trace points 700 and 701 is about 30us, the time between my
> trace points 600 and 601 is even 37us!!
> The code in between is
>      tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
>      lvmmc = rd32(E1000_LVMMC);
> 
> In both cases this is a single read from a register.
> I have no idea why this single read could take that much time!
> Is it possible that the igb hardware is in a state that delays the read access and this is
> why the whole I/O system might be delayed?
> 

To have a proper comparison, I did the same with kernel 3.18.27-rt27.
Here too, I instrumented the igb driver to get traces for the rd32 calls.
However, here everything is generally much faster!
On the idle system the maximum I got for a read was about 6us; most of the time it was 1-2us.
On the 4.8 kernel these reads are always much slower (see above).
My question is now: is there any kernel config option introduced in the meantime
that may lead to this effect and that is not set in my 4.8 config?

Regards

Mathias

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-13 10:57                                     ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-13 14:02                                       ` David Laight
  -1 siblings, 0 replies; 53+ messages in thread
From: David Laight @ 2016-10-13 14:02 UTC (permalink / raw)
  To: 'Koehrer Mathias (ETAS/ESW5)',
	Julia Cartwright, Williams, Mitch A, Kirsher, Jeffrey T
  Cc: linux-rt-users, Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

From: Koehrer Mathias
> Sent: 13 October 2016 11:57
..
> > The time between my trace points 700 and 701 is about 30us, the time between my
> > trace points 600 and 601 is even 37us!!
> > The code in between is
> >      tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
> >      lvmmc = rd32(E1000_LVMMC);
> >
> > In both cases this is a single read from a register.
> > I have no idea why this single read could take that much time!
> > Is it possible that the igb hardware is in a state that delays the read access and this is
> > why the whole I/O system might be delayed?
> >
> 
> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!
> In the idle system the maximum I got for a read was about 6us, most times it was 1-2us.

1-2us is probably about right; PCIe is high throughput, high latency.
You should see the latencies we get talking to an FPGA!

> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there any kernel config option that has been introduced in the meantime
> that may lead to this effect and which is not set in my 4.8 config?

Have a look at the generated code for rd32().
Someone might have added a load of synchronisation instructions to it.
On x86 I don't think it needs any.
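
For comparison, a plain MMIO register read on x86 usually reduces to something like the
following (a sketch assuming rd32() simply wraps readl() on the ioremapped BAR; check the
register header in your tree for the real definition):

	/* Sketch only: the usual shape of a 32-bit MMIO register read.
	 * readl() is a single uncached load on x86; no extra barriers needed. */
	static inline u32 example_rd32(void __iomem *hw_addr, u32 reg)
	{
		return readl(hw_addr + reg);
	}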

It is also possible for other PCIe accesses to slow things down
(which might be why you see 6us).

I presume you are doing these comparisons on the same hardware?
Obscure bus topologies could slow things down.

	David

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-13 10:57                                     ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-13 16:18                                       ` Julia Cartwright
  -1 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-13 16:18 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5)
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

Hey Mathias-

On Thu, Oct 13, 2016 at 10:57:18AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
[..]

Interesting indeed!

> > Here are some places where I added traces:
> > In file igb_ptp.c:
> > void igb_ptp_rx_hang(struct igb_adapter *adapter) {
> >         struct e1000_hw *hw = &adapter->hw;
> >         unsigned long rx_event;
> >         u32 tsyncrxctl;
> >         trace_igb(700);
> >         tsyncrxctl = rd32(E1000_TSYNCRXCTL);
> >         trace_igb(701);
> >
> >         /* Other hardware uses per-packet timestamps */
> >         if (hw->mac.type != e1000_82576)
> >                 return;
> > ...
> >
> > In file igb_main.c:
> > static void igb_check_lvmmc(struct igb_adapter *adapter) {
> >         struct e1000_hw *hw = &adapter->hw;
> >         u32 lvmmc;
> >
> >         trace_igb(600);
> >         lvmmc = rd32(E1000_LVMMC);
> >         trace_igb(601);
> >         if (lvmmc) {
> > ...
> >
[..]
> > The time between my trace points 700 and 701 is about 30us, the time between my
> > trace points 600 and 601 is even 37us!!
> >
> > The code in between is
> >      tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
> >      lvmmc = rd32(E1000_LVMMC);
> >
> > In both cases this is a single read from a register.
> >
> > I have no idea why this single read could take that much time!

Are these the only registers you see this amount of delay when reading?

It's also possible that it's not these registers themselves that cause
problems, but any writes prior to these reads.  That is, given PCI's
posted-write behavior, it could be that these reads are delayed only
because they are flushing previous writes to the device.

> > Is it possible that the igb hardware is in a state that delays the read access and this is
> > why the whole I/O system might be delayed?

One additional hypothesis is that some register accesses trigger
accesses to off-chip resources synchronously; for example, a write to
enable timestamping needs to access an external phy on a slower bus,
etc.  I don't know enough about this device to say whether or not that
happens.

> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!

> In the idle system the maximum I got for a read was about 6us, most times it was 1-2us.
> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there any kernel config option that has been introduced in the meantime
> that may lead to this effect and which is not set in my 4.8 config?

Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything
RT specific about what you are seeing, but it might be nice to get
confirmation.  Also, bisection would probably be easier if you confirm
on a vanilla kernel.

I find it unlikely that it's a kernel config option that changed which
regressed you, but instead was a code change to a driver.  Which driver
is now the question, and the surface area is still big (processor
mapping attributes for this region, PCI root complex configuration, PCI
bridge configuration, igb driver itself, etc.).

Big enough that I'd recommend a bisection.  It looks like a bisection
between 3.18 and 4.8 would take you about 18 tries to narrow down,
assuming all goes well.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-13 16:18                                       ` Julia Cartwright
  0 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-13 16:18 UTC (permalink / raw)
  To: intel-wired-lan

Hey Mathias-

On Thu, Oct 13, 2016 at 10:57:18AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
[..]

Interesting indeed!

> > Here are some places where I added traces:
> > In file igb_ptp.c:
> > void igb_ptp_rx_hang(struct igb_adapter *adapter) {
> >         struct e1000_hw *hw = &adapter->hw;
> >         unsigned long rx_event;
> >         u32 tsyncrxctl;
> >         trace_igb(700);
> >         tsyncrxctl = rd32(E1000_TSYNCRXCTL);
> >         trace_igb(701);
> >
> >         /* Other hardware uses per-packet timestamps */
> >         if (hw->mac.type != e1000_82576)
> >                 return;
> > ...
> >
> > In file igb_main.c:
> > static void igb_check_lvmmc(struct igb_adapter *adapter) {
> >         struct e1000_hw *hw = &adapter->hw;
> >         u32 lvmmc;
> >
> >         trace_igb(600);
> >         lvmmc = rd32(E1000_LVMMC);
> >         trace_igb(601);
> >         if (lvmmc) {
> > ...
> >
[..]
> > The time between my trace points 700 and 701 is about 30us, the time between my
> > trace points 600 and 601 is even 37us!!
> >
> > The code in between is
> >      tsyncrxctl = rd32(E1000_TSYNCRXCTL); resp.
> >      lvmmc = rd32(E1000_LVMMC);
> >
> > In both cases this is a single read from a register.
> >
> > I have no idea why this single read could take that much time!

Are these the only registers for which you see this amount of delay when reading?

It's also possible that it's not these registers themselves that cause
problems, but any writes prior to these reads.  That is, given PCI's
posted-write behavior, it could be that these reads are delayed only
because they are flushing previous writes to the device.

> > Is it possible that the igb hardware is in a state that delays the read access and this is
> > why the whole I/O system might be delayed?

One additional hypothesis is that some register accesses trigger
accesses to off-chip resources synchronously; for example, a write to
enable timestamping needs to access an external phy on a slower bus,
etc.  I don't know enough about this device to say whether or not that
happens.

> To have a proper comparison, I did the same with kernel 3.18.27-rt27.
> Also here, I instrumented the igb driver to get traces for the rd32 calls.
> However, here everything is generally much faster!

> In the idle system the maximum I got for a read was about 6us, most times it was 1-2us.
> On the 4.8 kernel this is always much slower (see above).
> My question is now: Is there any kernel config option that has been introduced in the meantime
> that may lead to this effect and which is not set in my 4.8 config?

Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything
RT specific about what you are seeing, but it might be nice to get
confirmation.  Also, bisection would probably be easier if you confirm
on a vanilla kernel.

I find it unlikely that it's a kernel config option that changed which
regressed you, but instead was a code change to a driver.  Which driver
is now the question, and the surface area is still big (processor
mapping attributes for this region, PCI root complex configuration, PCI
bridge configuration, igb driver itself, etc.).

Big enough that I'd recommend a bisection.  It looks like a bisection
between 3.18 and 4.8 would take you about 18 tries to narrow down,
assuming all goes well.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-13 16:18                                       ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-14  8:58                                         ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-14  8:58 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

[-- Attachment #1: Type: text/plain, Size: 9462 bytes --]

Hi Julia,
> Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything RT specific
> about what you are seeing, but it might be nice to get confirmation.  Also, bisection
> would probably be easier if you confirm on a vanilla kernel.
> 
> I find it unlikely that it's a kernel config option that changed which regressed you, but
> instead was a code change to a driver.  Which driver is now the question, and the
> surface area is still big (processor mapping attributes for this region, PCI root
> complex configuration, PCI brige configuration, igb driver itself, etc.).
> 
> Big enough that I'd recommend a bisection.  It looks like a bisection between 3.18
> and 4.8 would take you about 18 tries to narrow down, assuming all goes well.
> 

I have now repeated my tests using the vanilla kernel.
There I got the very same issue.
Using kernel 4.0 is fine; however, starting with kernel 4.1, the issue appears.


Here is my exact (reproducible) test description:
I applied the following patch to the kernel to get the igb trace.
This patch instruments the igb_rd32() function to measure the call
to readl() which is used to access registers of the igb NIC.


++++++++++++++++++ BEGIN PATCH ++++++++++++++++++++++++++++++++++++

Index: linux-4.8/drivers/net/ethernet/intel/igb/trace.h
===================================================================
--- /dev/null
+++ linux-4.8/drivers/net/ethernet/intel/igb/trace.h
@@ -0,0 +1,34 @@
+
+#if !defined(_TRACE_IGB_H_) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_IGB_H_ 
+
+#include <linux/tracepoint.h>
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM igb
+
+
+#define _TRACE_H_
+
+
+TRACE_EVENT(igb,
+        TP_PROTO(u32 val),
+        TP_ARGS(val),
+        TP_STRUCT__entry(
+                __field(u32, val)
+        ),
+        TP_fast_assign(
+                __entry->val = val;
+        ),
+        TP_printk("val: %u",
+                   __entry->val)
+);
+
+
+#endif
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH drivers/net/ethernet/intel/igb 
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+#include <trace/define_trace.h>
Index: linux-4.8/drivers/net/ethernet/intel/igb/Makefile
===================================================================
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/Makefile
+++ linux-4.8/drivers/net/ethernet/intel/igb/Makefile
@@ -28,6 +28,7 @@
 #
 # Makefile for the Intel(R) 82575 PCI-Express ethernet driver
 #
+ccflags-y += -I.
 
 obj-$(CONFIG_IGB) += igb.o
 
Index: linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
@@ -57,6 +57,9 @@
 #include <linux/i2c.h>
 #include "igb.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #define MAJ 5
 #define MIN 3
 #define BUILD 0
@@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
 	if (E1000_REMOVED(hw_addr))
 		return ~value;
 
+        trace_igb(801);
 	value = readl(&hw_addr[reg]);
+        trace_igb(802);
 
 	/* reads should not return all F's */
 	if (!(~value) && (!reg || !(~readl(hw_addr)))) {


++++++++++++++++++ END PATCH ++++++++++++++++++++++++++++++++++++


I built the kernel with this patch applied, rebooted the PC to run this kernel, and used the
following script for my test.

++++++++++++++++++ BEGIN SCRIPT  +++++++++++++++++++++++++++++++++
#!/bin/bash

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor ; do
    if [ -w $f ]; then
        echo "performance" > $f
    fi
done

if true; then
    rmmod igb
    modprobe igb
    ethtool -L eth2 combined 1
    ifconfig eth2 up 192.168.100.111
fi

ifconfig

mount /sys/kernel/debug

( cd /sys/kernel/debug/tracing
  echo 0 > tracing_on
  echo 0 > events/enable
  echo 1 > events/igb/enable
  echo "print-parent" > trace_options
  echo "latency-format" > trace_options
  echo 1 > tracing_on

  sleep 4
  cat trace
)
++++++++++++++++++ END SCRIPT  +++++++++++++++++++++++++++++++++

The results of this for kernel 4.0:
[...]
kworker/-1239    3...1 49699046us : igb: val: 801
kworker/-1239    3...1 49699047us : igb: val: 802
kworker/-1239    3...1 49699047us : igb: val: 801
kworker/-1239    3...1 49699048us+: igb: val: 802
kworker/-1239    3...1 49699099us : igb: val: 801
kworker/-1239    3...1 49699100us : igb: val: 802
kworker/-1239    3...1 49699100us : igb: val: 801
kworker/-1239    3...1 49699102us : igb: val: 802
kworker/-1239    3...1 49699102us : igb: val: 801
kworker/-1239    3...1 49699103us : igb: val: 802
kworker/-1239    3...1 49699103us : igb: val: 801
kworker/-1239    3...1 49699104us : igb: val: 802
kworker/-1239    3...1 49699104us : igb: val: 801
kworker/-1239    3...1 49699105us : igb: val: 802
kworker/-1239    3...1 49699105us : igb: val: 801
kworker/-1239    3...1 49699107us : igb: val: 802
kworker/-1239    3...1 49699107us : igb: val: 801
kworker/-1239    3...1 49699108us : igb: val: 802
kworker/-1239    3...1 49699108us : igb: val: 801
kworker/-1239    3...1 49699109us : igb: val: 802
kworker/-1239    3...1 49699109us : igb: val: 801
kworker/-1239    3...1 49699110us : igb: val: 802
kworker/-1239    3...1 49699110us : igb: val: 801
kworker/-1239    3...1 49699111us : igb: val: 802
kworker/-1239    3...1 49699111us : igb: val: 801
kworker/-1239    3...1 49699113us+: igb: val: 802
kworker/-1239    3...1 49699163us : igb: val: 801
kworker/-1239    3...1 49699164us : igb: val: 802
kworker/-1239    3...1 49699164us : igb: val: 801
kworker/-1239    3...1 49699166us : igb: val: 802
kworker/-1239    3...1 49699166us : igb: val: 801
kworker/-1239    3...1 49699167us : igb: val: 802
kworker/-1239    3...1 49699167us : igb: val: 801
kworker/-1239    3...1 49699168us : igb: val: 802
kworker/-1239    3...1 49699168us : igb: val: 801
kworker/-1239    3...1 49699169us : igb: val: 802
kworker/-1239    3...1 49699169us : igb: val: 801
kworker/-1239    3...1 49699170us : igb: val: 802
kworker/-1239    3...1 49699171us : igb: val: 801
kworker/-1239    3...1 49699173us : igb: val: 802
kworker/-1239    3...1 49699173us : igb: val: 801
kworker/-1239    3...1 49699174us : igb: val: 802





The results of this for kernel 4.1:
[...]
kworker/-1333    2...1 75697302us+: igb: val: 801
kworker/-1333    2...1 75697323us+: igb: val: 802
kworker/-1333    2...1 75697373us+: igb: val: 801
kworker/-1333    2...1 75697395us : igb: val: 802
kworker/-1333    2...1 75697395us+: igb: val: 801
kworker/-1333    2...1 75697415us : igb: val: 802
kworker/-1333    2...1 75697416us+: igb: val: 801
kworker/-1333    2...1 75697436us : igb: val: 802
kworker/-1333    2...1 75697436us+: igb: val: 801
kworker/-1333    2...1 75697457us : igb: val: 802
kworker/-1333    2...1 75697457us+: igb: val: 801
kworker/-1333    2...1 75697478us : igb: val: 802
kworker/-1333    2...1 75697478us+: igb: val: 801
kworker/-1333    2...1 75697499us : igb: val: 802
kworker/-1333    2...1 75697499us+: igb: val: 801
kworker/-1333    2...1 75697520us : igb: val: 802
kworker/-1333    2...1 75697520us+: igb: val: 801
kworker/-1333    2...1 75697541us : igb: val: 802
kworker/-1333    2...1 75697541us+: igb: val: 801
kworker/-1333    2...1 75697562us : igb: val: 802
kworker/-1333    2...1 75697562us+: igb: val: 801
kworker/-1333    2...1 75697583us : igb: val: 802
kworker/-1333    2...1 75697583us+: igb: val: 801
kworker/-1333    2...1 75697604us+: igb: val: 802
kworker/-1333    2...1 75697654us+: igb: val: 801
kworker/-1333    2...1 75697675us : igb: val: 802
kworker/-1333    2...1 75697675us+: igb: val: 801
kworker/-1333    2...1 75697696us : igb: val: 802
kworker/-1333    2...1 75697696us+: igb: val: 801
kworker/-1333    2...1 75697717us : igb: val: 802
kworker/-1333    2...1 75697717us+: igb: val: 801
kworker/-1333    2...1 75697738us : igb: val: 802
kworker/-1333    2...1 75697738us+: igb: val: 801
kworker/-1333    2...1 75697759us : igb: val: 802
kworker/-1333    2...1 75697759us+: igb: val: 801
kworker/-1333    2...1 75697780us : igb: val: 802
kworker/-1333    2...1 75697781us+: igb: val: 801
kworker/-1333    2...1 75697801us : igb: val: 802
kworker/-1333    2...1 75697801us+: igb: val: 801
kworker/-1333    2...1 75697822us : igb: val: 802


It can be clearly seen that with kernel 4.1 the execution time of the "readl"
call is much slower! It always takes about 21us!
In comparison, kernel 4.0 is fast: here the time for "readl" is about
1-2 us.
All measurements have been done on the very same PC.
The PC is an HP EliteDesk 800 G1;
a dual-port Intel i350-T2 server adapter has been plugged in.
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz,
running in x86_64 (64-bit) mode.

Please find attached the kernel configuration for the 4.1 kernel.

I had a look at the generated object file igb_main.o.
In both cases - kernel 4.0 and kernel 4.1 - the code for the
igb_rd32 function looks the same.
In particular, the call to the readl() function is identical.

This means that, in my view, something else has changed in kernel 4.1
that has an impact on the igb accesses.

Any idea what component could cause this kind of issue?

Thanks for any feedback!

Regards

Mathias


[-- Attachment #2: config.gz --]
[-- Type: application/x-gzip, Size: 23977 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-14  8:58                                         ` Koehrer Mathias
  0 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias @ 2016-10-14  8:58 UTC (permalink / raw)
  To: intel-wired-lan

Hi Julia,
> Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything RT specific
> about what you are seeing, but it might be nice to get confirmation.  Also, bisection
> would probably be easier if you confirm on a vanilla kernel.
> 
> I find it unlikely that it's a kernel config option that changed which regressed you, but
> instead was a code change to a driver.  Which driver is now the question, and the
> surface area is still big (processor mapping attributes for this region, PCI root
> complex configuration, PCI brige configuration, igb driver itself, etc.).
> 
> Big enough that I'd recommend a bisection.  It looks like a bisection between 3.18
> and 4.8 would take you about 18 tries to narrow down, assuming all goes well.
> 

I have now repeated my tests using the vanilla kernel.
There I got the very same issue.
Using kernel 4.0 is fine; however, starting with kernel 4.1, the issue appears.


Here is my exact (reproducible) test description:
I applied the following patch to the kernel to get the igb trace.
This patch instruments the igb_rd32() function to measure the call
to readl() which is used to access registers of the igb NIC.


++++++++++++++++++ BEGIN PATCH ++++++++++++++++++++++++++++++++++++

Index: linux-4.8/drivers/net/ethernet/intel/igb/trace.h
===================================================================
--- /dev/null
+++ linux-4.8/drivers/net/ethernet/intel/igb/trace.h
@@ -0,0 +1,34 @@
+
+#if !defined(_TRACE_IGB_H_) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_IGB_H_ 
+
+#include <linux/tracepoint.h>
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM igb
+
+
+#define _TRACE_H_
+
+
+TRACE_EVENT(igb,
+        TP_PROTO(u32 val),
+        TP_ARGS(val),
+        TP_STRUCT__entry(
+                __field(u32, val)
+        ),
+        TP_fast_assign(
+                __entry->val = val;
+        ),
+        TP_printk("val: %u",
+                   __entry->val)
+);
+
+
+#endif
+
+#undef TRACE_INCLUDE_PATH
+#define TRACE_INCLUDE_PATH drivers/net/ethernet/intel/igb 
+#undef TRACE_INCLUDE_FILE
+#define TRACE_INCLUDE_FILE trace
+
+#include <trace/define_trace.h>
Index: linux-4.8/drivers/net/ethernet/intel/igb/Makefile
===================================================================
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/Makefile
+++ linux-4.8/drivers/net/ethernet/intel/igb/Makefile
@@ -28,6 +28,7 @@
 #
 # Makefile for the Intel(R) 82575 PCI-Express ethernet driver
 #
+ccflags-y += -I.
 
 obj-$(CONFIG_IGB) += igb.o
 
Index: linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
===================================================================
--- linux-4.8.orig/drivers/net/ethernet/intel/igb/igb_main.c
+++ linux-4.8/drivers/net/ethernet/intel/igb/igb_main.c
@@ -57,6 +57,9 @@
 #include <linux/i2c.h>
 #include "igb.h"
 
+#define CREATE_TRACE_POINTS
+#include "trace.h"
+
 #define MAJ 5
 #define MIN 3
 #define BUILD 0
@@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
 	if (E1000_REMOVED(hw_addr))
 		return ~value;
 
+        trace_igb(801);
 	value = readl(&hw_addr[reg]);
+        trace_igb(802);
 
 	/* reads should not return all F's */
 	if (!(~value) && (!reg || !(~readl(hw_addr)))) {


++++++++++++++++++ END PATCH ++++++++++++++++++++++++++++++++++++


I built the kernel with this patch applied, rebooted the PC to run this kernel, and used the
following script for my test.

++++++++++++++++++ BEGIN SCRIPT  +++++++++++++++++++++++++++++++++
#!/bin/bash

for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor ; do
    if [ -w $f ]; then
        echo "performance" > $f
    fi
done

if true; then
    rmmod igb
    modprobe igb
    ethtool -L eth2 combined 1
    ifconfig eth2 up 192.168.100.111
fi

ifconfig

mount /sys/kernel/debug

( cd /sys/kernel/debug/tracing
  echo 0 > tracing_on
  echo 0 > events/enable
  echo 1 > events/igb/enable
  echo "print-parent" > trace_options
  echo "latency-format" > trace_options
  echo 1 > tracing_on

  sleep 4
  cat trace
)
++++++++++++++++++ END SCRIPT  +++++++++++++++++++++++++++++++++

The results of this for kernel 4.0:
[...]
kworker/-1239    3...1 49699046us : igb: val: 801
kworker/-1239    3...1 49699047us : igb: val: 802
kworker/-1239    3...1 49699047us : igb: val: 801
kworker/-1239    3...1 49699048us+: igb: val: 802
kworker/-1239    3...1 49699099us : igb: val: 801
kworker/-1239    3...1 49699100us : igb: val: 802
kworker/-1239    3...1 49699100us : igb: val: 801
kworker/-1239    3...1 49699102us : igb: val: 802
kworker/-1239    3...1 49699102us : igb: val: 801
kworker/-1239    3...1 49699103us : igb: val: 802
kworker/-1239    3...1 49699103us : igb: val: 801
kworker/-1239    3...1 49699104us : igb: val: 802
kworker/-1239    3...1 49699104us : igb: val: 801
kworker/-1239    3...1 49699105us : igb: val: 802
kworker/-1239    3...1 49699105us : igb: val: 801
kworker/-1239    3...1 49699107us : igb: val: 802
kworker/-1239    3...1 49699107us : igb: val: 801
kworker/-1239    3...1 49699108us : igb: val: 802
kworker/-1239    3...1 49699108us : igb: val: 801
kworker/-1239    3...1 49699109us : igb: val: 802
kworker/-1239    3...1 49699109us : igb: val: 801
kworker/-1239    3...1 49699110us : igb: val: 802
kworker/-1239    3...1 49699110us : igb: val: 801
kworker/-1239    3...1 49699111us : igb: val: 802
kworker/-1239    3...1 49699111us : igb: val: 801
kworker/-1239    3...1 49699113us+: igb: val: 802
kworker/-1239    3...1 49699163us : igb: val: 801
kworker/-1239    3...1 49699164us : igb: val: 802
kworker/-1239    3...1 49699164us : igb: val: 801
kworker/-1239    3...1 49699166us : igb: val: 802
kworker/-1239    3...1 49699166us : igb: val: 801
kworker/-1239    3...1 49699167us : igb: val: 802
kworker/-1239    3...1 49699167us : igb: val: 801
kworker/-1239    3...1 49699168us : igb: val: 802
kworker/-1239    3...1 49699168us : igb: val: 801
kworker/-1239    3...1 49699169us : igb: val: 802
kworker/-1239    3...1 49699169us : igb: val: 801
kworker/-1239    3...1 49699170us : igb: val: 802
kworker/-1239    3...1 49699171us : igb: val: 801
kworker/-1239    3...1 49699173us : igb: val: 802
kworker/-1239    3...1 49699173us : igb: val: 801
kworker/-1239    3...1 49699174us : igb: val: 802





The results of this for kernel 4.1:
[...]
kworker/-1333    2...1 75697302us+: igb: val: 801
kworker/-1333    2...1 75697323us+: igb: val: 802
kworker/-1333    2...1 75697373us+: igb: val: 801
kworker/-1333    2...1 75697395us : igb: val: 802
kworker/-1333    2...1 75697395us+: igb: val: 801
kworker/-1333    2...1 75697415us : igb: val: 802
kworker/-1333    2...1 75697416us+: igb: val: 801
kworker/-1333    2...1 75697436us : igb: val: 802
kworker/-1333    2...1 75697436us+: igb: val: 801
kworker/-1333    2...1 75697457us : igb: val: 802
kworker/-1333    2...1 75697457us+: igb: val: 801
kworker/-1333    2...1 75697478us : igb: val: 802
kworker/-1333    2...1 75697478us+: igb: val: 801
kworker/-1333    2...1 75697499us : igb: val: 802
kworker/-1333    2...1 75697499us+: igb: val: 801
kworker/-1333    2...1 75697520us : igb: val: 802
kworker/-1333    2...1 75697520us+: igb: val: 801
kworker/-1333    2...1 75697541us : igb: val: 802
kworker/-1333    2...1 75697541us+: igb: val: 801
kworker/-1333    2...1 75697562us : igb: val: 802
kworker/-1333    2...1 75697562us+: igb: val: 801
kworker/-1333    2...1 75697583us : igb: val: 802
kworker/-1333    2...1 75697583us+: igb: val: 801
kworker/-1333    2...1 75697604us+: igb: val: 802
kworker/-1333    2...1 75697654us+: igb: val: 801
kworker/-1333    2...1 75697675us : igb: val: 802
kworker/-1333    2...1 75697675us+: igb: val: 801
kworker/-1333    2...1 75697696us : igb: val: 802
kworker/-1333    2...1 75697696us+: igb: val: 801
kworker/-1333    2...1 75697717us : igb: val: 802
kworker/-1333    2...1 75697717us+: igb: val: 801
kworker/-1333    2...1 75697738us : igb: val: 802
kworker/-1333    2...1 75697738us+: igb: val: 801
kworker/-1333    2...1 75697759us : igb: val: 802
kworker/-1333    2...1 75697759us+: igb: val: 801
kworker/-1333    2...1 75697780us : igb: val: 802
kworker/-1333    2...1 75697781us+: igb: val: 801
kworker/-1333    2...1 75697801us : igb: val: 802
kworker/-1333    2...1 75697801us+: igb: val: 801
kworker/-1333    2...1 75697822us : igb: val: 802


It can be clearly seen that with kernel 4.1 the execution time of the "readl"
call is much slower! It always takes about 21us!
In comparison, kernel 4.0 is fast: here the time for "readl" is about
1-2 us.
All measurements have been done on the very same PC.
The PC is an HP EliteDesk 800 G1;
a dual-port Intel i350-T2 server adapter has been plugged in.
CPU: Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz,
running in x86_64 (64-bit) mode.

Please find attached the kernel configuration for the 4.1 kernel.

I had a look at the generated object file igb_main.o.
In both cases - kernel 4.0 and kernel 4.1 - the code for the
igb_rd32 function looks the same.
In particular, the call to the readl() function is identical.

This means that, in my view, something else has changed in kernel 4.1
that has an impact on the igb accesses.

Any idea what component could cause this kind of issue?

Thanks for any feedback!

Regards

Mathias

-------------- next part --------------
A non-text attachment was scrubbed...
Name: config.gz
Type: application/x-gzip
Size: 23977 bytes
Desc: config.gz
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20161014/40092abf/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-14  8:58                                         ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-14 19:55                                           ` Julia Cartwright
  -1 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-14 19:55 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5)
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

On Fri, Oct 14, 2016 at 08:58:22AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Julia,
>
> > Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything RT specific
> > about what you are seeing, but it might be nice to get confirmation.  Also, bisection
> > would probably be easier if you confirm on a vanilla kernel.
> >
> > I find it unlikely that it's a kernel config option that changed which regressed you, but
> > instead was a code change to a driver.  Which driver is now the question, and the
> > surface area is still big (processor mapping attributes for this region, PCI root
> > complex configuration, PCI brige configuration, igb driver itself, etc.).
> >
> > Big enough that I'd recommend a bisection.  It looks like a bisection between 3.18
> > and 4.8 would take you about 18 tries to narrow down, assuming all goes well.
> >
>
> I have now repeated my tests using the vanilla kernel.
> There I got the very same issue.
> Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.

Great, thanks for confirming!  That helps narrow things down quite a
bit.

> Here is my exact (reproducible) test description:
> I applied the following patch to the kernel to get the igb trace.
> This patch instruments the igb_rd32() function to measure the call
> to readl() which is used to access registers of the igb NIC.

I took your test setup and ran it between 4.0 and 4.1 on the hardware on
my desk, which is an Atom-based board with dual I210s; however, I didn't
see much difference.

However, it's a fairly simple board, with a much simpler PCI topology
than your workstation.  I'll see if I can find some other hardware to
test on.

[..]
> This means, that I think that some other stuff in kernel 4.1 has changed,
> which has impact on the igb accesses.
>
> Any idea what component could cause this kind of issue?

Can you continue your bisection using 'git bisect'?  You've already
narrowed it down between 4.0 and 4.1, so you're well on your way.
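
Roughly (a sketch - the pass/fail check is whatever you already use to
reproduce the slow readl, e.g. your trace script):

git bisect start
git bisect bad v4.1
git bisect good v4.0
# build and boot the commit git suggests, run the test, then:
git bisect good     # or: git bisect bad
# repeat until git reports the first bad commit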

Another option might be to try to eliminate igb from the picture as
well, and try reading from another device on the same (or, perhaps,
nearest) bus segment, and see if you see the same results.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-14 19:55                                           ` Julia Cartwright
  0 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-14 19:55 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Oct 14, 2016 at 08:58:22AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> Hi Julia,
>
> > Have you tested on a vanilla (non-RT) kernel?  I doubt there is anything RT specific
> > about what you are seeing, but it might be nice to get confirmation.  Also, bisection
> > would probably be easier if you confirm on a vanilla kernel.
> >
> > I find it unlikely that it's a kernel config option that changed which regressed you, but
> > instead was a code change to a driver.  Which driver is now the question, and the
> > surface area is still big (processor mapping attributes for this region, PCI root
> > complex configuration, PCI brige configuration, igb driver itself, etc.).
> >
> > Big enough that I'd recommend a bisection.  It looks like a bisection between 3.18
> > and 4.8 would take you about 18 tries to narrow down, assuming all goes well.
> >
>
> I have now repeated my tests using the vanilla kernel.
> There I got the very same issue.
> Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.

Great, thanks for confirming!  That helps narrow things down quite a
bit.

> Here is my exact (reproducible) test description:
> I applied the following patch to the kernel to get the igb trace.
> This patch instruments the igb_rd32() function to measure the call
> to readl() which is used to access registers of the igb NIC.

I took your test setup and ran it between 4.0 and 4.1 on the hardware on
my desk, which is an Atom-based board with dual I210s; however, I didn't
see much difference.

However, it's a fairly simple board, with a much simpler PCI topology
than your workstation.  I'll see if I can find some other hardware to
test on.

[..]
> This means, that I think that some other stuff in kernel 4.1 has changed,
> which has impact on the igb accesses.
>
> Any idea what component could cause this kind of issue?

Can you continue your bisection using 'git bisect'?  You've already
narrowed it down between 4.0 and 4.1, so you're well on your way.

Another option might be to try to eliminate igb from the picture as
well, and try reading from another device on the same (or, perhaps,
nearest) bus segment, and see if you see the same results.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-14  8:58                                         ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-14 22:06                                           ` Richard Cochran
  -1 siblings, 0 replies; 53+ messages in thread
From: Richard Cochran @ 2016-10-14 22:06 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5)
  Cc: Julia Cartwright, Williams, Mitch A, Kirsher, Jeffrey T,
	linux-rt-users, Sebastian Andrzej Siewior, netdev,
	intel-wired-lan, Greg

On Fri, Oct 14, 2016 at 08:58:22AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> @@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
>  	if (E1000_REMOVED(hw_addr))
>  		return ~value;
>  
> +        trace_igb(801);
>  	value = readl(&hw_addr[reg]);
> +        trace_igb(802);

Nothing prevents this code from being preempted between the two trace
points, and so you can't be sure whether the time delta in the trace
is caused by the PCIe read stalling or not.
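
One way to check that with the existing tracepoints is to record the
scheduler and irq events in the same trace and see whether anything else
runs on that CPU between the 801 and 802 markers, e.g. (a sketch based
on Mathias' trace script from earlier in the thread):

cd /sys/kernel/debug/tracing
echo 0 > tracing_on
echo 0 > events/enable
echo 1 > events/igb/enable
echo 1 > events/sched/sched_switch/enable
echo 1 > events/irq/enable
echo 1 > tracing_on
sleep 4
cat trace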

Thanks,
Richard

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-14 22:06                                           ` Richard Cochran
  0 siblings, 0 replies; 53+ messages in thread
From: Richard Cochran @ 2016-10-14 22:06 UTC (permalink / raw)
  To: intel-wired-lan

On Fri, Oct 14, 2016 at 08:58:22AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> @@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
>  	if (E1000_REMOVED(hw_addr))
>  		return ~value;
>  
> +        trace_igb(801);
>  	value = readl(&hw_addr[reg]);
> +        trace_igb(802);

Nothing prevents this code from being preempted between the two trace
points, and so you can't be sure whether the time delta in the trace
is caused by the PCIe read stalling or not.

Thanks,
Richard



^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-14 19:55                                           ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-17 15:00                                             ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-17 15:00 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg,
	Matthew Garrett, Bjorn Helgaas, Bjorn Helgaas

[-- Attachment #1: Type: text/plain, Size: 3567 bytes --]

Hi Julia!
> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> > > anything RT specific about what you are seeing, but it might be nice
> > > to get confirmation.  Also, bisection would probably be easier if you confirm on a
> vanilla kernel.
> > >
> > > I find it unlikely that it's a kernel config option that changed
> > > which regressed you, but instead was a code change to a driver.
> > > Which driver is now the question, and the surface area is still big
> > > (processor mapping attributes for this region, PCI root complex configuration,
> PCI brige configuration, igb driver itself, etc.).
> > >
> > > Big enough that I'd recommend a bisection.  It looks like a
> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow down,
> assuming all goes well.
> > >
> >
> > I have now repeated my tests using the vanilla kernel.
> > There I got the very same issue.
> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
> 
> Great, thanks for confirming!  That helps narrow things down quite a bit.
> 
> > Here is my exact (reproducible) test description:
> > I applied the following patch to the kernel to get the igb trace.
> > This patch instruments the igb_rd32() function to measure the call to
> > readl() which is used to access registers of the igb NIC.
> 
> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
> which is an Atom-based board with dual I210s, however I didn't see much
> difference.
> 
> However, it's a fairly simple board, with a much simpler PCI topology than your
> workstation.  I'll see if I can find some other hardware to test on.
> 
> [..]
> > This means, that I think that some other stuff in kernel 4.1 has
> > changed, which has impact on the igb accesses.
> >
> > Any idea what component could cause this kind of issue?
> 
> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
> between 4.0 and 4.1, so you're well on your way.
> 

OK - done.
And finally I was successful!
The following git commit is the one that is causing the trouble!
(The full commit is in the attachment).
+++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
commit 387d37577fdd05e9472c20885464c2a53b3c945f
Author: Matthew Garrett <mjg59@coreos.com>
Date:   Tue Apr 7 11:07:00 2015 -0700

    PCI: Don't clear ASPM bits when the FADT declares it's unsupported

    Communications with a hardware vendor confirm that the expected behaviour
    on systems that set the FADT ASPM disable bit but which still grant full
    PCIe control is for the OS to leave any BIOS configuration intact and
    refuse to touch the ASPM bits.  This mimics the behaviour of Windows.

    Signed-off-by: Matthew Garrett <mjg59@coreos.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
+++++++++++++++++++++ HEADER +++++++++++++++++++++++++++

The only files that are modified by this commit are 
drivers/acpi/pci_root.c
drivers/pci/pcie/aspm.c
include/linux/pci-aspm.h

This is all generic PCIe stuff - however, I do not really understand
what these changes actually do...
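
For reference, a way to check whether the FADT flag this commit keys off
is actually set on this box, and what the kernel says about ASPM (a
sketch; acpidump/acpixtract/iasl are from acpica-tools, and the exact
output strings vary by kernel version):

dmesg | grep -i aspm
acpidump -n FACP -o facp.out
acpixtract -a facp.out        # extracts facp.dat
iasl -d facp.dat              # disassembles to facp.dsl
grep -i aspm facp.dsl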

In my setup I am using a dual-port igb Ethernet adapter.
This has an onboard PCIe switch, and it might be that the configuration of this
PCIe switch on the Intel board is causing the trouble.

Please see also the output of "lspci -v" in the attachment.
The relevant PCI address of the NIC is 04:00.0 / 04:00.1

Any feedback on this is welcome!

Thanks

Mathias






[-- Attachment #2: 387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz --]
[-- Type: application/x-gzip, Size: 1694 bytes --]

[-- Attachment #3: lspci.gz --]
[-- Type: application/x-gzip, Size: 2038 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-17 15:00                                             ` Koehrer Mathias
  0 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias @ 2016-10-17 15:00 UTC (permalink / raw)
  To: intel-wired-lan

Hi Julia!
> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> > > anything RT specific about what you are seeing, but it might be nice
> > > to get confirmation.  Also, bisection would probably be easier if you confirm on a
> vanilla kernel.
> > >
> > > I find it unlikely that it's a kernel config option that changed
> > > which regressed you, but instead was a code change to a driver.
> > > Which driver is now the question, and the surface area is still big
> > > (processor mapping attributes for this region, PCI root complex configuration,
> PCI brige configuration, igb driver itself, etc.).
> > >
> > > Big enough that I'd recommend a bisection.  It looks like a
> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow down,
> assuming all goes well.
> > >
> >
> > I have now repeated my tests using the vanilla kernel.
> > There I got the very same issue.
> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
> 
> Great, thanks for confirming!  That helps narrow things down quite a bit.
> 
> > Here is my exact (reproducible) test description:
> > I applied the following patch to the kernel to get the igb trace.
> > This patch instruments the igb_rd32() function to measure the call to
> > readl() which is used to access registers of the igb NIC.
> 
> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
> which is an Atom-based board with dual I210s, however I didn't see much
> difference.
> 
> However, it's a fairly simple board, with a much simpler PCI topology than your
> workstation.  I'll see if I can find some other hardware to test on.
> 
> [..]
> > This means, that I think that some other stuff in kernel 4.1 has
> > changed, which has impact on the igb accesses.
> >
> > Any idea what component could cause this kind of issue?
> 
> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
> between 4.0 and 4.1, so you're well on your way.
> 

OK - done.
And finally I was successful!
The following git commit is the one that is causing the trouble!
(The full commit is in the attachment).
+++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
commit 387d37577fdd05e9472c20885464c2a53b3c945f
Author: Matthew Garrett <mjg59@coreos.com>
Date:   Tue Apr 7 11:07:00 2015 -0700

    PCI: Don't clear ASPM bits when the FADT declares it's unsupported

    Communications with a hardware vendor confirm that the expected behaviour
    on systems that set the FADT ASPM disable bit but which still grant full
    PCIe control is for the OS to leave any BIOS configuration intact and
    refuse to touch the ASPM bits.  This mimics the behaviour of Windows.

    Signed-off-by: Matthew Garrett <mjg59@coreos.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
+++++++++++++++++++++ HEADER +++++++++++++++++++++++++++

The only files that are modified by this commit are 
drivers/acpi/pci_root.c
drivers/pci/pcie/aspm.c
include/linux/pci-aspm.h

This is all generic PCIe stuff - however, I do not really understand
what these changes actually do...

In my setup I am using a dual-port igb Ethernet adapter.
This has an onboard PCIe switch, and it might be that the configuration of this
PCIe switch on the Intel board is causing the trouble.

Please see also the output of "lspci -v" in the attachment.
The relevant PCI address of the NIC is 04:00.0 / 04:00.1

Any feedback on this is welcome!

Thanks

Mathias





-------------- next part --------------
A non-text attachment was scrubbed...
Name: 387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz
Type: application/x-gzip
Size: 1694 bytes
Desc: 387d37577fdd05e9472c20885464c2a53b3c945f.patch.gz
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20161017/3e2b3e35/attachment.bin>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: lspci.gz
Type: application/x-gzip
Size: 2038 bytes
Desc: lspci.gz
URL: <http://lists.osuosl.org/pipermail/intel-wired-lan/attachments/20161017/3e2b3e35/attachment-0001.bin>

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-17 15:00                                             ` [Intel-wired-lan] " Koehrer Mathias
@ 2016-10-17 15:39                                               ` Alexander Duyck
  -1 siblings, 0 replies; 53+ messages in thread
From: Alexander Duyck @ 2016-10-17 15:39 UTC (permalink / raw)
  To: Koehrer Mathias (ETAS/ESW5)
  Cc: Julia Cartwright, Bjorn Helgaas, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan,
	Matthew Garrett, Bjorn Helgaas, Greg

On Mon, Oct 17, 2016 at 8:00 AM, Koehrer Mathias (ETAS/ESW5)
<mathias.koehrer@etas.com> wrote:
> Hi Julia!
>> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
>> > > anything RT specific about what you are seeing, but it might be nice
>> > > to get confirmation.  Also, bisection would probably be easier if you confirm on a
>> vanilla kernel.
>> > >
>> > > I find it unlikely that it's a kernel config option that changed
>> > > which regressed you, but instead was a code change to a driver.
>> > > Which driver is now the question, and the surface area is still big
>> > > (processor mapping attributes for this region, PCI root complex configuration,
>> PCI brige configuration, igb driver itself, etc.).
>> > >
>> > > Big enough that I'd recommend a bisection.  It looks like a
>> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow down,
>> assuming all goes well.
>> > >
>> >
>> > I have now repeated my tests using the vanilla kernel.
>> > There I got the very same issue.
>> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
>>
>> Great, thanks for confirming!  That helps narrow things down quite a bit.
>>
>> > Here is my exact (reproducible) test description:
>> > I applied the following patch to the kernel to get the igb trace.
>> > This patch instruments the igb_rd32() function to measure the call to
>> > readl() which is used to access registers of the igb NIC.
>>
>> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
>> which is an Atom-based board with dual I210s, however I didn't see much
>> difference.
>>
>> However, it's a fairly simple board, with a much simpler PCI topology than your
>> workstation.  I'll see if I can find some other hardware to test on.
>>
>> [..]
>> > This means, that I think that some other stuff in kernel 4.1 has
>> > changed, which has impact on the igb accesses.
>> >
>> > Any idea what component could cause this kind of issue?
>>
>> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
>> between 4.0 and 4.1, so you're well on your way.
>>
>
> OK - done.
> And finally I was successful!
> The following git commit is the one that is causing the trouble!
> (The full commit is in the attachment).
> +++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
> commit 387d37577fdd05e9472c20885464c2a53b3c945f
> Author: Matthew Garrett <mjg59@coreos.com>
> Date:   Tue Apr 7 11:07:00 2015 -0700
>
>     PCI: Don't clear ASPM bits when the FADT declares it's unsupported
>
>     Communications with a hardware vendor confirm that the expected behaviour
>     on systems that set the FADT ASPM disable bit but which still grant full
>     PCIe control is for the OS to leave any BIOS configuration intact and
>     refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
>
>     Signed-off-by: Matthew Garrett <mjg59@coreos.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> +++++++++++++++++++++ HEADER +++++++++++++++++++++++++++
>
> The only files that are modified by this commit are
> drivers/acpi/pci_root.c
> drivers/pci/pcie/aspm.c
> include/linux/pci-aspm.h
>
> This is all generic PCIe stuff - however I do not really understand what
> the changes of the commit are...
>
> In my setup I am using a dual port igb Ethernet adapter.
> This has an onboard PCIe switch and it might be that the configuration of this
> PCIe switch on the Intel board is causing the trouble.
>
> Please see also the output of "lspci -v" in the attachment.
> The relevant PCI address of the NIC is 04:00.0 / 04:00.1
>
> Any feedback on this is welcome!
>
> Thanks
>
> Mathias

Hi Mathias,

If you could send the output of lspci -vvv it might be more useful, as
most of the configuration data isn't present in the lspci dump you
attached.  Specifically, if you could do this for the working case and
the non-working case, we could verify whether this issue is actually due
to the ASPM configuration on the device.

Also, one thing you might try is booting your kernel with the kernel
parameter "pcie_aspm=off".  It sounds like the extra latency is likely
due to your platform enabling ASPM on the device; this in turn will
add latency if the PCIe link is powered down when you attempt to
perform a read, as it takes some time to bring the link back up from
the L1 state.
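
For example, something like this would show whether the BIOS left ASPM
enabled on the NIC's link (04:00.x is from your lspci output; the
address of the switch port above it is an assumption and may differ):

lspci -s 04:00.0 -vvv | grep -E 'LnkCap|LnkCtl'
lspci -s 03:00.0 -vvv | grep -E 'LnkCap|LnkCtl'
# "LnkCtl: ASPM L1 Enabled" on either end of the link means ASPM is active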

Thanks for bisecting this.

- Alex

^ permalink raw reply	[flat|nested] 53+ messages in thread

* [Intel-wired-lan] Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
@ 2016-10-17 15:39                                               ` Alexander Duyck
  0 siblings, 0 replies; 53+ messages in thread
From: Alexander Duyck @ 2016-10-17 15:39 UTC (permalink / raw)
  To: intel-wired-lan

On Mon, Oct 17, 2016 at 8:00 AM, Koehrer Mathias (ETAS/ESW5)
<mathias.koehrer@etas.com> wrote:
> Hi Julia!
>> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
>> > > anything RT specific about what you are seeing, but it might be nice
>> > > to get confirmation.  Also, bisection would probably be easier if you confirm on a
>> vanilla kernel.
>> > >
>> > > I find it unlikely that it's a kernel config option that changed
>> > > which regressed you, but instead was a code change to a driver.
>> > > Which driver is now the question, and the surface area is still big
>> > > (processor mapping attributes for this region, PCI root complex configuration,
>> PCI brige configuration, igb driver itself, etc.).
>> > >
>> > > Big enough that I'd recommend a bisection.  It looks like a
>> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow down,
>> assuming all goes well.
>> > >
>> >
>> > I have now repeated my tests using the vanilla kernel.
>> > There I got the very same issue.
>> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
>>
>> Great, thanks for confirming!  That helps narrow things down quite a bit.
>>
>> > Here is my exact (reproducible) test description:
>> > I applied the following patch to the kernel to get the igb trace.
>> > This patch instruments the igb_rd32() function to measure the call to
>> > readl() which is used to access registers of the igb NIC.
>>
>> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
>> which is an Atom-based board with dual I210s, however I didn't see much
>> difference.
>>
>> However, it's a fairly simple board, with a much simpler PCI topology than your
>> workstation.  I'll see if I can find some other hardware to test on.
>>
>> [..]
>> > This means, that I think that some other stuff in kernel 4.1 has
>> > changed, which has impact on the igb accesses.
>> >
>> > Any idea what component could cause this kind of issue?
>>
>> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
>> between 4.0 and 4.1, so you're well on your way.
>>
>
> OK - done.
> And finally I was successful!
> The following git commit is the one that is causing the trouble!
> (The full commit is in the attachment).
> +++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
> commit 387d37577fdd05e9472c20885464c2a53b3c945f
> Author: Matthew Garrett <mjg59@coreos.com>
> Date:   Tue Apr 7 11:07:00 2015 -0700
>
>     PCI: Don't clear ASPM bits when the FADT declares it's unsupported
>
>     Communications with a hardware vendor confirm that the expected behaviour
>     on systems that set the FADT ASPM disable bit but which still grant full
>     PCIe control is for the OS to leave any BIOS configuration intact and
>     refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
>
>     Signed-off-by: Matthew Garrett <mjg59@coreos.com>
>     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> +++++++++++++++++++++ HEADER +++++++++++++++++++++++++++
>
> The only files that are modified by this commit are
> drivers/acpi/pci_root.c
> drivers/pci/pcie/aspm.c
> include/linux/pci-aspm.h
>
> This is all generic PCIe stuff - however I do not really understand what
> the changes of the commit are...
>
> In my setup I am using a dual port igb Ethernet adapter.
> This has an onboard PCIe switch and it might be that the configuration of this
> PCIe switch on the Intel board is causing the trouble.
>
> Please see also the output of "lspci -v" in the attachment.
> The relevant PCI address of the NIC is 04:00.0 / 04:00.1
>
> Any feedback on this is welcome!
>
> Thanks
>
> Mathias

Hi Mathias,

If you could send the output of lspci -vvv it might be more useful, as
most of the configuration data isn't present in the lspci dump you
attached.  Specifically, if you could do this for the working case and
the non-working case, we could verify whether this issue is actually due
to the ASPM configuration on the device.

Also, one thing you might try is booting your kernel with the kernel
parameter "pcie_aspm=off".  It sounds like the extra latency is likely
due to your platform enabling ASPM on the device; this in turn will
add latency if the PCIe link is powered down when you attempt to
perform a read, as it takes some time to bring the link back up from
the L1 state.

Thanks for bisecting this.

- Alex

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-17 15:39                                               ` Alexander Duyck
@ 2016-10-17 18:32                                                 ` Julia Cartwright
  -1 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-17 18:32 UTC (permalink / raw)
  To: Alexander Duyck
  Cc: Koehrer Mathias (ETAS/ESW5),
	Bjorn Helgaas, linux-rt-users, Sebastian Andrzej Siewior, netdev,
	intel-wired-lan, Matthew Garrett, Bjorn Helgaas, Greg, linux-pci

+linux-pci

On Mon, Oct 17, 2016 at 08:39:40AM -0700, Alexander Duyck wrote:
> On Mon, Oct 17, 2016 at 8:00 AM, Koehrer Mathias (ETAS/ESW5)
> <mathias.koehrer@etas.com> wrote:
> > Hi Julia!
> >> > > Have you tested on a vanilla (non-RT) kernel?  I doubt there is
> >> > > anything RT specific about what you are seeing, but it might be nice
> >> > > to get confirmation.  Also, bisection would probably be easier if you confirm on a
> >> vanilla kernel.
> >> > >
> >> > > I find it unlikely that it's a kernel config option that changed
> >> > > which regressed you, but instead was a code change to a driver.
> >> > > Which driver is now the question, and the surface area is still big
> >> > > (processor mapping attributes for this region, PCI root complex configuration,
> >> PCI brige configuration, igb driver itself, etc.).
> >> > >
> >> > > Big enough that I'd recommend a bisection.  It looks like a
> >> > > bisection between 3.18 and 4.8 would take you about 18 tries to narrow down,
> >> assuming all goes well.
> >> > >
> >> >
> >> > I have now repeated my tests using the vanilla kernel.
> >> > There I got the very same issue.
> >> > Using kernel 4.0 is fine, however starting with kernel 4.1, the issue appears.
> >>
> >> Great, thanks for confirming!  That helps narrow things down quite a bit.
> >>
> >> > Here is my exact (reproducible) test description:
> >> > I applied the following patch to the kernel to get the igb trace.
> >> > This patch instruments the igb_rd32() function to measure the call to
> >> > readl() which is used to access registers of the igb NIC.
> >>
> >> I took your test setup and ran it between 4.0 and 4.1 on the hardware on my desk,
> >> which is an Atom-based board with dual I210s, however I didn't see much
> >> difference.
> >>
> >> However, it's a fairly simple board, with a much simpler PCI topology than your
> >> workstation.  I'll see if I can find some other hardware to test on.
> >>
> >> [..]
> >> > This means, that I think that some other stuff in kernel 4.1 has
> >> > changed, which has impact on the igb accesses.
> >> >
> >> > Any idea what component could cause this kind of issue?
> >>
> >> Can you continue your bisection using 'git bisect'?  You've already narrowed it down
> >> between 4.0 and 4.1, so you're well on your way.
> >>
> >
> > OK - done.
> > And finally I was successful!
> > The following git commit is the one that is causing the trouble!
> > (The full commit is in the attachment).
> > +++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
> > commit 387d37577fdd05e9472c20885464c2a53b3c945f
> > Author: Matthew Garrett <mjg59@coreos.com>
> > Date:   Tue Apr 7 11:07:00 2015 -0700
> >
> >     PCI: Don't clear ASPM bits when the FADT declares it's unsupported
> >
> >     Communications with a hardware vendor confirm that the expected behaviour
> >     on systems that set the FADT ASPM disable bit but which still grant full
> >     PCIe control is for the OS to leave any BIOS configuration intact and
> >     refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
> >
> >     Signed-off-by: Matthew Garrett <mjg59@coreos.com>
> >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > +++++++++++++++++++++ HEADER +++++++++++++++++++++++++++
> >
> > The only files that are modified by this commit are
> > drivers/acpi/pci_root.c
> > drivers/pci/pcie/aspm.c
> > include/linux/pci-aspm.h
> >
> > This is all generic PCIe stuff - however I do not really understand what
> > the changes of the commit are...
> >
> > In my setup I am using a dual port igb Ethernet adapter.
> > This has an onboard PCIe switch and it might be that the configuration of this
> > PCIe switch on the Intel board is causing the trouble.
> >
> > Please see also the output of "lspci -v" in the attachment.
> > The relevant PCI address of the NIC is 04:00.0 / 04:00.1
> >
> > Any feedback on this is welcome!
> >
> > Thanks
> >
> > Mathias
>
> Hi Mathias,
>
> If you could set the output of lspci -vvv it might be more useful as
> most of the configuration data isn't present in the lspci dump you had
> attached.  Specifically if you could do this for the working case and
> the non-working case we could verify if this issue is actually due to
> the ASPM configuration on the device.
>
> Also one thing you might try is booting your kernel with the kernel
> parameter "pcie_aspm=off".  It sounds like the extra latency is likely
> due to your platform enabling ASPM on the device and this in turn will
> add latency if the PCIe link is disabled when you attempt to perform a
> read as it takes some time to bring the PCIe link up when in L1 state.

So if we assume it's this commit causing the regression, then it's safe
to assume that this system's BIOS is claiming to not support ASPM in the
FADT, but the BIOS is leaving ASPM configured in some way on the
relevant devices.

Also, unfortunately, taking a look at the code which handles
"pcie_aspm=off", it won't be sufficient to disable ASPM on this
system, as disabling these states is skipped when the FADT doesn't
advertise ASPM support.

What would be needed is an option like "force", but one which forcibly
_disables_ ASPM.  "force-disable", maybe.
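
A rough sketch of how such an option could hook into the existing
pcie_aspm= parameter handling in drivers/pci/pcie/aspm.c (the
"force-disable" string and the aspm_force_disabled flag are hypothetical
additions; the "off" and "force" branches roughly reflect the current
code):

static bool aspm_force_disabled;	/* hypothetical: request active disabling */

static int __init pcie_aspm_disable(char *str)
{
	if (!strcmp(str, "off")) {
		aspm_policy = POLICY_DEFAULT;
		aspm_disabled = 1;
		aspm_support_enabled = false;
		printk(KERN_INFO "PCIe ASPM is disabled\n");
	} else if (!strcmp(str, "force")) {
		aspm_force = 1;
		printk(KERN_INFO "PCIe ASPM is forcibly enabled\n");
	} else if (!strcmp(str, "force-disable")) {
		/* keep link control writable so the L0s/L1 bits can be cleared */
		aspm_force = 1;
		aspm_force_disabled = true;
		aspm_policy = POLICY_PERFORMANCE;
		printk(KERN_INFO "PCIe ASPM is forcibly disabled\n");
	}
	return 1;
}
__setup("pcie_aspm=", pcie_aspm_disable);

The link-state bits themselves would still have to be cleared when each
link is initialized, which is exactly the part the current "off"
handling deliberately skips.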

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-14 22:06                                           ` [Intel-wired-lan] " Richard Cochran
@ 2016-10-17 18:36                                             ` Julia Cartwright
  -1 siblings, 0 replies; 53+ messages in thread
From: Julia Cartwright @ 2016-10-17 18:36 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Koehrer Mathias (ETAS/ESW5),
	Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

On Sat, Oct 15, 2016 at 12:06:33AM +0200, Richard Cochran wrote:
> On Fri, Oct 14, 2016 at 08:58:22AM +0000, Koehrer Mathias (ETAS/ESW5) wrote:
> > @@ -753,7 +756,9 @@ u32 igb_rd32(struct e1000_hw *hw, u32 re
> >  	if (E1000_REMOVED(hw_addr))
> >  		return ~value;
> >  
> > +        trace_igb(801);
> >  	value = readl(&hw_addr[reg]);
> > +        trace_igb(802);
> 
> Nothing prevents this code from being preempted between the two trace
> points, and so you can't be sure whether the time delta in the trace
> is caused by the PCIe read stalling or not.

While that is certainly the case, and would explain the most egregious
of measured latency spikes, it doesn't invalidate the test if you
consider the valuable data point(s) to be the minimum and/or median
latencies.

   Julia

^ permalink raw reply	[flat|nested] 53+ messages in thread

* Re: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-17 18:36                                             ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-17 19:03                                               ` Richard Cochran
  -1 siblings, 0 replies; 53+ messages in thread
From: Richard Cochran @ 2016-10-17 19:03 UTC (permalink / raw)
  To: Julia Cartwright
  Cc: Koehrer Mathias (ETAS/ESW5),
	Williams, Mitch A, Kirsher, Jeffrey T, linux-rt-users,
	Sebastian Andrzej Siewior, netdev, intel-wired-lan, Greg

On Mon, Oct 17, 2016 at 01:36:45PM -0500, Julia Cartwright wrote:
> While that is certainly the case, and would explain the most egregious
> of measured latency spikes, it doesn't invalidate the test if you
> consider the valuable data point(s) to be the minimum and/or median
> latencies.

Well, consider the case where an interrupt is stuck on.  That is a
possible cause, and it can be positively excluded by either disabling
local interrupts around the time stamps or by putting the vector
events into the trace.
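
For illustration, a minimal sketch of what such a measurement inside
igb_rd32() might look like (not the instrumentation used in this
thread; it assumes the custom trace_igb() tracepoint from the debug
patch can take the measured delta as its argument):

	unsigned long flags;
	u64 t0, t1;
	u32 value;

	local_irq_save(flags);		/* excludes interrupts, and with them preemption */
	t0 = ktime_get_ns();
	value = readl(&hw_addr[reg]);	/* the MMIO read being measured */
	t1 = ktime_get_ns();
	local_irq_restore(flags);

	if (t1 - t0 > 10 * NSEC_PER_USEC)	/* only record reads slower than 10 us */
		trace_igb(t1 - t0);

That way a large delta can only come from the read itself stalling on
the bus, not from the CPU wandering off to service an interrupt.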

(Doesn't matter now that bisection fingered the PCIe setup, just sayin.)

Thanks,
Richard

^ permalink raw reply	[flat|nested] 53+ messages in thread

* RE: Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest
  2016-10-17 18:32                                                 ` [Intel-wired-lan] " Julia Cartwright
@ 2016-10-18  8:43                                                   ` Koehrer Mathias
  -1 siblings, 0 replies; 53+ messages in thread
From: Koehrer Mathias (ETAS/ESW5) @ 2016-10-18  8:43 UTC (permalink / raw)
  To: Julia Cartwright, Alexander Duyck
  Cc: Bjorn Helgaas, linux-rt-users, Sebastian Andrzej Siewior, netdev,
	intel-wired-lan, Matthew Garrett, Bjorn Helgaas, Greg, linux-pci

[-- Attachment #1: Type: text/plain, Size: 6018 bytes --]

Hi all,

> > >>
> > >> Can you continue your bisection using 'git bisect'?  You've already
> > >> narrowed it down between 4.0 and 4.1, so you're well on your way.
> > >>
> > >
> > > OK - done.
> > > And finally I was successful!
> > > The following git commit is the one that is causing the trouble!
> > > (The full commit is in the attachment).
> > > +++++++++++++++++++++ BEGIN +++++++++++++++++++++++++++
> > > commit 387d37577fdd05e9472c20885464c2a53b3c945f
> > > Author: Matthew Garrett <mjg59@coreos.com>
> > > Date:   Tue Apr 7 11:07:00 2015 -0700
> > >
> > >     PCI: Don't clear ASPM bits when the FADT declares it's
> > > unsupported
> > >
> > >     Communications with a hardware vendor confirm that the expected
> behaviour
> > >     on systems that set the FADT ASPM disable bit but which still grant full
> > >     PCIe control is for the OS to leave any BIOS configuration intact and
> > >     refuse to touch the ASPM bits.  This mimics the behaviour of Windows.
> > >
> > >     Signed-off-by: Matthew Garrett <mjg59@coreos.com>
> > >     Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
> > > +++++++++++++++++++++ HEADER +++++++++++++++++++++++++++
> > >
> > > The only files that are modified by this commit are
> > > drivers/acpi/pci_root.c drivers/pci/pcie/aspm.c
> > > include/linux/pci-aspm.h
> > >
> > > This is all generic PCIe stuff - however I do not really understand
> > > what the changes of the commit are...
> > >
> > > In my setup I am using a dual port igb Ethernet adapter.
> > > This has an onboard PCIe switch and it might be that the
> > > configuration of this PCIe switch on the Intel board is causing the trouble.
> > >
> > > Please see also the output of "lspci -v" in the attachment.
> > > The relevant PCI address of the NIC is 04:00.0 / 04:00.1
> > >
> > Hi Mathias,
> >
> > If you could send the output of lspci -vvv it might be more useful as
> > most of the configuration data isn't present in the lspci dump you had
> > attached.  Specifically if you could do this for the working case and
> > the non-working case we could verify if this issue is actually due to
> > the ASPM configuration on the device.
> >
> > Also one thing you might try is booting your kernel with the kernel
> > parameter "pcie_aspm=off".  It sounds like the extra latency is likely
> > due to your platform enabling ASPM on the device and this in turn will
> > add latency if the PCIe link is disabled when you attempt to perform a
> > read as it takes some time to bring the PCIe link up when in L1 state.
> 
> So if we assume it's this commit causing the regression, then it's safe to assume that
> this system's BIOS is claiming to not support ASPM in the FADT, but the BIOS is
> leaving ASPM configured in some way on the relevant devices.
> 
> Also, unfortunately, taking a look at the code which handles "pcie_aspm=off", it
> won't be sufficient to disable ASPM on this system, as disabling these states is
> skipped when the FADT doesn't advertise ASPM support.
> 
> What would be needed is an option like "force", but one which forcibly _disables_ ASPM.
> "force-disable", maybe.
> 

OK, I have now built a "good" kernel 
(using commit 37a9c502c0af013aaae094556830100c2bb133ac) and
a "bad" kernel
(using commit 387d37577fdd05e9472c20885464c2a53b3c945f).

Please find attached the outputs of "lspci -vvv" for both cases.
As assumed, in the "bad" case, the PCIe switch on the NIC board and the
two Ethernet controllers show "ASPM L1 Enabled" in "LnkCtl".
In the "good" case this is "ASPM disabled".

I also tried the kernel option "pcie_aspm=off" in the "bad" case.
However, this had no impact; the issue still occurred!


Switching to kernel 4.8, I set the configuration for
"Default ASPM policy" to CONFIG_PCIEASPM_PERFORMANCE;
however, this did not show any effect.
This is in contrast to the help text provided in the kernel configuration:
"Disable PCI Express ASPM L0s and L1, even if the BIOS enabled them."

For me the first step should be to make CONFIG_PCIEASPM_PERFORMANCE
work as expected: if this is set, ASPM should be forcibly disabled.
This is currently not the case.

During the boot phase I see in dmesg: 
"ACPI FADT declares the system doesn't support PCIe ASPM, so disable it"
This leads to a call of pcie_no_aspm(), which sets aspm_policy to POLICY_DEFAULT
instead of the value that has been selected in the kernel configuration.
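
For reference, the unpatched function behaves roughly like this
(abridged from drivers/pci/pcie/aspm.c as of 4.8, comments paraphrased):

void pcie_no_aspm(void)
{
	/*
	 * Called when the platform tells us not to touch ASPM (e.g. the
	 * FADT declares it unsupported): leave the BIOS configuration
	 * alone and lock the policy, rather than clearing any ASPM state
	 * the firmware may have enabled.
	 */
	if (!aspm_force) {
		aspm_policy = POLICY_DEFAULT;	/* the Kconfig default is dropped here */
		aspm_disabled = 1;
	}
}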

The following patch fixes the issue for me on kernel 4.8.
With it, the config value CONFIG_PCIEASPM_PERFORMANCE is honoured correctly.

+++++++++++++++++++++++ BEGIN +++++++++++++++++++
Consider the CONFIG_PCIEASPM_* values within pcie_no_aspm().

Signed-off-by: Mathias Koehrer <mathias.koehrer@etas.com>

---
 drivers/pci/pcie/aspm.c |    5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Index: linux-4.8/drivers/pci/pcie/aspm.c
===================================================================
--- linux-4.8.orig/drivers/pci/pcie/aspm.c
+++ linux-4.8/drivers/pci/pcie/aspm.c
@@ -79,10 +79,13 @@ static LIST_HEAD(link_list);
 
 #ifdef CONFIG_PCIEASPM_PERFORMANCE
 static int aspm_policy = POLICY_PERFORMANCE;
+static int aspm_default_config_policy = POLICY_PERFORMANCE;
 #elif defined CONFIG_PCIEASPM_POWERSAVE
 static int aspm_policy = POLICY_POWERSAVE;
+static int aspm_default_config_policy = POLICY_POWERSAVE;
 #else
 static int aspm_policy;
+static int aspm_default_config_policy;
 #endif
 
 static const char *policy_str[] = {
@@ -946,7 +949,7 @@ void pcie_no_aspm(void)
 	 * (b) prevent userspace from changing policy
 	 */
 	if (!aspm_force) {
-		aspm_policy = POLICY_DEFAULT;
+		aspm_policy = aspm_default_config_policy;
 		aspm_disabled = 1;
 	}
 }
+++++++++++++++++++++++ END +++++++++++++++++++

Apart from that, a kernel parameter like "pcie_aspm=force-disable" - as
proposed by Julia - would be helpful as well.

Any feedback is welcome!

Regards

Mathias


[-- Attachment #2: lspci.bad.gz --]
[-- Type: application/x-gzip, Size: 5127 bytes --]

[-- Attachment #3: lspci.good.gz --]
[-- Type: application/x-gzip, Size: 5123 bytes --]

^ permalink raw reply	[flat|nested] 53+ messages in thread

end of thread, other threads:[~2016-10-18  8:43 UTC | newest]

Thread overview: 53+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-22 12:44 Kernel 4.6.7-rt13: Intel Ethernet driver igb causes huge latencies in cyclictest Koehrer Mathias (ETAS/ESW5)
2016-09-22 15:12 ` Sebastian Andrzej Siewior
2016-09-23  6:38   ` AW: " Koehrer Mathias (ETAS/ESW5)
2016-09-23 11:40     ` Koehrer Mathias (ETAS/ESW5)
2016-09-23 12:32       ` Sebastian Andrzej Siewior
2016-09-23 13:23         ` Koehrer Mathias (ETAS/ESW5)
2016-09-23 14:41           ` Sebastian Andrzej Siewior
2016-09-26 11:12             ` Koehrer Mathias (ETAS/ESW5)
2016-09-28 19:45               ` Julia Cartwright
2016-10-04 14:33                 ` Koehrer Mathias (ETAS/ESW5)
2016-10-04 19:34                   ` Julia Cartwright
2016-10-05  7:02                     ` Koehrer Mathias (ETAS/ESW5)
2016-10-05 15:59                       ` Julia Cartwright
2016-10-05 15:59                         ` [Intel-wired-lan] " Julia Cartwright
2016-10-06  7:01                         ` Koehrer Mathias (ETAS/ESW5)
2016-10-06  7:01                           ` [Intel-wired-lan] " Koehrer Mathias
2016-10-06 10:12                           ` Henri Roosen
2016-10-06 10:12                             ` [Intel-wired-lan] " Henri Roosen
2016-10-06 17:58                           ` Williams, Mitch A
2016-10-06 17:58                             ` [Intel-wired-lan] " Williams, Mitch A
2016-10-07  8:58                             ` Koehrer Mathias (ETAS/ESW5)
2016-10-07  8:58                               ` [Intel-wired-lan] " Koehrer Mathias
2016-10-10 19:39                               ` Julia Cartwright
2016-10-10 19:39                                 ` [Intel-wired-lan] " Julia Cartwright
2016-10-13  6:15                                 ` Koehrer Mathias (ETAS/ESW5)
2016-10-13  6:15                                   ` [Intel-wired-lan] " Koehrer Mathias
2016-10-13 10:57                                   ` Koehrer Mathias (ETAS/ESW5)
2016-10-13 10:57                                     ` [Intel-wired-lan] " Koehrer Mathias
2016-10-13 14:02                                     ` David Laight
2016-10-13 14:02                                       ` [Intel-wired-lan] " David Laight
2016-10-13 16:18                                     ` Julia Cartwright
2016-10-13 16:18                                       ` [Intel-wired-lan] " Julia Cartwright
2016-10-14  8:58                                       ` Koehrer Mathias (ETAS/ESW5)
2016-10-14  8:58                                         ` [Intel-wired-lan] " Koehrer Mathias
2016-10-14 19:55                                         ` Julia Cartwright
2016-10-14 19:55                                           ` [Intel-wired-lan] " Julia Cartwright
2016-10-17 15:00                                           ` Koehrer Mathias (ETAS/ESW5)
2016-10-17 15:00                                             ` [Intel-wired-lan] " Koehrer Mathias
2016-10-17 15:39                                             ` Alexander Duyck
2016-10-17 15:39                                               ` Alexander Duyck
2016-10-17 18:32                                               ` Julia Cartwright
2016-10-17 18:32                                                 ` [Intel-wired-lan] " Julia Cartwright
2016-10-18  8:43                                                 ` Koehrer Mathias (ETAS/ESW5)
2016-10-18  8:43                                                   ` [Intel-wired-lan] " Koehrer Mathias
2016-10-14 22:06                                         ` Richard Cochran
2016-10-14 22:06                                           ` [Intel-wired-lan] " Richard Cochran
2016-10-17 18:36                                           ` Julia Cartwright
2016-10-17 18:36                                             ` [Intel-wired-lan] " Julia Cartwright
2016-10-17 19:03                                             ` Richard Cochran
2016-10-17 19:03                                               ` [Intel-wired-lan] " Richard Cochran
2016-09-26 11:48       ` Alexander Stein
2016-09-27  6:29         ` Koehrer Mathias (ETAS/ESW5)
2016-09-27  7:56           ` Mathias Koehrer

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.