* xen + i40e: transmit queue timeout
@ 2018-07-06 10:39 Andreas Kinzler
  2018-07-06 12:03 ` Jan Beulich
  0 siblings, 1 reply; 3+ messages in thread
From: Andreas Kinzler @ 2018-07-06 10:39 UTC (permalink / raw)
  To: xen-devel

I am currently researching a transmit queue timeout with Xen 4.8.2 and
Intel X722 (i40e driver). The problem occurs with various Linux versions
(4.8.17, 4.13.16, SLES 15 port of i40e). It seems to be related to heavy
forwarding/bridging, as it shows up while I am running a heavy network
stress test in a domU (linux/pvm 4.13.16). The same test run without Xen
appears to work, though I am not certain.

Any ideas?

Regards Andreas

[  441.823998] NETDEV WATCHDOG: eth0 (i40e): transmit queue 0 timed out
[  441.824033] ------------[ cut here ]------------
[  441.824046] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:316 dev_watchdog+0x218/0x220
[  441.824048] Modules linked in: i40e nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack xt_physdev br_netfilter bridge stp llc xt_tcpudp iptable_filter ip_tables x_tables binfmt_misc tun mlx5_core
[  441.824074] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.13.16-ak2 #1
[  441.824077] Hardware name: Supermicro Super Server/X11SPi-TF, BIOS 2.0 11/29/2017
[  441.824079] task: ffffffff81810480 task.stack: ffffffff81800000
[  441.824084] RIP: e030:dev_watchdog+0x218/0x220
[  441.824087] RSP: e02b:ffff88005d203e68 EFLAGS: 00010296
[  441.824091] RAX: 0000000000000038 RBX: 0000000000000000 RCX: 000000000000003e
[  441.824093] RDX: 0000000000000000 RSI: ffff88005d203ce4 RDI: 0000000000000004
[  441.824095] RBP: ffff88005d203e98 R08: ffff8800571ae200 R09: ffff880056c00248
[  441.824097] R10: 0000000000000082 R11: 0000000000000040 R12: ffff880052da9f40
[  441.824099] R13: 0000000000000000 R14: ffff880055d04800 R15: 0000000000000080
[  441.824112] FS:  0000000000000000(0000) GS:ffff88005d200000(0000) knlGS:ffff88005d200000
[  441.824115] CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[  441.824117] CR2: 00007f22b20ed3a0 CR3: 00000000518d1000 CR4: 0000000000042660
[  441.824121] Call Trace:
[  441.824123]  <IRQ>
[  441.824130]  ? qdisc_rcu_free+0x40/0x40
[  441.824133]  ? qdisc_rcu_free+0x40/0x40
[  441.824140]  call_timer_fn.isra.5+0x1f/0x90
[  441.824144]  expire_timers+0x99/0xb0
[  441.824148]  run_timer_softirq+0x7b/0xc0
[  441.824155]  ? handle_percpu_irq+0x35/0x50
[  441.824159]  ? generic_handle_irq+0x1d/0x30
[  441.824166]  ? __evtchn_fifo_handle_events+0x142/0x160
[  441.824170]  __do_softirq+0xe5/0x200
[  441.824174]  irq_exit+0xb1/0xc0
[  441.824179]  xen_evtchn_do_upcall+0x2b/0x40
[  441.824186]  xen_do_hypervisor_callback+0x1e/0x30
[  441.824188]  </IRQ>
[  441.824193]  ? xen_hypercall_sched_op+0xa/0x20
[  441.824197]  ? xen_hypercall_sched_op+0xa/0x20
[  441.824204]  ? xen_safe_halt+0x10/0x20
[  441.824207]  ? default_idle+0x9/0x10
[  441.824210]  ? arch_cpu_idle+0xa/0x10
[  441.824213]  ? default_idle_call+0x1e/0x30
[  441.824218]  ? do_idle+0x183/0x1b0
[  441.824222]  ? cpu_startup_entry+0x18/0x20
[  441.824226]  ? rest_init+0xcb/0xd0
[  441.824232]  ? start_kernel+0x399/0x3a6
[  441.824238]  ? x86_64_start_reservations+0x2a/0x2c
[  441.824241]  ? xen_start_kernel+0x54f/0x55b
[  441.824244] Code: 63 8e a0 03 00 00 eb 93 4c 89 f7 c6 05 89 22 3a 00 01 e8 7c fc fd ff 89 d9 48 89 c2 4c 89 f6 48 c7 c7 58 3e 76 81 e8 84 3a bf ff <0f> ff eb c3 0f 1f 40 00 48 c7 47 08 00 00 00 00 55 48 c7 07 00
[  441.824306] ---[ end trace ae5cd79c539b9f32 ]---
[  441.824323] i40e 0000:19:00.0 eth0: tx_timeout: VSI_seid: 390, Q 0, NTC: 0x73, HWB: 0x73, NTU: 0x5d, TAIL: 0x73, INT: 0x1
[  441.824330] i40e 0000:19:00.0 eth0: tx_timeout recovery level 1, hung_queue 0

[-- Attachment #2: kconfig-4.13.zip --]
[-- Type: application/zip, Size: 21339 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: xen + i40e: transmit queue timeout
  2018-07-06 10:39 xen + i40e: transmit queue timeout Andreas Kinzler
@ 2018-07-06 12:03 ` Jan Beulich
  2018-07-09 18:22   ` Andreas Kinzler
  0 siblings, 1 reply; 3+ messages in thread
From: Jan Beulich @ 2018-07-06 12:03 UTC (permalink / raw)
  To: Andreas Kinzler; +Cc: xen-devel

>>> On 06.07.18 at 12:39, <hfp@posteo.de> wrote:
> I am currently researching a transmit queue timeout with Xen 4.8.2 and
> Intel X722 (i40e driver). The problem occurs with various Linux versions
> (4.8.17, 4.13.16, SLES 15 port of i40e). It seems to be related to heavy
> forwarding/bridging, as it shows up while I am running a heavy network
> stress test in a domU (linux/pvm 4.13.16). The same test run without Xen
> appears to work, though I am not certain.

The log fragment below of course tells us next to nothing about why
this is happening. A couple of questions, therefore:
- Are interrupts still arriving for this device at the point of the
  reported timeout?
- Are interrupts distributed reasonably evenly between (v)CPUs?
- Is the overall interrupt rate within what the system can reasonably
  handle (because of the lower handling overhead, a higher rate would
  still be acceptable without Xen)?
- Is the same heavy forwarding/bridging in effect when trying this
  without Xen?
- Does running the same stress test in Dom0 work?
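
For the first two of these questions, here is a rough dom0 sketch. The
interrupt lines below are a made-up stand-in for real /proc/interrupts
contents (the IRQ numbers, counters, vector names, and four-CPU layout
are all illustrative); on the actual box you would run the awk over
`grep i40e /proc/interrupts` instead.

```shell
# Made-up sample standing in for `grep i40e /proc/interrupts`
# (4 CPU columns; real systems differ in column count and names).
cat <<'EOF' > /tmp/sample-interrupts
 72:  123456       0       0       0  xen-pirq-msi-x  i40e-eth0-TxRx-0
 73:       0   98765       0       0  xen-pirq-msi-x  i40e-eth0-TxRx-1
 74:       0       0       0       0  xen-pirq-msi-x  i40e-eth0-TxRx-2
EOF
# Sum the per-CPU count columns ($2..$5 here) for each queue vector.
# A total stuck at 0 under load means no interrupts for that queue;
# everything landing in a single column means poor distribution.
awk '{ total = 0; for (i = 2; i <= 5; i++) total += $i;
       printf "%s total=%d\n", $NF, total }' /tmp/sample-interrupts
```

Taking two such snapshots a few seconds apart and comparing the totals
answers the "still arriving" question directly.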

I take it that there are no other relevant messages in any of the
logs, or else you would have provided them right away.

Jan

> [  441.823998] NETDEV WATCHDOG: eth0 (i40e): transmit queue 0 timed out
> [...]
> [  441.824323] i40e 0000:19:00.0 eth0: tx_timeout: VSI_seid: 390, Q 0, NTC: 0x73, HWB: 0x73, NTU: 0x5d, TAIL: 0x73, INT: 0x1
> [  441.824330] i40e 0000:19:00.0 eth0: tx_timeout recovery level 1, hung_queue 0






* Re: xen + i40e: transmit queue timeout
  2018-07-06 12:03 ` Jan Beulich
@ 2018-07-09 18:22   ` Andreas Kinzler
  0 siblings, 0 replies; 3+ messages in thread
From: Andreas Kinzler @ 2018-07-09 18:22 UTC (permalink / raw)
  To: Jan Beulich; +Cc: xen-devel

On Fri, 06 Jul 2018 14:03:00 +0200, Jan Beulich <JBeulich@suse.com> wrote:
>> I am currently researching a transmit queue timeout with Xen 4.8.2 and
>> Intel X722 (i40e driver). The problem occurs with various Linux versions
>> (4.8.17, 4.13.16, SLES 15 port of i40e). It seems to be related to heavy
>> forwarding/bridging, as it shows up while I am running a heavy network
>> stress test in a domU (linux/pvm 4.13.16). The same test run without Xen
>> appears to work, though I am not certain.
> The log fragment below of course tells us next to nothing about why
> this is happening. A couple of questions, therefore:

Thanks for suggesting helpful further steps.

> - Are interrupts still arriving for this device at the point of the
>   reported timeout?
> - Are interrupts distributed reasonably evenly between (v)CPUs?
> - Is the overall interrupt rate within what the system can reasonably
>   handle (because of the lower handling overhead, a higher rate would
>   still be acceptable without Xen)?
> - Is the same heavy forwarding/bridging in effect when trying this
>   without Xen?
> - Does running the same stress test in Dom0 work?
> I take it that there are no other relevant messages in any of the
> logs, or else you would have provided them right away.

Actually, it seems that the driver itself is the problem, which is quite
counterintuitive because the driver is quite old (development started in
2013) and you would expect it to be very mature.

As a test, I used the 2.4.10 out-of-tree driver from
https://sourceforge.net/projects/e1000/files/i40e%20stable/ and all the
problems went away. I am writing this here so that others with the same
problem have a possible solution to try.
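
For anyone trying the replacement driver, it is worth confirming which
i40e build is actually loaded afterwards. The commands in the comments
are what you would run on the live system; the sample `ethtool -i`
output below uses made-up firmware values purely to illustrate pulling
out the version field.

```shell
# On the live system (needs the NIC present):
#   modinfo i40e | grep -E '^(filename|version):'
#   ethtool -i eth0 | grep -E '^(driver|version):'
# Sample `ethtool -i eth0` output (firmware values made up) to show
# extracting the driver version field:
cat <<'EOF' > /tmp/ethtool-i.txt
driver: i40e
version: 2.4.10
firmware-version: 3.33 0x80001007
bus-info: 0000:19:00.0
EOF
awk -F': ' '$1 == "version" { print $2 }' /tmp/ethtool-i.txt
```

If the version printed is still the in-tree one, the out-of-tree module
was built but not actually loaded (check initramfs / module paths).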

Regards Andreas


