xen host going down: transmit queue 0 timed out

From: hw @ 2016-02-04  9:24 UTC
  To: xen-devel

Hi,

the following message shows up in the log file when the xen host goes down:

kernel: ------------[ cut here ]------------
kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23e/0x250()
kernel: NETDEV WATCHDOG: enp3s0f0 (e1000e): transmit queue 0 timed out
kernel: Modules linked in: xt_physdev br_netfilter iptable_filter ip_tables xen_pciback xen_gntalloc xen_gntdev bridge stp llc snd_hda_codec_r$
kernel: CPU: 0 PID: 0 Comm: swapper/0 Tainted: P           O 4.0.9-gentoo #2
kernel: Hardware name: Hewlett-Packard HP Z800 Workstation/0AECh, BIOS 786G5 v03.57 07/15/2013
kernel:  0000000000000000 ffffffff81736392 ffffffff8147294d ffff88003e003e08
kernel:  ffffffff8104c017 0000000000000000 ffff880038b743a0 ffff880038b74000
kernel:  0000000000000000 0000000000000001 ffffffff8104c095 ffffffff81739360
kernel: Call Trace:
kernel:  <IRQ>  [<ffffffff8147294d>] ? dump_stack+0x40/0x50
kernel:  [<ffffffff8104c017>] ? warn_slowpath_common+0x77/0xb0
kernel:  [<ffffffff8104c095>] ? warn_slowpath_fmt+0x45/0x50
kernel:  [<ffffffff81008156>] ? xen_timer_interrupt+0x116/0x150
kernel:  [<ffffffff813f6d3e>] ? dev_watchdog+0x23e/0x250
kernel:  [<ffffffff813f6b00>] ? dev_graft_qdisc+0x80/0x80
kernel:  [<ffffffff813f6b00>] ? dev_graft_qdisc+0x80/0x80
kernel:  [<ffffffff8108f935>] ? call_timer_fn.isra.33+0x15/0x80
kernel:  [<ffffffff810901ea>] ? run_timer_softirq+0x1da/0x290
kernel:  [<ffffffff8104f460>] ? __do_softirq+0xf0/0x1f0
kernel:  [<ffffffff8104f6c5>] ? irq_exit+0x95/0xa0
kernel:  [<ffffffff812b556c>] ? xen_evtchn_do_upcall+0x2c/0x40
kernel:  [<ffffffff81479d8e>] ? xen_do_hypervisor_callback+0x1e/0x40
kernel:  <EOI>  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel:  [<ffffffff810013aa>] ? xen_hypercall_sched_op+0xa/0x20
kernel:  [<ffffffff81007e7c>] ? xen_safe_halt+0xc/0x20
kernel:  [<ffffffff81015d15>] ? default_idle+0x5/0x10
kernel:  [<ffffffff8107d29a>] ? cpu_startup_entry+0x1fa/0x2e0
kernel:  [<ffffffff8188ae98>] ? start_kernel+0x40c/0x417
kernel:  [<ffffffff8188d2ff>] ? xen_start_kernel+0x581/0x58b
kernel: ---[ end trace 9372973812c56dbc ]---
kernel: e1000e 0000:03:00.0 enp3s0f0: Reset adapter unexpectedly
kernel: brloc: port 1(enp3s0f0) entered disabled state

'Goes down' means that the host and all the VMs it runs become
unreachable over the network until a reboot.  This happens at random
intervals ranging from one day to 24 days.  The problem persists with
three different network cards, each from a different manufacturer.
The hardware has also been replaced completely: the disks were moved
to an identical HP Z800, which shows the same problem as the previous
machine.  No network issues have been found.
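
To compare occurrences across the different cards, the watchdog lines
can be pulled out of the log with something like the following.  This
is only a minimal sketch, assuming the message format is exactly as in
the trace above; the function name and the sample list are made up for
illustration and are not part of any existing tool.

```python
import re

# Matches the watchdog line as it appears in the log above, e.g.
# "kernel: NETDEV WATCHDOG: enp3s0f0 (e1000e): transmit queue 0 timed out"
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): "
    r"transmit queue (?P<queue>\d+) timed out"
)


def find_watchdog_timeouts(lines):
    """Return (interface, driver, queue) for each watchdog timeout line.

    Hypothetical helper: it only looks at the message text and ignores
    whatever timestamp or host prefix the syslog daemon adds.
    """
    events = []
    for line in lines:
        match = WATCHDOG_RE.search(line)
        if match:
            events.append(
                (match.group("iface"), match.group("driver"), int(match.group("queue")))
            )
    return events


# Sample input copied from the log above (wrapped lines joined).
sample = [
    "kernel: WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:303 dev_watchdog+0x23e/0x250()",
    "kernel: NETDEV WATCHDOG: enp3s0f0 (e1000e): transmit queue 0 timed out",
    "kernel: e1000e 0000:03:00.0 enp3s0f0: Reset adapter unexpectedly",
]

print(find_watchdog_timeouts(sample))
```

Run over the full log, this would show whether every occurrence names
the same interface, driver and queue.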

The problem usually occurs overnight while the machine is idle.

The problem occurs only on the machine running xen.  None of the
other machines show any issues whatsoever.

Is xen incompatible with HP Z800s?
What can we do to fix or to avoid this problem?

moonflo ~ # brctl show
bridge name     bridge id               STP enabled     interfaces
brloc           8000.001517ebbdb4       no              enp3s0f0
                                                         vif2.0
                                                         vif2.0-emu
                                                         vif3.0
                                                         vif3.0-emu
                                                         vif4.0
                                                         vif4.0-emu
                                                         vif5.0
                                                         vif5.0-emu
                                                         vif6.0
                                                         vif6.0-emu
                                                         vif8.0
                                                         vif8.0-emu
moonflo ~ # uname -a
Linux moonflo 4.0.9-gentoo #2 SMP Tue Nov 3 19:49:23 CET 2015 x86_64 Intel(R) Xeon(R) CPU X5650 @ 2.67GHz GenuineIntel GNU/Linux
moonflo ~ # ifconfig enp3s0f0
enp3s0f0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
         ether 00:15:17:eb:bd:b4  txqueuelen 1000  (Ethernet)
         RX packets 968091  bytes 196502096 (187.3 MiB)
         RX errors 0  dropped 61  overruns 0  frame 0
         TX packets 925363  bytes 347737586 (331.6 MiB)
         TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
         device interrupt 28  memory 0xfb000000-fb020000
