* [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions
@ 2015-07-02  5:48 gustavo panizzo
  2015-07-02  5:51 ` [Qemu-devel] [Bug 1470720] " gustavo panizzo
                   ` (5 more replies)
  0 siblings, 6 replies; 7+ messages in thread
From: gustavo panizzo @ 2015-07-02  5:48 UTC (permalink / raw)
  To: qemu-devel

Public bug reported:

We are having a problem on our hosts: all the VMs running on them
suddenly lose network connectivity for a few seconds.

The root cause appears to be a jump in irq-tlb from low values
(less than 20) to more than 100k. The spike only lasts a few seconds,
then everything goes back to normal.

I've uploaded a screenshot of collectd for one hypervisor here:
http://zumbi.com.ar/tmp/irq-tlb.png
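
For anyone who wants to watch this counter directly rather than through collectd, here is a minimal sketch. It assumes that the collectd irq-tlb series is derived from the `TLB:` row of /proc/interrupts (TLB shootdowns on x86 hosts); the function names `tlb_total` and `tlb_rate` are made up for illustration.

```shell
# Sum the per-CPU counts on the "TLB:" row of /proc/interrupts-style input.
# Assumption: the collectd irq-tlb series is derived from this row.
tlb_total() {
    # $1: file to read (defaults to /proc/interrupts)
    awk '$1 == "TLB:" {
             total = 0
             # Per-CPU counters are the numeric fields after the label;
             # stop at the first non-numeric field (the description).
             for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
             printf "%.0f\n", total
             found = 1
         }
         END { if (!found) exit 1 }' "${1:-/proc/interrupts}"
}

# Print the increase in the TLB total over an interval (default: 1 second).
tlb_rate() {
    before=$(tlb_total) || return 1
    sleep "${1:-1}"
    after=$(tlb_total) || return 1
    echo $((after - before))
}
```

Running `tlb_rate 5` in a loop on an affected hypervisor should make a spike like the one in the graph easy to spot.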


We have hosts running precise (qemu 1.5, ovs 2.0.2, libvirt 1.2.2 and kernel 3.13) where the issue is frequent. We also have a small percentage of our fleet running trusty (qemu 2.0.0, ovs 2.0.2, libvirt 1.2.2 and kernel 3.16) where the problem seemed to be nonexistent until today.

The issue seems to be isolated to fewer than 10% of our hypervisors. Some hypervisors have had this problem every few days, others only once or twice. Our VMs are a black box to us: we don't know what users run on them, but the workloads are mostly CPU and network bound.
Most of our guests run CentOS 6.5 (kernel 2.6.32).

VMs are bridged to a linuxbridge, which is then wired by veth to an OVS switch
(neutron openvswitch agent setup).

** Affects: qemu
     Importance: Undecided
         Status: New


** Tags: icehouse kvm precise qemu trusty

** Attachment added: "irq-tlb.png"
   https://bugs.launchpad.net/bugs/1470720/+attachment/4423180/+files/irq-tlb.png

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1470720

Title:
  high IRQ-TLB generates network interruptions

Status in QEMU:
  New

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1470720/+subscriptions

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
@ 2015-07-02  5:51 ` gustavo panizzo
  2015-07-03  6:52 ` gustavo panizzo
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: gustavo panizzo @ 2015-07-02  5:51 UTC (permalink / raw)
  To: qemu-devel

Maybe the first part was not clear, so here it goes again:

This happens on some hypervisors at random times, not on all hypervisors
at the same time, and it affects all VMs on the affected hypervisor.

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
  2015-07-02  5:51 ` [Qemu-devel] [Bug 1470720] " gustavo panizzo
@ 2015-07-03  6:52 ` gustavo panizzo
  2016-07-28  9:47 ` Robert van Leeuwen
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: gustavo panizzo @ 2015-07-03  6:52 UTC (permalink / raw)
  To: qemu-devel

The overcommit ratio on the latest server where I had the problem is 3.6
(3.6 vCPUs for each CPU). Could that be part of the problem? I see other
servers that have never had the problem with overcommit ratios as high as 4.1.
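
For comparing hosts, the ratio quoted above is simply allocated vCPUs divided by host CPUs. A minimal sketch, where the helper name `overcommit_ratio` and the sample counts are hypothetical and not taken from the affected servers:

```shell
# Ratio of allocated vCPUs to host CPUs, printed to one decimal place.
overcommit_ratio() {
    # $1: total vCPUs allocated to guests, $2: host CPU count
    awk -v v="$1" -v p="$2" 'BEGIN { if (p <= 0) exit 1; printf "%.1f\n", v / p }'
}

# Hypothetical example: 72 vCPUs allocated on a 20-CPU host.
overcommit_ratio 72 20
```

On a live host the second argument could come from `nproc`; how the vCPU total is collected depends on the management stack in use.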

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
  2015-07-02  5:51 ` [Qemu-devel] [Bug 1470720] " gustavo panizzo
  2015-07-03  6:52 ` gustavo panizzo
@ 2016-07-28  9:47 ` Robert van Leeuwen
  2016-08-08  7:01 ` Robert van Leeuwen
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 7+ messages in thread
From: Robert van Leeuwen @ 2016-07-28  9:47 UTC (permalink / raw)
  To: qemu-devel

Seeing the same here, also happens on overbooked hypervisors.

Just one or two hosts have this behaviour.

We are using:
qemu-kvm                             2.0.0+dfsg-2ubuntu1.25
libvirt-bin                          1.2.9
kernel  3.13.0-92-generic

We are using Contrail as an SDN.

It looks like it started after upgrading a bunch of packages, including
the kernel (we came from 3.13.0-83-generic).

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
                   ` (2 preceding siblings ...)
  2016-07-28  9:47 ` Robert van Leeuwen
@ 2016-08-08  7:01 ` Robert van Leeuwen
  2018-06-21 15:04 ` Thomas Huth
  2018-08-21  4:17 ` Launchpad Bug Tracker
  5 siblings, 0 replies; 7+ messages in thread
From: Robert van Leeuwen @ 2016-08-08  7:01 UTC (permalink / raw)
  To: qemu-devel

Disabling transparent huge pages (THP) seems to help.
Strangely, this should theoretically make the issue worse, but so far we have not seen issues after disabling THP.
(We have not seen high load spikes in a week, but this might also be holiday related.)

So other people can try it out:
echo never > /sys/kernel/mm/transparent_hugepage/defrag
echo never > /sys/kernel/mm/transparent_hugepage/enabled
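
To confirm the change took effect, the current setting can be read back. A minimal sketch, assuming the sysfs files list every option with the active one in square brackets (for example `always madvise [never]`); the helper name `thp_active` is hypothetical:

```shell
# Print the active value from a THP sysfs file,
# e.g. "always madvise [never]" prints "never".
thp_active() {
    # $1: sysfs file to read (defaults to the "enabled" knob)
    sed -n 's/.*\[\([a-z+]*\)\].*/\1/p' \
        "${1:-/sys/kernel/mm/transparent_hugepage/enabled}"
}
```

Calling `thp_active` and `thp_active /sys/kernel/mm/transparent_hugepage/defrag` after the two echo commands should both print `never`. Note that values written to sysfs this way do not survive a reboot.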

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
                   ` (3 preceding siblings ...)
  2016-08-08  7:01 ` Robert van Leeuwen
@ 2018-06-21 15:04 ` Thomas Huth
  2018-08-21  4:17 ` Launchpad Bug Tracker
  5 siblings, 0 replies; 7+ messages in thread
From: Thomas Huth @ 2018-06-21 15:04 UTC (permalink / raw)
  To: qemu-devel

Looking through old bug tickets... can you still reproduce this issue
with the latest version of QEMU? Or could we close this ticket nowadays?

** Changed in: qemu
       Status: New => Incomplete

* [Qemu-devel] [Bug 1470720] Re: high IRQ-TLB generates network interruptions
  2015-07-02  5:48 [Qemu-devel] [Bug 1470720] [NEW] high IRQ-TLB generates network interruptions gustavo panizzo
                   ` (4 preceding siblings ...)
  2018-06-21 15:04 ` Thomas Huth
@ 2018-08-21  4:17 ` Launchpad Bug Tracker
  5 siblings, 0 replies; 7+ messages in thread
From: Launchpad Bug Tracker @ 2018-08-21  4:17 UTC (permalink / raw)
  To: qemu-devel

[Expired for QEMU because there has been no activity for 60 days.]

** Changed in: qemu
       Status: Incomplete => Expired
