From mboxrd@z Thu Jan 1 00:00:00 1970
From: "SourceForge.net"
Subject: [ kvm-Bugs-2506814 ] TAP network lockup after some traffic
Date: Mon, 13 Jul 2009 09:09:54 +0000
Message-ID:
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
To: noreply@sourceforge.net
Return-path:
Received: from ch3.sourceforge.net ([216.34.181.60]:32881 "EHLO
    ch3.sourceforge.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
    with ESMTP id S1750830AbZGMJKB (ORCPT ); Mon, 13 Jul 2009 05:10:01 -0400
Sender: kvm-owner@vger.kernel.org
List-ID:

Bugs item #2506814, was opened at 2009-01-14 11:38
Message generated for change (Comment added) made by danny1
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2506814&group_id=180599

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: None
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Fabio Coatti (cova)
Assigned to: Nobody/Anonymous (nobody)
Summary: TAP network lockup after some traffic

Initial Comment:
Hi all,
we are experiencing severe network trouble using kvm + tap networking.
Basically, after some network load (we have yet to identify the exact amount
of traffic, if one exists) the network stops working.
During lockups, with tcpdump we can see ARP requests on the guest interface,
then on the tap, brX and physical interfaces of the host system. The ARP
answers can be seen, with tcpdump, only on the physical host interface and
the bridge (brX), but not on the tap device.
Basically it seems that packets coming from the external network get lost in
the tap device on the way to the guest (kvm).
Looking at the tap data with ifconfig, the only weird thing is the TX packets
overrun count, which is > 0. Every time the network stops working, the
overrun count increases.
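[Editorial sketch, not part of the original report.] The observation above,
that the tap device's TX overrun count grows with each lockup, could be
checked by sampling the counters before and after a stall. The Python below
is a minimal sketch under one assumption: that the TX "overruns" value
printed by ifconfig corresponds to the transmit "fifo" column of
/proc/net/dev. The function names are hypothetical.

```python
def parse_tx_fifo(proc_net_dev_text):
    """Return {interface: tx_fifo_count} from /proc/net/dev-formatted text."""
    counters = {}
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines use '|' and contain no ':'
        name, fields = line.split(":", 1)
        values = fields.split()
        if len(values) < 16:
            continue  # not a complete counter row
        # Eight receive columns come first; "fifo" is the fifth transmit
        # column (assumed here to be what ifconfig prints as TX overruns).
        counters[name.strip()] = int(values[8 + 4])
    return counters


def grown(before, after):
    """Return {interface: increase} for counters that grew between samples."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] > before[name]
    }


def read_snapshot(path="/proc/net/dev"):
    """Read the live counters on a Linux host."""
    with open(path) as handle:
        return parse_tx_fifo(handle.read())
```

Taking one read_snapshot() while the network works and another right after a
lockup, then passing both to grown(), would list the interfaces whose
counter increased in between.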
This has been observed with several kvm releases (for sure,
76/77/78/79/80/81/82) and with different kernels (tried with some versions
among 2.6.25.X, 26.X, 27.X, 28) on both the guest and the host side.
We tried several network drivers (virtio, e1000, rtl) and all show the same
problem. Only 100Mbit drivers seem to be unaffected so far. (Only virtio has
acceptable performance.)
(btw: on the host machine we have vlans on top of the ethX devices)
The number of CPUs on the guest makes no difference.
We tried with the kvm modules provided by the vanilla kernel and with the
modules provided by the kvm package.

guest: 32 bit
host: 64 bit
host machine: 2 x Quad-Core AMD Opteron(tm) Processor 2352 16GB, gentoo.

Of course I can provide more details or perform other tests and try patches.
If someone can give me some hints or advice, they will be most welcome.
Thanks.

----------------------------------------------------------------------

Comment By: Daniel Schwager (danny1)
Date: 2009-07-13 11:09

Message:
> I installed KVM-86 at the moment and the first test works fine - no
> disconnect at the moment (..)

I got the disconnect with KVM-86 also..

c_jones, did you check it with the current KVM-88 release ?

----------------------------------------------------------------------

Comment By: Daniel Schwager (danny1)
Date: 2009-07-11 09:24

Message:
I can confirm exactly the same behavior with kvm-84 - all my machines also
use a dhcp daemon for requesting their IP addresses.

> My guest nodes are configured to get their IP addresses with DHCP. At the
> DHCP server I see the DHCPDISCOVER requests from the guests, and I see the
> DHCP server replying with DHCPOFFERs, but the guest never sees the offers
> come in.

I installed KVM-86 at the moment and the first test works fine - no
disconnect at the moment (..)

----------------------------------------------------------------------

Comment By: Chris Jones (c_jones)
Date: 2009-07-11 05:14

Message:
I'm experiencing exactly the same problem.
I've been running kvm-84 for quite a while and it works great there. But I
tried moving to kvm-87, and on kvm-87 I'm getting exactly the same behavior
you're reporting here -- except all the time (it doesn't ever work).

My guest nodes are configured to get their IP addresses with DHCP. At the
DHCP server I see the DHCPDISCOVER requests from the guests, and I see the
DHCP server replying with DHCPOFFERs, but the guest never sees the offers
come in.

So, just like the other reports - packets outbound from the guest are fine,
but inbound they seem to get dropped. As in the other reports, the affected
traffic is UDP (DHCP runs over UDP), so I guess this might lend weight to
the argument that this happens more often with UDP.

My environment (where it fails)
kvm-87
kernel 2.6.27.26
Using the kvm-87 user component and kernel component.

On kvm-84 all works fine (which seems to be in contrast to another person's
report).

Mine is easily reproducible - let me know if you want me to collect any
information.

----------------------------------------------------------------------

Comment By: Daniel Schwager (danny1)
Date: 2009-07-08 21:16

Message:
> Mellen, did you try KVM-85 ?

not -85, but did you try the current one: KVM-87 ?

----------------------------------------------------------------------

Comment By: Daniel Schwager (danny1)
Date: 2009-07-08 21:13

Message:
Hi,

I have the same (?) problem. We are running about 20 x WinXP guests using
RDP on a 64bit FC9 host with KVM-84, bridge-network, tap-device with e1000,
smp-1.

Sometimes (mostly (!) 3-4 minutes after starting a winxp-vm), I lose the
RDP connection. During this disconnection, a ping running inside the VM
reports problems too. Only 2-3 seconds later, I can reconnect over RDP and
all works fine ... until the next disconnect..

Mellen, did you try KVM-85 ?

regards
Danny

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-07-02 17:22

Message:
So far, I have been unable to reproduce this with a guest using only one
cpu (-smp 1). Previous tests have been made on guests with -smp 2 or 4.

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-06-29 18:43

Message:
I have been able to reproduce this twice using pktgen on a remote host
shooting UDP packets through a kvm guest. The guest stalled after about 10
minutes of heavy UDP traffic (40000 packets per second of 200 bytes each).

The KVM guest has a simple iptables nat rule forwarding the UDP packets
from eth0 out vlan1000 (on the same interface).

----------------------------------------------------------------------

Comment By: Fabio Coatti (cova)
Date: 2009-06-09 13:35

Message:
The only way that I've found to reproduce this issue is to have the guest
create traffic on the network (say, using it to wget or ftp from some
external site). After some time or traffic the network stops working.
I'll try to reproduce it again on newer kvm versions, but I have not
identified a more reliable way to cause the hangs.
Thanks.

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-06-08 14:31

Message:
I haven't found a reliable way to make a guest lose its network, but it
seems like UDP traffic is more likely to cause this problem.

A longer (30-60 minute) RDP (remote desktop protocol) session of about 5mbps
from one Microsoft box to another, crossing a guest acting as
router/firewall, seems to kill the network on that guest. That has happened
at least 3 times. That's as close to reproducing the problem as I've gotten
so far.

----------------------------------------------------------------------

Comment By: Avi Kivity (avik)
Date: 2009-06-08 13:57

Message:
Is there a reliable way to reproduce this?

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-05-07 13:29

Message:
Just had another stall. Different host, different guest. Just one guest on
that host system. What information would help debug this system?

efer_reload          0  0
exits                11147836817  5827
fpu_reload           31483547  1
halt_exits           8041208101  1397
halt_wakeup          5069136218  0
host_state_reload    13598781750  1507
hypercalls           5890581948  4
insn_emulation       8114024171  4338
insn_emulation_fail  0  0
invlpg               0  0
io_exits             5263494017  2
irq_exits            781045211  53
irq_window           0  0
largepages           0  0
mmio_exits           61151  0
mmu_cache_miss       156941726  0
mmu_flooded          158265518  0
mmu_pde_zapped       96062900  0
mmu_pte_updated      2302220251  4
mmu_pte_write        1759022770  4
mmu_recycled         0  0
mmu_shadow_zapped    184691156  0
pf_fixed             4270608276  0
pf_guest             1226004351  0
remote_tlb_flush     481003516  2
request_irq          0  0
signal_exits         1  0
tlb_flush            4539655572  29

----------------------------------------------------------------------

Comment By: Fabio Coatti (cova)
Date: 2009-05-07 10:51

Message:
We are still bitten by this issue. I'm running out of ideas, but if someone
can give me some hints on how to track down the problem, or at least how to
collect more information, I'll try it.
Thanks.

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-05-07 10:39

Message:
Just happened again. Seems like it stops generating interrupts on virtio
devices:

 10:       3001      51199   IO-APIC-fasteoi   virtio1, virtio2
 11:    7694835    8468134   IO-APIC-fasteoi   virtio0

Doing an /bin/ls froze the guest for a minute or so until a CTRL-C got
through.
After /bin/ls:

 10:       3001      51220   IO-APIC-fasteoi   virtio1, virtio2
 11:    7694835    8468134   IO-APIC-fasteoi   virtio0

kvm_stat:

efer_reload          0  0
exits                19150288617  1082
fpu_reload           119767301  34
halt_exits           544721221  194
halt_wakeup          300187634  141
host_state_reload    837279034  259
hypercalls           4991133618  0
insn_emulation       4797940911  709
insn_emulation_fail  0  0
invlpg               0  0
io_exits             281254462  65
irq_exits            173205458  0
irq_window           0  0
largepages           0  0
mmio_exits           720457  0
mmu_cache_miss       290275800  0
mmu_flooded          289036380  0
mmu_pde_zapped       170308104  0
mmu_pte_updated      6239297430  0
mmu_pte_write        20470585377  0
mmu_recycled         0  0
mmu_shadow_zapped    335324686  0
pf_fixed             8324827214  0
pf_guest             2229555421  0
remote_tlb_flush     283772247  0
request_irq          0  0
signal_exits         6  0
tlb_flush            5334530556  66

... but there are 6 other guests on this host running just fine.

Can't connect gdb to the kvm's gdbserver. It just says "Remote 'g' packet
reply is too long: ...."

Issuing system_reset stalled for a minute, then rebooted the guest. After
the reboot, the guest is fine again.

----------------------------------------------------------------------

Comment By: Tais M. Hansen (mellen)
Date: 2009-05-06 17:07

Message:
I'm curious about the status of this issue. I'm experiencing the same
problem, apparently at random, on guests. Restarting the network interface
does not seem to fix the problem. Only a reboot (or system_reset in the
kvm/qemu console) solves it.

The last time it happened was on a host with kvm-84 and a guest with kernel
2.6.27 using virtio-net. Leading up to the stall it had a traffic load of
about 5 mbit constantly up and down for just over 2 hours, with one 12 mbit
spike. I did not check interface counters or network traces at the time.

----------------------------------------------------------------------

Comment By: Fabio Coatti (cova)
Date: 2009-01-14 14:24

Message:
It seems quite similar indeed, but ip link set eth up/down on the guest side
seems to have no effect.
Besides that, looking at the thread you pointed me to, I can see another
difference: sniffing on the tap device in my case shows only outgoing
packets (i.e. leaving the kvm guest). So it could be the very same issue,
but some differences are present. At least, we are seeing this on newer
kernels and kvm revisions :)

----------------------------------------------------------------------

Comment By: Mark McLoughlin (markmc)
Date: 2009-01-14 12:19

Message:
Does ifup/ifdown in the guest fix the hang?

If so, it sounds like the issue discussed in this long thread:

  http://www.mail-archive.com/kvm@vger.kernel.org/msg06774.html

We still haven't got to the bottom of it.

----------------------------------------------------------------------

You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=893831&aid=2506814&group_id=180599
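
[Editorial sketch, not part of the original thread.] The 2009-05-07 10:39
comment compares two /proc/interrupts samples by eye to argue that a virtio
device stopped receiving interrupts. The same comparison can be sketched in
Python as below. The function names and the "virtio" filter are hypothetical
choices for illustration, and a counter that did not move between two
samples only shows that no interrupt arrived in that interval, which can
also happen on an idle device.

```python
def parse_interrupts(text):
    """Return {irq_label: (total_count, description)} from /proc/interrupts text."""
    result = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # header row that only lists the CPU columns
        label, rest = line.split(":", 1)
        tokens = rest.split()
        counts = []
        # Leading numeric tokens are the per-CPU counters.
        while tokens and tokens[0].isdigit():
            counts.append(int(tokens.pop(0)))
        if counts:
            result[label.strip()] = (sum(counts), " ".join(tokens))
    return result


def stalled(before, after, needle="virtio"):
    """Return IRQ labels matching `needle` whose total did not change."""
    return sorted(
        irq
        for irq, (total, description) in after.items()
        if needle in description
        and irq in before
        and before[irq][0] == total
    )
```

Fed the two samples quoted in that comment, IRQ 10 (virtio1, virtio2)
advances by 21 while IRQ 11 (virtio0) is the one reported as unchanged.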