* FAQ on linux-kvm.org has broken link
@ 2013-07-30 1:18 folkert
2013-07-30 12:25 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: folkert @ 2013-07-30 1:18 UTC (permalink / raw)
To: kvm
Hi,
The link at:
http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
pointing to:
http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
is broken: it gives an "Internal server error" message.
Could someone please point me to the correct location? I'm struggling with
a VM losing connectivity all the time.
Regards,
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-07-30 1:18 FAQ on linux-kvm.org has broken link folkert
@ 2013-07-30 12:25 ` Stefan Hajnoczi
2013-07-30 12:45 ` folkert
2013-07-30 20:45 ` folkert
0 siblings, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-07-30 12:25 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Tue, Jul 30, 2013 at 03:18:53AM +0200, folkert wrote:
> The link at:
> http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
> pointing to:
> http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
> is broken: it gives an "Internal server error" message.
>
> Could someone please point me to the correct location? I'm struggling with
> a VM losing connectivity all the time.
Hi Folkert,
I have updated the wiki to point to
http://qemu-project.org/Documentation/Networking. The original link
seems to be down.
If you keep losing network connectivity you may have a MAC or IP address
conflict. The symptom is that network traffic is intermittent - for
example, ping might work but a full TCP connection does not.
This happens when two guests are configured with identical MAC or IP
addresses on the same bridge or subnet. They will "fight" over the MAC
or IP address and you will not be able to reliably communicate with
those guests.
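A MAC conflict between guests on the same host can be ruled out quickly from the QEMU command lines. The following is a minimal Python sketch, assuming the command lines are available as text (for example from ps aux | grep qemu) and that each NIC carries a mac= option and each guest a -name argument. The function name duplicate_macs is made up for illustration. It does not detect IP conflicts, nor conflicts with machines on other hosts attached to the same bridge or subnet.

```python
import re
from collections import defaultdict

# Matches the mac=... option of a QEMU -device argument.
MAC_RE = re.compile(r"mac=((?:[0-9a-fA-F]{2}:){5}[0-9a-fA-F]{2})")
# Matches the guest name given with -name.
NAME_RE = re.compile(r"-name\s+(\S+)")


def duplicate_macs(cmdlines):
    """Return {mac: [guest, ...]} for every MAC used by more than one NIC.

    cmdlines is an iterable of QEMU command lines, one string per guest.
    """
    owners = defaultdict(list)
    for line in cmdlines:
        name = NAME_RE.search(line)
        guest = name.group(1) if name else "<unknown>"
        for mac in MAC_RE.findall(line):
            # MAC addresses are case-insensitive, so normalise before comparing.
            owners[mac.lower()].append(guest)
    return {mac: guests for mac, guests in owners.items() if len(guests) > 1}
```

An empty result means no two NICs in the given command lines share a MAC address; any entry names the guests that would "fight" over it.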
The tool for solving networking issues is often tcpdump. Run tcpdump
inside the guest to verify it is receiving traffic or investigate a
failed connection.
Run tcpdump on the host - especially if you are using -netdev tap - to
inspect the traffic being forwarded on behalf of the guest.
If you let libvirt set up networking for you all should be fine. If you
run qemu manually or customized the domain XML, then it's possible you
have a misconfiguration. Feel free to post the details so someone can
help you.
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-07-30 12:25 ` Stefan Hajnoczi
@ 2013-07-30 12:45 ` folkert
2013-07-30 20:45 ` folkert
1 sibling, 0 replies; 16+ messages in thread
From: folkert @ 2013-07-30 12:45 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
> > The link at:
> > http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
> > pointing to:
> > http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
> > is broken: it gives an "Internal server error" message.
> >
> > Could someone please point me to the correct location? I'm struggling with
> > a VM losing connectivity all the time.
>
> Hi Folkert,
> I have updated the wiki to point to
> http://qemu-project.org/Documentation/Networking. The original link
> seems to be down.
Thanks!
> If you keep losing network connectivity you may have a MAC or IP address
> conflict. The symptom is that network traffic is intermittent - for
> example, ping might work but a full TCP connection does not.
Well, it seems that it does receive some traffic (mostly ARP) but it
definitely can't send any outgoing traffic at all. No ping, and all TCP
sessions stall too.
For now I created a workaround using a UMTS stick, so hopefully I
can now fix it (reboot the VM) when I'm not at the office.
I'll verify the mac addresses when it happens again.
I have a suspicion that it happens less frequently if I leave tcpdump
running on the interface in the guest. I've now stopped it.
> This happens when two guests are configured with identical MAC or IP
> addresses on the same bridge or subnet. They will "fight" over the MAC
> or IP address and you will not be able to reliably communicate with
> those guests.
If that is the case, then something must change this later on? Because it
happens occasionally - sometimes in 5 minutes, sometimes in days. And no
new guests are started on that server. It has 3 guests that stay on.
> If you let libvirt set up networking for you all should be fine. If you
> run qemu manually or customized the domain XML, then it's possible you
> have a misconfiguration. Feel free to post the details so someone can
> help you.
Everything is set up by libvirt. Great software by the way!
Folkert van Heusden
--
MultiTail is a flexible tool for following logfiles and
the output of commands. Filtering, colouring, merging,
'diff-view', etc. http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: FAQ on linux-kvm.org has broken link
2013-07-30 12:25 ` Stefan Hajnoczi
2013-07-30 12:45 ` folkert
@ 2013-07-30 20:45 ` folkert
2013-07-31 8:46 ` Stefan Hajnoczi
1 sibling, 1 reply; 16+ messages in thread
From: folkert @ 2013-07-30 20:45 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
> If you keep losing network connectivity you may have a MAC or IP address
> conflict. The symptom is that network traffic is intermittent - for
> example, ping might work but a full TCP connection does not.
I submitted a bug at bugzilla a while ago which I updated today with new
findings: https://bugzilla.kernel.org/show_bug.cgi?id=60620
This week the system ran a couple of times for 1-2 days but tonight was
a bit of a disaster: I had to reboot the system 18 times. Sometimes it
was fine for half an hour, but most of the time the networking on that
one guest failed after a couple of minutes (sometimes even during boot).
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-07-30 20:45 ` folkert
@ 2013-07-31 8:46 ` Stefan Hajnoczi
2013-08-02 11:37 ` folkert
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-07-31 8:46 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Tue, Jul 30, 2013 at 10:45:20PM +0200, folkert wrote:
> > If you keep losing network connectivity you may have a MAC or IP address
> > conflict. The symptom is that network traffic is intermittent - for
> > example, ping might work but a full TCP connection does not.
>
> I submitted a bug at bugzilla a while ago which I updated today with new
> findings: https://bugzilla.kernel.org/show_bug.cgi?id=60620
> This week the system ran a couple of times for 1-2 days but tonight was
> a bit of a disaster: I had to reboot the system 18 times. Sometimes it
> was fine for half an hour but most of the time after a couple of
> minutes (sometimes even during boot) the networking on that one guest
> failed.
I can't add anything besides suggesting slightly more verbose
troubleshooting steps:
1. Wait until the guest suffers from lost network connectivity.
2. Confirm the MAC/IP addresses and run tcpdump -ni $IFACE inside the
guest. Ping the guest from the host and check whether tcpdump
reports ICMP ping packets.
3. Now try pinging the host from the guest and run tcpdump -ni $IFACE on
the host. To determine the host-side tap interface, run the
following:
    $ virsh domiflist mauer
    Interface  Type     Source   Model   MAC
    -------------------------------------------------------
    vnet0      network  default  virtio  52:54:00:b9:c8:4d
Now you have tested tap connectivity with the guest. One of the
following holds:
1. Tap connectivity is fine (both transmit and receive are working).
2. Either transmit or receive is broken (ping doesn't work but tcpdump
does show incoming packets on one side).
3. Tap connectivity is broken (ping fails and tcpdump shows no ICMP
packets).
If the result is #1 then you can continue troubleshooting the next step:
the bridge or NAT configuration on the host.
If the result is #2, check firewalls on host and guest. Also try the
following inside the guest: disable the network interface, rmmod
virtio_net, modprobe virtio_net again, and bring the network up.
If the result is #3, check firewalls on host and guest as well as dmesg
output in host and guest.
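The decision procedure above can be written down as a small lookup. This is only a sketch in Python: the names TapState, classify and NEXT_STEP are invented for illustration, and the three boolean inputs are assumed to be filled in by hand from the ping and tcpdump observations described in the steps.

```python
from enum import Enum


class TapState(Enum):
    OK = 1        # result #1: transmit and receive both work
    ONE_WAY = 2   # result #2: ping fails but tcpdump sees packets on one side
    DEAD = 3      # result #3: ping fails and tcpdump shows no ICMP packets


# Follow-up action for each result, taken from the text above.
NEXT_STEP = {
    TapState.OK: "check the bridge or NAT configuration on the host",
    TapState.ONE_WAY: "check firewalls on host and guest; "
                      "reload virtio_net inside the guest",
    TapState.DEAD: "check firewalls and dmesg on host and guest",
}


def classify(host_to_guest_ping, guest_to_host_ping, tcpdump_sees_icmp):
    """Map the manual observations to one of the three results."""
    if host_to_guest_ping and guest_to_host_ping:
        return TapState.OK
    if tcpdump_sees_icmp:
        return TapState.ONE_WAY
    return TapState.DEAD
```

For example, NEXT_STEP[classify(False, False, True)] gives the advice for result #2.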
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-07-31 8:46 ` Stefan Hajnoczi
@ 2013-08-02 11:37 ` folkert
2013-08-02 15:25 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: folkert @ 2013-08-02 11:37 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
Hi,
> If the result is #2, check firewalls on host and guest. Also try the
> following inside the guest: disable the network interface, rmmod
> virtio_net, modprobe virtio_net again, and bring the network up.
I pinged, I sniffed, I updated the bug report (it also happens with
other guests now!).
And... the bring down interfaces / rmmod / modprobe / ifup sequence works!
So I think something is wrong with virtio_net!
What shall I do now?
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-08-02 11:37 ` folkert
@ 2013-08-02 15:25 ` Stefan Hajnoczi
2013-08-02 18:06 ` folkert
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-02 15:25 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Fri, Aug 2, 2013 at 1:37 PM, folkert <folkert@vanheusden.com> wrote:
>> If the result is #2, check firewalls on host and guest. Also try the
>> following inside the guest: disable the network interface, rmmod
>> virtio_net, modprobe virtio_net again, and bring the network up.
>
> I pinged, I sniffed, I updated the bug report (it also happens with
> other guests now!).
>
> And... the bring down interfaces / rmmod / modprobe / ifup sequence works!
> So I think something is wrong with virtio_net!
>
> What shall I do now?
Hi Folkert,
I wrote a reply earlier today but it was rejected because I do not have a
kernel.org bugzilla account. If you don't mind let's continue
discussing on this mailing list - we don't know whether this is a
kernel bug yet anyway.
A couple of questions:
Please post the QEMU command-line from the host (ps aux | grep qemu).
Please confirm that vhost_net is being used on the host (lsmod | grep
vhost_net).
Please double-check both guest and host dmesg for any suspicious
messages. It could be about networking, out-of-memory, or kernel
backtraces.
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-08-02 15:25 ` Stefan Hajnoczi
@ 2013-08-02 18:06 ` folkert
2013-08-05 11:31 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: folkert @ 2013-08-02 18:06 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
Hi Stefan,
> > I pinged, I sniffed, I updated the bug report (it also happens with
> > other guests now!).
> >
> > And... the bring down interfaces / rmmod / modprobe / ifup sequence works!
> > So I think something is wrong with virtio_net!
> >
> > What shall I do now?
>
> I wrote a reply earlier today but it was rejected because I do not have a
> kernel.org bugzilla account. If you don't mind let's continue
> discussing on this mailing list - we don't know whether this is a
> kernel bug yet anyway.
no problem
> A couple of questions:
> Please post the QEMU command-line from the host (ps aux | grep qemu).
I'll post them all:
- UMTS-clone: this one has worked fine since it was created a week ago
- belle: this one was fine but suddenly also showed the problem
- mauer: the problem one
112 4819 1 4 Jul30 ? 03:29:39 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name UMTS-clone -uuid e49502f1-0c74-2a60-99dc-7602da5ee640 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/UMTS-clone.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_UMTS-clone,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/folkert/ISOs/wheezy.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:09:3b:b6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0,password -vga cirrus -device usb-host,hostbus=6,hostaddr=5,id=hostdev0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
112 10065 1 11 Jul30 ? 07:46:16 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 12,sockets=12,cores=1,threads=1 -name belle -uuid 16b704d7-5fbd-d67b-71e6-0d6b43f1bc0a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/belle.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_BELLE,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGNEO/LV_V_BELLE_OS,if=none,id=drive-virtio-disk1,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/VGJOURNAL/LV_J_BELLE,if=none,id=drive-ide0-0-0,format=raw,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:75:4a:6f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:0a:6e:de,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
root 13116 12830 0 19:54 pts/8 00:00:00 grep qemu
112 23453 1 57 13:16 ? 03:46:51 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 8,maxcpus=12,sockets=12,cores=1,threads=1 -name mauer -uuid 3a8452e6-81af-b185-63b6-2b32be17ed87 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mauer.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_MAUER,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGJOURNAL/LV_J_MAUER,if=none,id=drive-virtio-disk1,format=raw,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:d9:1f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:a3:12:8a,bus=pci.0,addr=0x4 -netdev tap,fd=30,id=hostnet2,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:0f:54:c2,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x7 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9
Note that everything is managed via virt-manager.
> Please confirm that vhost_net is being used on the host (lsmod | grep
> vhost_net).
Yes, loaded and used:
root@neo:/home/folkert# lsmod | grep vhost
vhost_net 27658 6
macvtap 17638 1 vhost_net
tun 22479 13 vhost_net
> Please double-check both guest and host dmesg for any suspicious
> messages. It could be about networking, out-of-memory, or kernel
> backtraces.
I have to get back to you on this: I see messages about topology changes on the host, but I forgot to check whether they were there when the problem started.
I *think* they appeared after I rebooted the guests, but I'm not entirely sure. So let's wait on that.
Those are the only messages apart from devices entering promiscuous mode when I start tcpdump.
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-08-02 18:06 ` folkert
@ 2013-08-05 11:31 ` Stefan Hajnoczi
2013-08-05 20:59 ` folkert
2013-08-26 19:50 ` folkert
0 siblings, 2 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-05 11:31 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Fri, Aug 02, 2013 at 08:06:58PM +0200, folkert wrote:
> > A couple of questions:
> > Please post the QEMU command-line from the host (ps aux | grep qemu).
>
> I'll post them all:
> - UMTS-clone: this one has worked fine since it was created a week ago
> - belle: this one was fine but suddenly also showed the problem
> - mauer: the problem one
>
>
> Note that everything is managed via virt-manager.
>
> > Please confirm that vhost_net is being used on the host (lsmod | grep
> > vhost_net).
>
> Yes, loaded and used:
>
> root@neo:/home/folkert# lsmod | grep vhost
> vhost_net 27658 6
> macvtap 17638 1 vhost_net
> tun 22479 13 vhost_net
>
> > Please double-check both guest and host dmesg for any suspicious
> > messages. It could be about networking, out-of-memory, or kernel
> > backtraces.
>
> I have to get back to you on this: I see messages about topology changes on the host, but I forgot to check whether they were there when the problem started.
> I *think* they appeared after I rebooted the guests, but I'm not entirely sure. So let's wait on that.
> Those are the only messages apart from devices entering promiscuous mode when I start tcpdump.
Hi Folkert,
If you do find something in dmesg, that could be very helpful.
I'm trying to put together all the data points but a few things are
unclear:
Based on this information it seems like a bug in the virtio_net guest
driver or vhost_net on the host. Actually there is one contradictory
piece of evidence: in the original bug report you said "using e1000
instead of virtio: did not help". Can you confirm that e1000 also does
not work?
In your original bug report you said "If I then ping any host connected
to that interface, no ping comes back: only a message about buffer space
not being enough". Can you post the exact error message and whether it
is printed by ping inside the guest, dmesg inside the guest, or dmesg on
the host?
There is still the possibility that there is a networking configuration
issue or bug inside the guest itself. That would explain why this has
happened across different configurations (tap, macvtap, vhost_net,
e1000).
Two approaches to get closer to the source of the problem:
1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way
you can rule out fixed bugs in vhost_net or tap.
2. Get the system into the bad state and then dig deeper. Start
with outgoing ping, instrument guest driver and host vhost_net
functions to see what the drivers are doing, inspect the transmit
vring, etc.
#1 is probably the best next step. If it fails and you still have time
to work on a solution we can start digging deeper with #2.
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-08-05 11:31 ` Stefan Hajnoczi
@ 2013-08-05 20:59 ` folkert
2013-08-06 8:13 ` Stefan Hajnoczi
2013-08-26 19:50 ` folkert
1 sibling, 1 reply; 16+ messages in thread
From: folkert @ 2013-08-05 20:59 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
> If you do find something in dmesg that could be very helpful.
OK, it is down again: nothing in dmesg, neither in the guest nor on the
host.
> Based on this information it seems like a bug in the virtio_net guest
> driver or vhost_net on the host. Actually there is one contradictory
> piece of evidence: in the original bug report you said "using e1000
> instead of virtio: did not help". Can you confirm that e1000 also does
> not work?
Yes: when I tested it, e1000 also did not work.
> In your original bug report you said "If I then ping any host connected
> to that interface, no ping comes back: only a message about buffer space
> not being enough". Can you post the exact error message and whether it
> is printed by ping inside the guest, dmesg inside the guest, or dmesg on
> the host?
It was inside the guest, when pinging the outside world (i.e. outside of
the host).
> There is still the possibility that there is a networking configuration
> issue or bug inside the guest itself. That would explain why this has
> happened across different configurations (tap, macvtap, vhost_net,
> e1000).
I don't think it is, because:
- bringing down the interfaces _AND_ doing rmmod/modprobe of virtio_net
solves it
- it also happened twice on another guest
> Two approaches to get closer to the source of the problem:
> 1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way
> you can rule out fixed bugs in vhost_net or tap.
> 2. Get the system into the bad state and then dig deeper. Start
> with outgoing ping, instrument guest driver and host vhost_net
> functions to see what the drivers are doing, inspect the transmit
> vring, etc.
>
> #1 is probably the best next step. If it fails and you still have time
> to work on a solution we can start digging deeper with #2.
I can upgrade now to 3.10.3 as that is the current version in Debian.
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-08-05 20:59 ` folkert
@ 2013-08-06 8:13 ` Stefan Hajnoczi
2013-09-03 17:03 ` folkert
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-06 8:13 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Mon, Aug 05, 2013 at 10:59:45PM +0200, folkert wrote:
> > Two approaches to get closer to the source of the problem:
> > 1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way
> > you can rule out fixed bugs in vhost_net or tap.
> > 2. Get the system into the bad state and then dig deeper. Start
> > with outgoing ping, instrument guest driver and host vhost_net
> > functions to see what the drivers are doing, inspect the transmit
> > vring, etc.
> >
> > #1 is probably the best next step. If it fails and you still have time
> > to work on a solution we can start digging deeper with #2.
>
> I can upgrade now to 3.10.3 as that is the current version in Debian.
Sounds good. That way you'll also have access to the latest perf for
instrumenting vhost_net if it still fails.
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-08-05 11:31 ` Stefan Hajnoczi
2013-08-05 20:59 ` folkert
@ 2013-08-26 19:50 ` folkert
2013-08-27 7:31 ` Stefan Hajnoczi
1 sibling, 1 reply; 16+ messages in thread
From: folkert @ 2013-08-26 19:50 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
Hi,
> 1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way
> you can rule out fixed bugs in vhost_net or tap.
For the last two weeks it went fine for a couple of days each time. This
evening was really bad again: uptimes of 5-10 minutes. This is with
3.11-rc4.
> 2. Get the system into the bad state and then dig deeper. Start
> with outgoing ping, instrument guest driver and host vhost_net
> functions to see what the drivers are doing, inspect the transmit
> vring, etc.
>
> #1 is probably the best next step. If it fails and you still have time
Yup, very much.
> to work on a solution we can start digging deeper with #2.
I had a small script running which showed the amount of traffic coming
in and going out each second. I did not see any increase or decrease in the
amount, only that when the problem happens RX continues but TX drops to 0.
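The original script was not posted. As a rough illustration of this kind of monitor, here is a Python sketch that assumes a Linux guest exposing /proc/net/dev, where the first counter after the interface name is received bytes and the ninth is transmitted bytes. The function names and the interface name passed to monitor are placeholders. Timestamps are printed so that the moment TX stops can be recorded.

```python
import time


def read_counters(text, iface):
    """Return (rx_bytes, tx_bytes) for iface from the text of /proc/net/dev."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def tx_stalled(prev, cur):
    """True if RX advanced during the interval while TX did not move."""
    return cur[0] > prev[0] and cur[1] == prev[1]


def monitor(iface, interval=1.0):
    """Print per-interval RX/TX byte deltas for iface until interrupted."""
    with open("/proc/net/dev") as f:
        prev = read_counters(f.read(), iface)
    while True:
        time.sleep(interval)
        with open("/proc/net/dev") as f:
            cur = read_counters(f.read(), iface)
        flag = "  <-- RX only, no TX" if tx_stalled(prev, cur) else ""
        stamp = time.strftime("%H:%M:%S")
        print(f"{stamp} rx={cur[0] - prev[0]} tx={cur[1] - prev[1]}{flag}")
        prev = cur
```

Calling monitor("eth0") inside the guest would print one line per second. A single flagged interval is not conclusive on an idle guest, since incoming broadcasts alone can move RX; a long run of flagged intervals matches the symptom described here.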
Folkert van Heusden
* Re: FAQ on linux-kvm.org has broken link
2013-08-26 19:50 ` folkert
@ 2013-08-27 7:31 ` Stefan Hajnoczi
2013-10-29 17:18 ` folkert
0 siblings, 1 reply; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-08-27 7:31 UTC (permalink / raw)
To: folkert; +Cc: kvm
On Mon, Aug 26, 2013 at 09:50:38PM +0200, folkert wrote:
> > 2. Get the system into the bad state and then dig deeper. Start
> > with outgoing ping, instrument guest driver and host vhost_net
> > functions to see what the drivers are doing, inspect the transmit
> > vring, etc.
Okay, it's time to understand what is going on at the virtio-net level.
You can check whether the host is being notified of packets ready to be
transmitted:
(host)$ sudo perf probe --module vhost_net handle_tx
(host)$ sudo perf record --pid 24306 -e probe:handle_tx -R
[...attempt to transmit packets inside guest...]
^C
(host)$ sudo perf script
vhost-24305 24306 [000] 101120.817103: probe:handle_tx: (ffffffffa06fd6b0)
Here the qemu-kvm process was pid 24305 and the vhost kernel thread was
24306 (you can find out the correct pid to use using ps aux). If your
host is only running the guest under test you can drop the pid argument
and replace it with -a.
If you see handle_tx probes firing on the host then the issue may be
with the vring or tap device. If you do not see handle_tx probes firing
on the host, then the problem may be inside the guest.
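When the capture is long, counting the probe hits is easier than reading them. A small Python sketch, assuming the text output of perf script has been saved and has the same line format as the sample above; the function names are invented for illustration.

```python
def count_probe_hits(perf_script_output, probe="probe:handle_tx"):
    """Count the lines of perf script output that record the given probe."""
    marker = f" {probe}:"
    return sum(1 for line in perf_script_output.splitlines() if marker in line)


def interpret(hits):
    """Turn the hit count into the conclusion drawn in the text above."""
    if hits > 0:
        return "host is being notified: suspect the vring or the tap device"
    return "host is not being notified: suspect the guest"
```

For the single sample line shown above, count_probe_hits returns 1, which points at the vring or the tap device rather than the guest.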
Stefan
* Re: FAQ on linux-kvm.org has broken link
2013-08-06 8:13 ` Stefan Hajnoczi
@ 2013-09-03 17:03 ` folkert
0 siblings, 0 replies; 16+ messages in thread
From: folkert @ 2013-09-03 17:03 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
> > > Two approaches to get closer to the source of the problem:
> > > 1. Try the latest vanilla kernel on the host (Linux 3.10.5). This way
> > > you can rule out fixed bugs in vhost_net or tap.
> > > 2. Get the system into the bad state and then do some deeper. Start
> > > with outgoing ping, instrument guest driver and host vhost_net
> > > functions to see what the drivers are doing, inspect the transmit
> > > vring, etc.
> > >
> > > #1 is probably the best next step. If it fails and you still have time
> > > to work on a solution we can start digging deeper with #2.
> >
> > I can upgrade now to 3.10.3 as that is the current version in debian.
>
> Sounds good. That way you'll also have access to the latest perf for
> instrumenting vhost_net if it still fails.
It seems some of perf is in the Debian kernel and some isn't, so I have
to ask them to include it. That might take a while.
I have a feeling (I did not write it down, so I'm not 100% sure) that
this problem mostly starts around 18:30 and also after 00:00. That
would suggest the problem is with that particular VM, but I don't think
so: at least one other VM/guest has this problem too, only far less
often than the problem VM.
I'm going to keep a list of when it happens.
Folkert van Heusden
--
Feeling generous? -> http://www.vanheusden.com/wishlist.php
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com
* Re: FAQ on linux-kvm.org has broken link
2013-08-27 7:31 ` Stefan Hajnoczi
@ 2013-10-29 17:18 ` folkert
2013-11-01 9:52 ` Stefan Hajnoczi
0 siblings, 1 reply; 16+ messages in thread
From: folkert @ 2013-10-29 17:18 UTC (permalink / raw)
To: Stefan Hajnoczi; +Cc: kvm
> > > 2. Get the system into the bad state and then do some deeper. Start
> > > with outgoing ping, instrument guest driver and host vhost_net
> > > functions to see what the drivers are doing, inspect the transmit
> > > vring, etc.
Update: I have not gotten around to rebooting the server into the new
kernel with those probes, but: it is definitely very much worse on
Tuesday(!) and when I use mosh to log in remotely.
* Re: FAQ on linux-kvm.org has broken link
2013-10-29 17:18 ` folkert
@ 2013-11-01 9:52 ` Stefan Hajnoczi
0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-11-01 9:52 UTC (permalink / raw)
To: folkert; +Cc: kvm, Christian Theune, Michael S. Tsirkin, Jason Wang
On Tue, Oct 29, 2013 at 06:18:05PM +0100, folkert wrote:
> > > > 2. Get the system into the bad state and then do some deeper. Start
> > > > with outgoing ping, instrument guest driver and host vhost_net
> > > > functions to see what the drivers are doing, inspect the transmit
> > > > vring, etc.
>
> Update: I have not gotten around to rebooting the server into the new
> kernel with those probes, but: it is definitely very much worse on
> Tuesday(!) and when I use mosh to log in remotely.
If you're hitting the oversized GSO packet issue that Christian Theune
also seems to be hitting, then further debugging isn't necessary on your
part.
When guest network tx is stalled, please check the following on the
host:
(host)# ifconfig vnet0
vnet0: flags=4099<UP,BROADCAST,MULTICAST> mtu 1500
inet 192.168.122.123 netmask 255.255.255.0 broadcast 192.168.122.255
ether 52:54:00:bb:01:ac txqueuelen 0 (Ethernet)
RX packets 0 bytes 0 (0.0 B)
RX errors 0 dropped 1 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
If you see "RX dropped" greater than zero then it's probably the same issue.
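If you want to check this from a script, something like the following
should work. It is a sketch that assumes ifconfig prints the newer
"RX errors 0 dropped 1 ..." layout shown above (older net-tools versions
print "dropped:1" instead), and the function name is made up.

```shell
# Extract the RX "dropped" counter from ifconfig output read on stdin.
# Only the "RX errors" line is inspected, so the TX counters are ignored.
rx_dropped_from_ifconfig() {
    awk '/RX errors/ {
        for (i = 1; i < NF; i++)
            if ($i == "dropped") { print $(i + 1); exit }
    }'
}

# On the host, while guest tx is stalled (vnet0 as in the example):
#   ifconfig vnet0 | rx_dropped_from_ifconfig
# The same counter should also be readable without parsing text:
#   cat /sys/class/net/vnet0/statistics/rx_dropped
```

Comparing the value before and after a stall tells you whether drops
are still accumulating or are left over from an earlier event.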
Stefan