* FAQ on linux-kvm.org has broken link
From: folkert @ 2013-07-30  1:18 UTC
  To: kvm

Hi,

The link at:
http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
pointing to:
http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
is broken: it gives an "Internal server error" message.

Could someone please point me to the correct location? I'm struggling
with a VM losing connectivity all the time.


Regards,

Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-07-30 12:25 UTC
  To: folkert; +Cc: kvm

On Tue, Jul 30, 2013 at 03:18:53AM +0200, folkert wrote:
> The link at:
> http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
> pointing to:
> http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
> is broken: it gives an "Internal server error" message.
> 
> Could someone please point me to the correct location? I'm struggling
> with a VM losing connectivity all the time.

Hi Folkert,
I have updated the wiki to point to
http://qemu-project.org/Documentation/Networking.  The original link
seems to be down.

If you keep losing network connectivity you may have a MAC or IP address
conflict.  The symptom is that network traffic is intermittent - for
example, ping might work but a full TCP connection does not.

This happens when two guests are configured with identical MAC or IP
addresses on the same bridge or subnet.  They will "fight" over the MAC
or IP address and you will not be able to reliably communicate with
those guests.
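To check for such a conflict, a minimal sketch could flag repeated entries in a list of addresses. It assumes you have already collected the MAC (or IP) addresses of every guest on the bridge, one per line; the helper name find_duplicates is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print every address that appears more than once.
# Input: one MAC or IP address per line on stdin.
find_duplicates() {
    # Lower-case so 52:54:00:AB:... and 52:54:00:ab:... compare equal,
    # drop blank lines, then let uniq -d report the repeated entries.
    tr '[:upper:]' '[:lower:]' | grep -v '^$' | sort | uniq -d
}
```

For example, on the host, assuming the column layout that virsh domiflist prints (MAC in the fifth column):

  $ for d in $(virsh list --name); do virsh domiflist "$d" | awk 'NR > 2 && $5 { print $5 }'; done | find_duplicates

No output means no MAC is shared by two running guests. IP addresses would have to be collected from inside the guests.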

tcpdump is often the best tool for troubleshooting networking issues.  Run tcpdump
inside the guest to verify it is receiving traffic or investigate a
failed connection.

Run tcpdump on the host - especially if you are using -netdev tap - to
inspect the traffic being forwarded on behalf of the guest.

If you let libvirt set up networking for you all should be fine.  If you
run qemu manually or customized the domain XML, then it's possible you
have a misconfiguration.  Feel free to post the details so someone can
help you.

Stefan

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-07-30 12:45 UTC
  To: Stefan Hajnoczi; +Cc: kvm

> > The link at:
> > http://www.linux-kvm.org/page/FAQ#My_guest_network_is_stuck_what_should_I_do.3F
> > pointing to:
> > http://qemu-buch.de/cgi-bin/moin.cgi/QemuNetwork
> > is broken: it gives an "Internal server error" message.
> > 
> > Could someone please point me to the correct location? I'm struggling
> > with a VM losing connectivity all the time.
> 
> Hi Folkert,
> I have updated the wiki to point to
> http://qemu-project.org/Documentation/Networking.  The original link
> seems to be down.

Thanks!

> If you keep losing network connectivity you may have a MAC or IP address
> conflict.  The symptom is that network traffic is intermittent - for
> example, ping might work but a full TCP connection does not.

Well, it seems that it does receive some traffic (mostly ARP), but it
definitely can't send any outgoing traffic at all. No ping, and all TCP
sessions stall too.

For now I have created a workaround using a UMTS stick, so hopefully I
can now fix it (reboot the VM) when I'm not at the office.
I'll verify the MAC addresses when it happens again.

I have a suspicion that it happens less frequently if I leave tcpdump
running on the interface in the guest. I've now stopped it.

> This happens when two guests are configured with identical MAC or IP
> addresses on the same bridge or subnet.  They will "fight" over the MAC
> or IP address and you will not be able to reliably communicate with
> those guests.

If that is the case, something must be changing later on, right? It
happens only occasionally: sometimes after 5 minutes, sometimes after
days. And no new guests are started on that server; it has 3 guests that
stay on.

> If you let libvirt set up networking for you all should be fine.  If you
> run qemu manually or customized the domain XML, then it's possible you
> have a misconfiguration.  Feel free to post the details so someone can
> help you.

Everything is set up by libvirt. Great software, by the way!


Folkert van Heusden

-- 
MultiTail is a flexible tool for following logfiles and
command output. Filtering, coloring, merging,
'diff-view', etc. http://www.vanheusden.com/multitail/
----------------------------------------------------------------------
Phone: +31-6-41278122, PGP-key: 1F28D8AE, www.vanheusden.com

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-07-30 20:45 UTC
  To: Stefan Hajnoczi; +Cc: kvm

> If you keep losing network connectivity you may have a MAC or IP address
> conflict.  The symptom is that network traffic is intermittent - for
> example, ping might work but a full TCP connection does not.

I submitted a bug at bugzilla a while ago which I updated today with new
findings: https://bugzilla.kernel.org/show_bug.cgi?id=60620
This week the system ran a couple of times for 1-2 days but tonight was
a bit of a disaster: I had to reboot the system 18 times. Sometimes it
was fine for half an hour, but most of the time after a couple of
minutes (sometimes even during boot) the networking on that one guest
failed.


Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-07-31  8:46 UTC
  To: folkert; +Cc: kvm

On Tue, Jul 30, 2013 at 10:45:20PM +0200, folkert wrote:
> > If you keep losing network connectivity you may have a MAC or IP address
> > conflict.  The symptom is that network traffic is intermittent - for
> > example, ping might work but a full TCP connection does not.
> 
> I submitted a bug at bugzilla a while ago which I updated today with new
> findings: https://bugzilla.kernel.org/show_bug.cgi?id=60620
> This week the system ran a couple of times for 1-2 days but tonight was
> a bit of a disaster: I had to reboot the system 18 times. Sometimes it
> was fine for half an hour, but most of the time after a couple of
> minutes (sometimes even during boot) the networking on that one guest
> failed.

I can't add much beyond suggesting more detailed troubleshooting
steps:

1. Wait until the guest suffers from lost network connectivity.

2. Confirm the MAC/IP addresses and run tcpdump -ni $IFACE inside the
   guest.  Ping the guest from the host and check whether tcpdump
   reports ICMP ping packets.

3. Now try pinging the host from the guest and run tcpdump -ni $IFACE on
   the host.  To determine the host-side tap interface, run the
   following:

   $ virsh domiflist mauer
   Interface  Type       Source     Model       MAC
   -------------------------------------------------------
   vnet0      network    default    virtio      52:54:00:b9:c8:4d
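When a guest has several NICs (mauer has three on its command line), you need the vnet device whose MAC matches the guest interface you are testing. A minimal sketch, assuming the five-column virsh domiflist layout shown above; tap_for_mac is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: read `virsh domiflist DOMAIN` output on stdin and
# print the host-side interface whose MAC matches $1 (case-insensitive).
tap_for_mac() {
    # NR > 2 skips the header line and the dashed separator line.
    awk -v mac="$1" 'NR > 2 && tolower($5) == tolower(mac) { print $1 }'
}
```

For example, "virsh domiflist mauer | tap_for_mac 52:54:00:86:d9:1f" prints the tap device to pass to tcpdump -ni on the host. Inside the guest, an interface's MAC can be read from /sys/class/net/eth0/address (eth0 being an assumed interface name).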

Now you have tested tap connectivity with the guest.  One of the following is true:

1. Tap connectivity is fine (both transmit and receive are working)
2. Either transmit or receive is broken (ping doesn't work but tcpdump
   does show incoming packets on one side).
3. Tap connectivity is broken (ping fails and tcpdump shows no ICMP
   packets).
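The three outcomes can be restated as a tiny decision function. This is only an illustrative sketch; classify_tap and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper mirroring the three outcomes listed above.
# Args: ping_ok seen_in_guest seen_on_host   (each "yes" or "no")
classify_tap() {
    if [ "$1" = yes ]; then
        echo 1    # tap connectivity fine in both directions
    elif [ "$2" = yes ] || [ "$3" = yes ]; then
        echo 2    # ICMP shows up on one side only: transmit or receive broken
    else
        echo 3    # no ICMP seen anywhere: tap connectivity broken
    fi
}
```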

If the result is #1, continue troubleshooting at the next layer: the
bridge or NAT configuration on the host.

If the result is #2, check firewalls on host and guest.  Also try the
following inside the guest: disable the network interface, rmmod
virtio_net, modprobe virtio_net again, and bring the network up.

If the result is #3, check firewalls on host and guest as well as dmesg
output in host and guest.
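For result #2, the interface reset suggested above could be scripted inside the guest roughly as follows. This is a sketch under assumptions: a Debian-style guest with ifdown/ifup, root privileges, and interface names passed as arguments (eth0 if none). Unloading virtio_net removes every virtio NIC at once, so all of them are brought down first. With DRY_RUN=1 the commands are only printed.

```shell
#!/bin/sh
# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Usage: reset_virtio_net [IFACE...]   (defaults to eth0, an assumption)
reset_virtio_net() {
    [ $# -gt 0 ] || set -- eth0
    # Every virtio NIC disappears with the module, so take them all down first.
    for i in "$@"; do run ifdown "$i" || return 1; done
    run rmmod virtio_net || return 1
    run modprobe virtio_net || return 1
    for i in "$@"; do run ifup "$i" || return 1; done
}
```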

Stefan

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-08-02 11:37 UTC
  To: Stefan Hajnoczi; +Cc: kvm

Hi,

> If the result is #2, check firewalls on host and guest.  Also try the
> following inside the guest: disable the network interface, rmmod
> virtio_net, modprobe virtio_net again, and bring the network up.

I pinged, I sniffed, I updated the bug report (it also happens with
other guests now!).

And... bringing down the interfaces / rmmod / modprobe / ifup works!
So I think something is wrong with virtio_net!

What shall I do now?


Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-08-02 15:25 UTC
  To: folkert; +Cc: kvm

On Fri, Aug 2, 2013 at 1:37 PM, folkert <folkert@vanheusden.com> wrote:
>> If the result is #2, check firewalls on host and guest.  Also try the
>> following inside the guest: disable the network interface, rmmod
>> virtio_net, modprobe virtio_net again, and bring the network up.
>
> I pinged, I sniffed, I updated the bug report (it also happens with
> other guests now!).
>
> And... bringing down the interfaces / rmmod / modprobe / ifup works!
> So I think something is wrong with virtio_net!
>
> What shall I do now?

Hi Folkert,
I wrote a reply earlier today but it was rejected because I do not have a
kernel.org bugzilla account.  If you don't mind let's continue
discussing on this mailing list - we don't know whether this is a
kernel bug yet anyway.

A couple of questions:

Please post the QEMU command-line from the host (ps aux | grep qemu).

Please confirm that vhost_net is being used on the host (lsmod | grep
vhost_net).

Please double-check both guest and host dmesg for any suspicious
messages.  It could be about networking, out-of-memory, or kernel
backtraces.
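The last two checks can be made a little more mechanical. A rough sketch with hypothetical helper names: vhost_users prints the use count that lsmod reports for vhost_net (0 or no output means it is not in use), and suspicious_lines filters dmesg output for the kinds of messages mentioned above. The pattern list is an assumption and will likely need tuning.

```shell
#!/bin/sh
# Hypothetical helpers; feed them `lsmod` and `dmesg` output respectively.

# Print the "Used by" count of vhost_net from lsmod output on stdin.
vhost_users() {
    awk '$1 == "vhost_net" { print $3 }'
}

# Keep only dmesg lines that look network-, memory- or crash-related.
# `|| true` keeps callers running under `set -e` alive when nothing matches.
suspicious_lines() {
    grep -Ei 'virtio|vhost|tun|tap|out of memory|oom-killer|call trace|bug:|topology change' || true
}
```

Usage: "lsmod | vhost_users" on the host, and "dmesg | suspicious_lines" on both the host and the guest.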

Stefan

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-08-02 18:06 UTC
  To: Stefan Hajnoczi; +Cc: kvm

Hi Stefan,

> > I pinged, I sniffed, I updated the bug report (it also happens with
> > other guests now!).
> >
> > And... bringing down the interfaces / rmmod / modprobe / ifup works!
> > So I think something is wrong with virtio_net!
> >
> > What shall I do now?
> 
> I wrote a reply earlier today but it was rejected because I do not have a
> kernel.org bugzilla account.  If you don't mind let's continue
> discussing on this mailing list - we don't know whether this is a
> kernel bug yet anyway.

no problem

> A couple of questions:
> Please post the QEMU command-line from the host (ps aux | grep qemu).

I'll post them all:
- UMTS-clone: this one has worked fine since it was created a week ago
- belle: this one was fine but suddenly also showed the problem
- mauer: the problem one

112       4819     1  4 Jul30 ?        03:29:39 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 1024 -smp 1,sockets=1,cores=1,threads=1 -name UMTS-clone -uuid e49502f1-0c74-2a60-99dc-7602da5ee640 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/UMTS-clone.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_UMTS-clone,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/home/folkert/ISOs/wheezy.iso,if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=20,id=hostnet0,vhost=on,vhostfd=21 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:09:3b:b6,bus=pci.0,addr=0x3 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -vnc 127.0.0.1:0,password -vga cirrus -device usb-host,hostbus=6,hostaddr=5,id=hostdev0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
112      10065     1 11 Jul30 ?        07:46:16 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 12,sockets=12,cores=1,threads=1 -name belle -uuid 16b704d7-5fbd-d67b-71e6-0d6b43f1bc0a -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/belle.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=localtime -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_BELLE,if=none,id=drive-virtio-disk0,format=raw -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGNEO/LV_V_BELLE_OS,if=none,id=drive-virtio-disk1,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk1,id=virtio-disk1 -drive file=/dev/VGJOURNAL/LV_J_BELLE,if=none,id=drive-ide0-0-0,format=raw,cache=writeback -device ide-hd,bus=ide.0,unit=0,drive=drive-ide0-0-0,id=ide0-0-0 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw,cache=none -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:75:4a:6f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:0a:6e:de,bus=pci.0,addr=0x7 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:1,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x4 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6
root     13116 12830  0 19:54 pts/8    00:00:00 grep qemu
112      23453     1 57 13:16 ?        03:46:51 /usr/bin/kvm -S -M pc-1.1 -enable-kvm -m 8192 -smp 8,maxcpus=12,sockets=12,cores=1,threads=1 -name mauer -uuid 3a8452e6-81af-b185-63b6-2b32be17ed87 -no-user-config -nodefaults -chardev socket,id=charmonitor,path=/var/lib/libvirt/qemu/mauer.monitor,server,nowait -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 -drive file=/dev/VGNEO/LV_V_MAUER,if=none,id=drive-virtio-disk0,format=raw,cache=writeback -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x8,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 -drive file=/dev/VGJOURNAL/LV_J_MAUER,if=none,id=drive-virtio-disk1,format=raw,cache=writethrough -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0xa,drive=drive-virtio-disk1,id=virtio-disk1 -drive if=none,id=drive-ide0-1-0,readonly=on,format=raw -device ide-cd,bus=ide.1,unit=0,drive=drive-ide0-1-0,id=ide0-1-0 -netdev tap,fd=26,id=hostnet0,vhost=on,vhostfd=27 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:86:d9:1f,bus=pci.0,addr=0x3 -netdev tap,fd=28,id=hostnet1,vhost=on,vhostfd=29 -device virtio-net-pci,netdev=hostnet1,id=net1,mac=52:54:00:a3:12:8a,bus=pci.0,addr=0x4 -netdev tap,fd=30,id=hostnet2,vhost=on,vhostfd=31 -device virtio-net-pci,netdev=hostnet2,id=net2,mac=52:54:00:0f:54:c2,bus=pci.0,addr=0x5 -chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 -device usb-tablet,id=input0 -vnc 127.0.0.1:2,password -vga cirrus -device intel-hda,id=sound0,bus=pci.0,addr=0x7 -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x9

Note that everything is managed via virt-manager.
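To cross-check MAC addresses directly from the running processes, a small sketch could pull every mac=... value out of the command lines. It assumes the mac= syntax seen in the command lines above and a grep that supports -o (GNU grep does); extract_macs is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print every mac=XX:XX:XX:XX:XX:XX value found in the
# QEMU/KVM command lines on stdin, one per line, lower-cased.
extract_macs() {
    grep -Eo 'mac=([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}' |
        cut -d= -f2 |
        tr '[:upper:]' '[:lower:]'
}
```

Usage: "ps -eo args | grep '[k]vm' | extract_macs | sort | uniq -d" prints any MAC used by more than one NIC; no output is the good case. The six MACs in the three command lines above are all distinct.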

> Please confirm that vhost_net is being used on the host (lsmod | grep
> vhost_net).

Yes, loaded and used:

root@neo:/home/folkert# lsmod | grep vhost
vhost_net              27658  6
macvtap                17638  1 vhost_net
tun                    22479  13 vhost_net

> Please double-check both guest and host dmesg for any suspicious
> messages.  It could be about networking, out-of-memory, or kernel
> backtraces.

I have to get back to you on this: I see messages about topology changes on the host, but I forgot to check whether they were there when the problem started.
I *think* they appeared after I rebooted the guests, but I'm not entirely sure, so let's wait on that.
Those are the only messages apart from devices entering promiscuous mode when I start tcpdump.


Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-08-05 11:31 UTC
  To: folkert; +Cc: kvm

On Fri, Aug 02, 2013 at 08:06:58PM +0200, folkert wrote:
> > A couple of questions:
> > Please post the QEMU command-line from the host (ps aux | grep qemu).
> 
> [... QEMU command lines and lsmod output snipped ...]
>
> > Please double-check both guest and host dmesg for any suspicious
> > messages.  It could be about networking, out-of-memory, or kernel
> > backtraces.
> 
> I have to get back to you on this: I see messages about topology changes on the host, but I forgot to check whether they were there when the problem started.
> I *think* they appeared after I rebooted the guests, but I'm not entirely sure, so let's wait on that.
> Those are the only messages apart from devices entering promiscuous mode when I start tcpdump.

Hi Folkert,
If you do find something in dmesg, that could be very helpful.

I'm trying to put together all the data points but a few things are
unclear:

Based on this information it seems like a bug in the virtio_net guest
driver or vhost_net on the host.  Actually there is one contradictory
piece of evidence: in the original bug report you said "using e1000
instead of virtio: did not help".  Can you confirm that e1000 also does
not work?

In your original bug report you said "If I then ping any host connected
to that interface, no ping comes back: only a message about buffer space
not being enough".  Can you post the exact error message and whether it
is printed by ping inside the guest, dmesg inside the guest, or dmesg on
the host?

There is still the possibility that there is a networking configuration
issue or bug inside the guest itself.  That would explain why this has
happened across different configurations (tap, macvtap, vhost_net,
e1000).

Two approaches to get closer to the source of the problem:

1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
   you can rule out fixed bugs in vhost_net or tap.

2. Get the system into the bad state and then dig deeper.  Start
   with outgoing ping, instrument guest driver and host vhost_net
   functions to see what the drivers are doing, inspect the transmit
   vring, etc.

#1 is probably the best next step.  If it fails and you still have time
to work on a solution we can start digging deeper with #2.

Stefan

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-08-05 20:59 UTC
  To: Stefan Hajnoczi; +Cc: kvm

> If you do find something in dmesg, that could be very helpful.

OK, it is down again: nothing in dmesg, neither in the guest nor on the
host.

> Based on this information it seems like a bug in the virtio_net guest
> driver or vhost_net on the host.  Actually there is one contradictory
> piece of evidence: in the original bug report you said "using e1000
> instead of virtio: did not help".  Can you confirm that e1000 also does
> not work?

Yes, when I tested it, e1000 did not work either.

> In your original bug report you said "If I then ping any host connected
> to that interface, no ping comes back: only a message about buffer space
> not being enough".  Can you post the exact error message and whether it
> is printed by ping inside the guest, dmesg inside the guest, or dmesg on
> the host?

It was inside the guest, for traffic going to the outside world (i.e.
outside of the host).

> There is still the possibility that there is a networking configuration
> issue or bug inside the guest itself.  That would explain why this has
> happened across different configurations (tap, macvtap, vhost_net,
> e1000).

I don't think so, because:
- bringing down the interfaces _AND_ doing rmmod/modprobe of virtio_net
  solves it
- it has also happened twice on another guest

> Two approaches to get closer to the source of the problem:
> 1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
>    you can rule out fixed bugs in vhost_net or tap.
> 2. Get the system into the bad state and then dig deeper.  Start
>    with outgoing ping, instrument guest driver and host vhost_net
>    functions to see what the drivers are doing, inspect the transmit
>    vring, etc.
> 
> #1 is probably the best next step.  If it fails and you still have time
> to work on a solution we can start digging deeper with #2.

I can upgrade to 3.10.3 now, as that is the current version in Debian.


Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-08-06  8:13 UTC
  To: folkert; +Cc: kvm

On Mon, Aug 05, 2013 at 10:59:45PM +0200, folkert wrote:
> > Two approaches to get closer to the source of the problem:
> > 1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
> >    you can rule out fixed bugs in vhost_net or tap.
> > 2. Get the system into the bad state and then dig deeper.  Start
> >    with outgoing ping, instrument guest driver and host vhost_net
> >    functions to see what the drivers are doing, inspect the transmit
> >    vring, etc.
> > 
> > #1 is probably the best next step.  If it fails and you still have time
> > to work on a solution we can start digging deeper with #2.
> 
> I can upgrade to 3.10.3 now, as that is the current version in Debian.

Sounds good.  That way you'll also have access to the latest perf for
instrumenting vhost_net if it still fails.

Stefan

* Re: FAQ on linux-kvm.org has broken link
From: folkert @ 2013-08-26 19:50 UTC
  To: Stefan Hajnoczi; +Cc: kvm

Hi,

> 1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
>    you can rule out fixed bugs in vhost_net or tap.

For the last two weeks it ran fine for a couple of days at a time. This
evening was really bad again: uptimes of 5-10 minutes. This is with
3.11-rc4.

> 2. Get the system into the bad state and then dig deeper.  Start
>    with outgoing ping, instrument guest driver and host vhost_net
>    functions to see what the drivers are doing, inspect the transmit
>    vring, etc.
> 
> #1 is probably the best next step.  If it fails and you still have time

Yup, very much.

> to work on a solution we can start digging deeper with #2.

I had a small script running which showed the amount of traffic coming
in and going out each second. I did not see any increase or decrease in
the amount beforehand; the only change is that when the problem happens,
RX continues but TX drops to 0.
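The script itself is not shown; a hypothetical equivalent could sample the interface byte counters in /sys/class/net/IFACE/statistics once per second and flag the pattern described above. The function names and the eth0 default are assumptions.

```shell
#!/bin/sh
# Pure helper so the logic can be checked without a live interface.
# Args: rx_delta tx_delta (bytes during the last second)
format_rates() {
    if [ "$1" -gt 0 ] && [ "$2" -eq 0 ]; then flag="  <-- TX stalled"; else flag=""; fi
    echo "RX $1 B/s  TX $2 B/s$flag"
}

# Usage: monitor [IFACE]   (defaults to eth0); runs until interrupted.
monitor() {
    stats=/sys/class/net/${1:-eth0}/statistics
    prev_rx=$(cat "$stats/rx_bytes"); prev_tx=$(cat "$stats/tx_bytes")
    while sleep 1; do
        rx=$(cat "$stats/rx_bytes"); tx=$(cat "$stats/tx_bytes")
        format_rates $((rx - prev_rx)) $((tx - prev_tx))
        prev_rx=$rx; prev_tx=$tx
    done
}
```

Running it inside the guest and on the matching host tap device at the same time would show on which side TX stops.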


Folkert van Heusden

* Re: FAQ on linux-kvm.org has broken link
From: Stefan Hajnoczi @ 2013-08-27  7:31 UTC
  To: folkert; +Cc: kvm

On Mon, Aug 26, 2013 at 09:50:38PM +0200, folkert wrote:
> > 2. Get the system into the bad state and then dig deeper.  Start
> >    with outgoing ping, instrument guest driver and host vhost_net
> >    functions to see what the drivers are doing, inspect the transmit
> >    vring, etc.

Okay, it's time to understand what is going on at the virtio-net level.

You can check whether the host is being notified of packets ready to be
transmitted:

  (host)$ sudo perf probe --module vhost_net handle_tx
  (host)$ sudo perf record --pid 24306 -e probe:handle_tx -R
  [...attempt to transmit packets inside guest...]
  ^C
  (host)$ sudo perf script
       vhost-24305 24306 [000] 101120.817103: probe:handle_tx: (ffffffffa06fd6b0)

Here the qemu-kvm process was pid 24305 and the vhost kernel thread was
24306 (you can find the correct pids with ps aux; the vhost thread is
named vhost-<qemu-kvm pid>).  If your host is only running the guest
under test you can drop the pid argument and replace it with -a.
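Finding the vhost thread by hand with ps aux works; as a sketch, the lookup can also be scripted by matching the "vhost-<qemu pid>" name seen in the perf output above. This assumes, as on the 3.x kernels in this thread, that each vhost worker has its own top-level /proc/<pid> entry (newer kernels may organize vhost workers differently). The function name and the PROC_ROOT override are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: list the PIDs of vhost kernel threads belonging to one QEMU process.
# Relies on the "vhost-<qemu pid>" thread name shown in the perf output above.

# find_vhost_tids QEMU_PID [PROC_ROOT]
#   PROC_ROOT hypothetical override of /proc, e.g. for testing
find_vhost_tids() {
    local qemu_pid=$1 root=${2:-/proc} comm_file
    for comm_file in "$root"/[0-9]*/comm; do
        [ -r "$comm_file" ] || continue   # glob had no match, or process exited
        if [ "$(<"$comm_file")" = "vhost-$qemu_pid" ]; then
            comm_file=${comm_file%/comm}  # strip the trailing "/comm"
            echo "${comm_file##*/}"       # keep only the PID directory name
        fi
    done
}
```

Since perf record's --pid accepts a comma-separated list, a guest with several vhost threads could then be traced with something like: sudo perf record --pid "$(find_vhost_tids 24305 | paste -sd, -)" -e probe:handle_tx -R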

If you see handle_tx probes firing on the host then the issue may be
with the vring or tap device.  If you do not see handle_tx probes firing
on the host, then the problem may be inside the guest.
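The rule of thumb above can be applied mechanically to the perf script output. A minimal sketch, where the function name and the wording of its verdicts are assumptions, not part of perf:

```bash
#!/usr/bin/env bash
# Sketch: read `perf script` output on stdin and apply the handle_tx rule of thumb.
tx_verdict() {
    local hits
    # grep -c prints 0 (and exits 1) when nothing matches; "|| true" keeps that 0.
    hits=$(grep -c 'probe:handle_tx' || true)
    hits=${hits:-0}
    if [ "$hits" -gt 0 ]; then
        echo "handle_tx fired $hits time(s): host is notified; suspect the vring or tap device"
    else
        echo "handle_tx never fired: suspect the guest side"
    fi
}
```

Usage: sudo perf script | tx_verdict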

Stefan


* Re: FAQ on linux-kvm.org has broken link
  2013-08-06  8:13                 ` Stefan Hajnoczi
@ 2013-09-03 17:03                   ` folkert
  0 siblings, 0 replies; 16+ messages in thread
From: folkert @ 2013-09-03 17:03 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm

> > > Two approaches to get closer to the source of the problem:
> > > 1. Try the latest vanilla kernel on the host (Linux 3.10.5).  This way
> > >    you can rule out fixed bugs in vhost_net or tap.
> > > 2. Get the system into the bad state and then do some deeper.  Start
> > >    with outgoing ping, instrument guest driver and host vhost_net
> > >    functions to see what the drivers are doing, inspect the transmit
> > >    vring, etc.
> > > 
> > > #1 is probably the best next step.  If it fails and you still have time
> > > to work on a solution we can start digging deeper with #2.
> > 
> > I can upgrade now to 3.10.3 as that is the current version in debian.
> 
> Sounds good.  That way you'll also have access to the latest perf for
> instrumenting vhost_net if it still fails.

It seems some parts of perf are in the Debian kernel and some aren't, so
I have to ask Debian to include them. That might take a while.

I have a feeling, though I did not write it down so I'm not 100% sure,
that this problem mostly starts at 18:30 and again after 00:00. That
would suggest the problem is with that VM. I don't think it is, though:
at least one other VM/guest has this problem too, only much, much less
often than the problem VM.

I'm going to keep a list of when it happens.
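One way to keep such a list is to log every up/down transition and then count outage starts per hour of day, which would confirm or refute the 18:30 and after-midnight hunch. A minimal sketch, where the target address, log file name, and log line format are all assumptions:

```bash
#!/usr/bin/env bash
# Sketch for keeping a list of when connectivity drops.

# log_outages TARGET [LOGFILE]: ping once a second and append a line whenever
# the state flips between up and down, e.g. "2013-09-03 18:31:05 10.0.0.5 down".
log_outages() {
    local target=$1 log=${2:-outages.log} state=up now
    while true; do
        if ping -c 1 -W 1 "$target" >/dev/null 2>&1; then now=up; else now=down; fi
        if [ "$now" != "$state" ]; then
            echo "$(date '+%F %T') $target $now" >> "$log"
            state=$now
        fi
        sleep 1
    done
}

# outages_by_hour [LOGFILE...]: count outage starts per hour of day
# (reads stdin when no file is given).
outages_by_hour() {
    awk '$4 == "down" { split($2, t, ":"); n[t[1]]++ }
         END { for (h in n) print h, n[h] }' "$@" | sort
}
```

For example, run "log_outages 10.0.0.5" (a placeholder guest IP) in a screen session, and later "outages_by_hour outages.log".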


Folkert van Heusden

-- 
Feeling generous? -> http://www.vanheusden.com/wishlist.php


* Re: FAQ on linux-kvm.org has broken link
  2013-08-27  7:31                 ` Stefan Hajnoczi
@ 2013-10-29 17:18                   ` folkert
  2013-11-01  9:52                     ` Stefan Hajnoczi
  0 siblings, 1 reply; 16+ messages in thread
From: folkert @ 2013-10-29 17:18 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: kvm

> > > 2. Get the system into the bad state and then do some deeper.  Start
> > >    with outgoing ping, instrument guest driver and host vhost_net
> > >    functions to see what the drivers are doing, inspect the transmit
> > >    vring, etc.

Update: I have not gotten around to rebooting the server into the new
kernel with those probes, but it is definitely much worse on Tuesdays(!)
and when I use mosh to log in remotely.


* Re: FAQ on linux-kvm.org has broken link
  2013-10-29 17:18                   ` folkert
@ 2013-11-01  9:52                     ` Stefan Hajnoczi
  0 siblings, 0 replies; 16+ messages in thread
From: Stefan Hajnoczi @ 2013-11-01  9:52 UTC (permalink / raw)
  To: folkert; +Cc: kvm, Christian Theune, Michael S. Tsirkin, Jason Wang

On Tue, Oct 29, 2013 at 06:18:05PM +0100, folkert wrote:
> > > > 2. Get the system into the bad state and then do some deeper.  Start
> > > >    with outgoing ping, instrument guest driver and host vhost_net
> > > >    functions to see what the drivers are doing, inspect the transmit
> > > >    vring, etc.
> 
> Update: have not gotten around to reboot the server for the new kernel
> with those probes, but: it is definately very much worse on tuesday(!)
> and when I use mosh to remotely login.

If you're hitting the oversized GSO packet issue that Christian Theune
also seems to be hitting, then further debugging isn't necessary on your
part.

When guest network tx is stalled, please check the following on the
host:

  (host)# ifconfig vnet0
  vnet0: flags=4099<UP,BROADCAST,MULTICAST>  mtu 1500
          inet 192.168.122.123  netmask 255.255.255.0  broadcast 192.168.122.255
          ether 52:54:00:bb:01:ac  txqueuelen 0  (Ethernet)
          RX packets 0  bytes 0 (0.0 B)
          RX errors 0  dropped 1  overruns 0  frame 0
          TX packets 0  bytes 0 (0.0 B)
          TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

If you see "RX dropped" greater than zero then it's probably the same issue.
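The check can be scripted against the ifconfig output. A minimal sketch, assuming the newer net-tools layout shown above ("RX errors 0  dropped 1 ..."); the older "dropped:1" layout is not parsed, and the function names are hypothetical:

```bash
#!/usr/bin/env bash
# Sketch: decide whether a tap device shows the "RX dropped" symptom.

# rx_dropped: print the RX "dropped" counter from `ifconfig IFACE` output on stdin.
# Only the "RX errors" line is examined, so the TX "dropped" field is ignored.
rx_dropped() {
    awk '/RX errors/ {
        for (i = 1; i < NF; i++)
            if ($i == "dropped") { print $(i + 1); exit }
    }'
}

# gso_drop_check: report whether the counter is above zero.
gso_drop_check() {
    local dropped
    dropped=$(rx_dropped)
    if [ -z "$dropped" ]; then
        echo "could not find an RX dropped counter"
        return 2
    elif [ "$dropped" -gt 0 ]; then
        echo "RX dropped=$dropped: probably the oversized GSO packet issue"
    else
        echo "RX dropped=0: no sign of the GSO drop symptom"
    fi
}
```

Usage: ifconfig vnet0 | gso_drop_check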

Stefan



Thread overview: 16+ messages
2013-07-30  1:18 FAQ on linux-kvm.org has broken link folkert
2013-07-30 12:25 ` Stefan Hajnoczi
2013-07-30 12:45   ` folkert
2013-07-30 20:45   ` folkert
2013-07-31  8:46     ` Stefan Hajnoczi
2013-08-02 11:37       ` folkert
2013-08-02 15:25         ` Stefan Hajnoczi
2013-08-02 18:06           ` folkert
2013-08-05 11:31             ` Stefan Hajnoczi
2013-08-05 20:59               ` folkert
2013-08-06  8:13                 ` Stefan Hajnoczi
2013-09-03 17:03                   ` folkert
2013-08-26 19:50               ` folkert
2013-08-27  7:31                 ` Stefan Hajnoczi
2013-10-29 17:18                   ` folkert
2013-11-01  9:52                     ` Stefan Hajnoczi
