* persistent tun & different virtual NICs & dead guest network
@ 2009-04-04 10:26 Michael Tokarev
  2009-04-05 11:58   ` [Qemu-devel] " Avi Kivity
  0 siblings, 1 reply; 7+ messages in thread
From: Michael Tokarev @ 2009-04-04 10:26 UTC (permalink / raw)
  To: KVM list

Hello.

2 days debugging an.. issue here, and finally got it.
To make the long and painful (it was for me anyway)
story short...

kvm provides a way to control various offload settings
on the "host side" of the tun network device (I mean
the `-net tap' setup) from within the guest.  I.e., the
guest can set/clear various offload bits according to
its capabilities/wishes.

The problem is that the different virtual NICs used by
kvm/qemu expect and set different offload bits on the
tun device.  And each sets only those bits which, as
it "thinks", differ from the default (all-off).

This means that when changing the virtual NIC model AND
using a persistent tun device, it's very likely to end
up with inconsistent flags.
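
The failure mode described above can be sketched as a toy model. This is
only an illustration under stated assumptions: the flag names, the
`persistent_tap` struct and the helper functions are all hypothetical and
are not taken from qemu or the kernel. It shows a device whose offload
word survives between guest runs, and a NIC model that only switches on
the bits it wants while trusting that everything else is off.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag names for illustration; not the kernel's constants. */
enum {
    OFF_TX_CSUM = 1u << 0,
    OFF_SG      = 1u << 1,
    OFF_TSO     = 1u << 2,
    OFF_LRO     = 1u << 3,
};

/* A persistent tap device: its offload word outlives each guest run. */
struct persistent_tap {
    uint32_t offload;
};

/* Opening the device optionally clears whatever the last guest left. */
static void tap_open(struct persistent_tap *tap, int reset_on_open)
{
    assert(tap != NULL);
    if (reset_on_open)
        tap->offload = 0;
}

/* A NIC model that only turns on the bits it wants (assumed behaviour). */
static void nic_enable(struct persistent_tap *tap, uint32_t wanted)
{
    assert(tap != NULL);
    tap->offload |= wanted;
}

/* Run one guest session; return the bits left on that it never asked for. */
static uint32_t run_guest(struct persistent_tap *tap, uint32_t wanted,
                          int reset_on_open)
{
    tap_open(tap, reset_on_open);
    nic_enable(tap, wanted);
    return tap->offload & ~wanted;
}
```

In this model, running a guest that wants `OFF_TX_CSUM | OFF_SG | OFF_TSO`
and then one that wants only `OFF_LRO` leaves the first three bits set
for the second guest unless `reset_on_open` is non-zero.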

For example, here's what the offload settings on the
host look like after using the e1000 driver in the guest
(freshly created persistent tun device):

  rx-checksumming: on
  tx-checksumming: on
  scatter-gather: on
  tcp segmentation offload: on
  udp fragmentation offload: off
  generic segmentation offload: off
  large receive offload: off

Here are the same settings when using virtio_net
instead:

  rx-checksumming: on
  tx-checksumming: off
  scatter-gather: off
  tcp segmentation offload: off
  udp fragmentation offload: off
  generic segmentation offload: off
  large receive offload: off

I.e., only rx-checksumming.  When using virtio_net
from 2.6.29, which supports LRO, it also turns on
large receive offload.

Now, say, I tried a guest with the e1000 driver, and it
turned on the tx, sg and tso bits.  And now I'm trying
to run a guest with the new virtio-net NIC instead.  It
turns on the lro bit, but the network does not work anyway:
almost every packet sent from the host to the guest
has an incorrect checksum, because the NIC is marked
as able to do tx-checksumming but does not do it.
The network is dead.
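
To illustrate why an unfinished checksum kills traffic, here is a minimal
sketch of the RFC 1071 Internet checksum. It is a simplification and not
what the tun driver literally does: the checksum field is simply left at
zero to stand in for "offload was promised but nobody completed it", and
the helper names `complete_checksum` and `checksum_ok` are made up for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* RFC 1071 Internet checksum over len bytes, as big-endian 16-bit words. */
static uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)                       /* pad a trailing odd byte with zero */
        sum += (uint32_t)data[len - 1] << 8;
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* What checksum offload is expected to do: fill the field at csum_off. */
static void complete_checksum(uint8_t *pkt, size_t len, size_t csum_off)
{
    uint16_t csum;

    assert(csum_off + 2 <= len && (csum_off % 2) == 0);
    pkt[csum_off] = 0;
    pkt[csum_off + 1] = 0;
    csum = inet_checksum(pkt, len);
    pkt[csum_off] = (uint8_t)(csum >> 8);
    pkt[csum_off + 1] = (uint8_t)(csum & 0xff);
}

/* Receiver-side check: a packet with a valid checksum sums to zero. */
static int checksum_ok(const uint8_t *pkt, size_t len)
{
    return inet_checksum(pkt, len) == 0;
}
```

A receiver that runs `checksum_ok` on a packet whose field was never
completed will drop it, which matches the "dead network" symptom.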

Now, after trying this and that, not understanding
what's going on etc, let's reboot back with the e1000
NIC which worked a few minutes ago... just to discover
that it does not work anymore either!  Because the previous
attempt with virtio_net left lro on, but the e1000
driver does not support it!  So now we have a non-working
network again, and it does not matter which driver
we try: neither of them will work because the offload
settings are broken.

What's more, one can't control this stuff from the
host side using standard ethtool: it says that
the operation is not supported (I wonder how kvm
performs the settings changes).

The solution here is to re-create the tun device
before changing the virtual NIC model.  But that
isn't always possible, esp. when guests are
being run as a non-root user (where persistent
tun devices are most useful).

Can this be fixed somehow please?

I think all the settings should be reset to 0
when opening the tun device.

Thanks.

/mjt,
   who lost 2 more days and had another sleepless
   night trying to understand what's going wrong...


* Re: persistent tun & different virtual NICs & dead guest network
  2009-04-04 10:26 persistent tun & different virtual NICs & dead guest network Michael Tokarev
@ 2009-04-05 11:58   ` Avi Kivity
  0 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-04-05 11:58 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: KVM list, qemu-devel

(cc qemu-devel)

Michael Tokarev wrote:
> Can this be fixed somehow please?
>
> I think all the settings should be reset to 0
> when opening the tun device.

This should definitely be fixed.  I'll look at writing a patch.

-- 
error compiling committee.c: too many arguments to function



* Re: persistent tun & different virtual NICs & dead guest network
  2009-04-05 11:58   ` [Qemu-devel] " Avi Kivity
@ 2009-04-05 12:12     ` Avi Kivity
  -1 siblings, 0 replies; 7+ messages in thread
From: Avi Kivity @ 2009-04-05 12:12 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: KVM list, qemu-devel

[-- Attachment #1: Type: text/plain, Size: 349 bytes --]

Avi Kivity wrote:
>> I think all the settings should be reset to 0
>> when opening the tun device.
>
> This should definitely be fixed.  I'll look at writing a patch.
>

Okay, that's not in upstream qemu, so I committed a fix to 
kvm-userspace.git.

Attached if you want to test it.

-- 
error compiling committee.c: too many arguments to function


[-- Attachment #2: 0001-kvm-qemu-clear-tap-features-on-initialization.patch --]
[-- Type: text/x-patch, Size: 926 bytes --]

From 25971710409c374e9486c960c297f324a9164a65 Mon Sep 17 00:00:00 2001
From: Avi Kivity <avi@redhat.com>
Date: Sun, 5 Apr 2009 15:08:55 +0300
Subject: [PATCH] kvm: qemu: clear tap features on initialization

tap features change how tap interprets data, so they must be cleared on
initialization to prevent old settings from interfering with new guest
instances.

Signed-off-by: Avi Kivity <avi@redhat.com>
---
 qemu/net.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/qemu/net.c b/qemu/net.c
index d753fa0..703d01c 100644
--- a/qemu/net.c
+++ b/qemu/net.c
@@ -930,6 +930,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
 #endif
 #ifdef TUNSETOFFLOAD
     s->vc->set_offload = tap_set_offload;
+    tap_set_offload(s->vc, 0, 0, 0, 0);
 #endif
     qemu_set_fd_handler2(s->fd, tap_can_send, tap_send, NULL, s);
     snprintf(s->vc->info_str, sizeof(s->vc->info_str), "fd=%d", fd);
-- 
1.6.0.6
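
For readers wondering what clearing the tap features amounts to at the
system-call level, the following is a rough sketch, assuming a Linux host
with <linux/if_tun.h> available. The helper names are hypothetical, and
the meaning assigned to the four switches (checksum, TSO for IPv4, TSO
for IPv6, ECN) is an assumption about the arguments in the patch above,
not something the thread states. The only real interfaces used are the
TUNSETOFFLOAD ioctl and the TUN_F_* flags.

```c
#include <assert.h>
#include <sys/ioctl.h>
#include <linux/if_tun.h>

/*
 * Hypothetical helper: translate per-feature switches into a TUN_F_* mask.
 * The TSO bits are only set together with checksum offload, on the
 * assumption that the tun driver rejects them otherwise.
 */
static unsigned int offload_mask(int csum, int tso4, int tso6, int ecn)
{
    unsigned int mask = 0;

    if (csum) {
        mask |= TUN_F_CSUM;
        if (tso4)
            mask |= TUN_F_TSO4;
        if (tso6)
            mask |= TUN_F_TSO6;
        if ((tso4 || tso6) && ecn)
            mask |= TUN_F_TSO_ECN;
    }
    return mask;
}

/* Apply a mask to an open tap fd; returns 0 on success, -1 with errno set. */
static int tap_apply_offload(int tap_fd, unsigned int mask)
{
    assert(tap_fd >= 0);
    return ioctl(tap_fd, TUNSETOFFLOAD, (unsigned long)mask);
}

/* Start every guest from a known state: all offloads off. */
static int tap_clear_offload(int tap_fd)
{
    return tap_apply_offload(tap_fd, offload_mask(0, 0, 0, 0));
}
```

Calling something like `tap_clear_offload(fd)` right after opening the
tap fd, before the guest driver negotiates anything, is the same idea as
the one-line fix: whatever a previous guest left behind is discarded.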



* Re: persistent tun & different virtual NICs & dead guest network
  2009-04-05 12:12     ` [Qemu-devel] " Avi Kivity
@ 2009-07-09 18:13       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 7+ messages in thread
From: Michael S. Tsirkin @ 2009-07-09 18:13 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Michael Tokarev, KVM list, qemu-devel

On Sun, Apr 05, 2009 at 03:12:04PM +0300, Avi Kivity wrote:
> Avi Kivity wrote:
>>> I think all the settings should be reset to 0
>>> when opening the tun device.
>>
>> This should definitely be fixed.  I'll look at writing a patch.
>>
>
> Okay, that's not in upstream qemu, so I committed a fix to  
> kvm-userspace.git.
>
> Attached if you want to test it.
>
> -- 
> error compiling committee.c: too many arguments to function
>

> From 25971710409c374e9486c960c297f324a9164a65 Mon Sep 17 00:00:00 2001
> From: Avi Kivity <avi@redhat.com>
> Date: Sun, 5 Apr 2009 15:08:55 +0300
> Subject: [PATCH] kvm: qemu: clear tap features on initialization
> 
> tap features change how tap interprets data, so they must be cleared on
> initialization to prevent old settings from interfering with new guest
> instances.
> 
> Signed-off-by: Avi Kivity <avi@redhat.com>
> ---
>  qemu/net.c |    1 +
>  1 files changed, 1 insertions(+), 0 deletions(-)
> 
> diff --git a/qemu/net.c b/qemu/net.c
> index d753fa0..703d01c 100644
> --- a/qemu/net.c
> +++ b/qemu/net.c
> @@ -930,6 +930,7 @@ static TAPState *net_tap_fd_init(VLANState *vlan,
>  #endif
>  #ifdef TUNSETOFFLOAD
>      s->vc->set_offload = tap_set_offload;
> +    tap_set_offload(s->vc, 0, 0, 0, 0);

BTW, should not these bits be restored on load?
I couldn't find code that does this.

>  #endif
>      qemu_set_fd_handler2(s->fd, tap_can_send, tap_send, NULL, s);
>      snprintf(s->vc->info_str, sizeof(s->vc->info_str), "fd=%d", fd);

Looks good to me.  Just a thought: do we want the kernel to give us
an option to tie the offload bits to the character device, so that they
get cleaned up automatically when qemu dies?

-- 
MST


end of thread, other threads:[~2009-07-09 18:14 UTC | newest]

Thread overview: 7+ messages
2009-04-04 10:26 persistent tun & different virtual NICs & dead guest network Michael Tokarev
2009-04-05 11:58 ` Avi Kivity
2009-04-05 11:58   ` [Qemu-devel] " Avi Kivity
2009-04-05 12:12   ` Avi Kivity
2009-04-05 12:12     ` [Qemu-devel] " Avi Kivity
2009-07-09 18:13     ` Michael S. Tsirkin
2009-07-09 18:13       ` [Qemu-devel] " Michael S. Tsirkin
