* tap devices not receiving packets from a bridge
@ 2012-11-22 14:29 ` Peter Lieven
  0 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2012-11-22 14:29 UTC (permalink / raw)
  To: qemu-devel, netdev

Hi,

Is anyone aware of a problem with the Linux network bridge that, in very rare circumstances, stops
a bridge from sending packets to a tap device?

My problem occurs with vanilla qemu-kvm-1.2.0 and Ubuntu kernel 3.2.0-34.53,
which is based on Linux 3.2.33.

I have not yet been able to reproduce the issue; it happens only in really rare cases. The symptom is that
the tap does not have any TX packets. RX is working fine. I see the packets coming in at
the physical interface on the host, but they are not forwarded to the tap interface.
The bridge itself has learnt the MAC address of the vServer that is connected to the tap interface.
It does not help to toggle the bridge link status, the tap interface status or the interface in the vServer.
It seems that the problem occurs if a tap interface that was previously in use, and then set to non-persistent,
is set persistent again and is then by chance assigned to the same vServer (= same MAC address on the same
bridge) again. Unfortunately it does not seem to be reproducible.

Maybe this sounds familiar to someone?

Thank you,
Peter

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2012-11-22 14:29 ` [Qemu-devel] " Peter Lieven
@ 2012-11-23  7:02   ` Stefan Hajnoczi
  -1 siblings, 0 replies; 44+ messages in thread
From: Stefan Hajnoczi @ 2012-11-23  7:02 UTC (permalink / raw)
  To: Peter Lieven; +Cc: qemu-devel, netdev

On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> a bridge from sending packets to a tap device?
> 
> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> which is based on Linux 3.2.33.
> 
> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> the physical interface on the host, but they are not forwarded to the tap interface.
> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> bridge) again. Unfortunately it seems not to be reproducible.

Not sure but this patch from Michael Tsirkin may help - it solves an
issue with persistent tap devices:

http://patchwork.ozlabs.org/patch/198598/

Stefan

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2012-11-23  7:02   ` Stefan Hajnoczi
@ 2012-11-23  9:41     ` Peter Lieven
  -1 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2012-11-23  9:41 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, netdev, mst


On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:

> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>> a bridge from sending packets to a tap device?
>> 
>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>> which is based on Linux 3.2.33.
>> 
>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>> the physical interface on the host, but they are not forwarded to the tap interface.
>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>> bridge) again. Unfortunately it seems not to be reproducible.
> 
> Not sure but this patch from Michael Tsirkin may help - it solves an
> issue with persistent tap devices:
> 
> http://patchwork.ozlabs.org/patch/198598/

Hi Stefan,

thanks for the pointer. I had seen this patch, but I disregarded it because it deals
with persistent taps. But maybe the taps in the kernel are not deleted immediately.
Do you remember what the symptoms of the issue that patch addresses were? Sorry for
being vague, but I currently have no clue what's going on.

Can someone with more internal knowledge of the bridging/tap code say whether qemu can
be responsible at all if the tap device is not receiving packets from the bridge?

Take the following config: packets come in via the physical interface eth1.123,
and there is a bridge called br123. I further have a virtual machine with tap0. Both eth1.123
and tap0 are members of br123.

When the issue occurs, the vServer has no inbound network connectivity. If I send a ping
from the vServer, I see it on tap0 and leaving on eth1.123. I also see the ARP reply coming
in via eth1.123, but the reply cannot be seen on tap0.

Peter

> 
> Stefan

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2012-11-23  9:41     ` Peter Lieven
  (?)
@ 2012-11-23 11:01     ` Michael S. Tsirkin
  2012-11-23 11:02         ` Peter Lieven
  2013-01-22  9:04       ` Peter Lieven
  -1 siblings, 2 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2012-11-23 11:01 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
> 
> On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:
> 
> > On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> >> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> >> a bridge from sending packets to a tap device?
> >> 
> >> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> >> which is based on Linux 3.2.33.
> >> 
> >> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> >> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> >> the physical interface on the host, but they are not forwarded to the tap interface.
> >> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> >> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> >> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> >> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> >> bridge) again. Unfortunately it seems not to be reproducible.
> > 
> > Not sure but this patch from Michael Tsirkin may help - it solves an
> > issue with persistent tap devices:
> > 
> > http://patchwork.ozlabs.org/patch/198598/
> 
> Hi Stefan,
> 
> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
> with persistent taps. But maybe the taps in the kernel are not deleted directly. 
> Can you remember what the symptoms of the above issue have been? Sorry for
> being vague, but I currently have no clue whats going on.
> 
> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
> be responsible at all if the tap device is not receiving packets from the bridge.
> 
> If I have the following config. Lets say packets coming in via physical interface eth1.123,
> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
> and tap0 are member of br123. 
> 
> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
> in via eth1.123, but the reply can't be seen on tap0.
> 
> Peter

If the guest is not consuming packets, the TX queue in the tap device
will overrun over time (there is space for 1000 packets there).
This is the code from tun:

        if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
                          >= dev->tx_queue_len / tun->numqueues){
                if (!(tun->flags & TUN_ONE_QUEUE)) {
                        /* Normal queueing mode. */
                        /* Packet scheduler handles dropping of further
                         * packets. */
                        netif_stop_subqueue(dev, txq);

                        /* We won't see all dropped packets
                         * individually, so overrun
                         * error is more appropriate. */
                        dev->stats.tx_fifo_errors++;


So you can detect whether this triggered by looking at the fifo errors counter of the device.
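
A minimal userspace sketch of that check, assuming the usual sysfs layout /sys/class/net/<dev>/statistics/<counter>. The helper names stat_path and read_counter are made up for illustration; they are not part of the kernel or of qemu:

```c
#include <assert.h>
#include <stdio.h>

/* Build "/sys/class/net/<dev>/statistics/<counter>" in buf.
 * Returns 0 on success, -1 if the result would not fit in len bytes. */
int stat_path(char *buf, size_t len, const char *dev, const char *counter)
{
    assert(buf != NULL && dev != NULL && counter != NULL);
    int n = snprintf(buf, len, "/sys/class/net/%s/statistics/%s", dev, counter);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Read a single unsigned decimal counter from the file at path.
 * Returns 0 and stores the value in *out, or -1 on any error. */
int read_counter(const char *path, unsigned long long *out)
{
    assert(path != NULL && out != NULL);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = (fscanf(f, "%llu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

For example, build the path with stat_path(buf, sizeof buf, "tap0", "tx_fifo_errors"), pass it to read_counter(), and compare two readings taken some time apart. A growing value would suggest that the overrun branch quoted above was taken; tap0 is just the interface name used earlier in this thread.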

Once this happens, the TX queue is stopped and you hit this path:

                        if (!netif_xmit_stopped(txq)) {
                                __this_cpu_inc(xmit_recursion);
                                rc = dev_hard_start_xmit(skb, dev, txq);
                                __this_cpu_dec(xmit_recursion);
                                if (dev_xmit_complete(rc)) {
                                        HARD_TX_UNLOCK(dev, txq);
                                        goto out;
                                }
                        }

so packets are not passed to the device anymore.
It will stay this way until the guest consumes some packets and
the queue is restarted.
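
To illustrate the stop/restart behaviour described above, here is a toy userspace model. It is only a sketch under simplifying assumptions: struct toy_txq, toy_xmit and toy_consume are hypothetical names, there is a single queue, and the packet that trips the limit is simply refused, which the excerpt above does not show either way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a tap TX queue; not a kernel structure. */
struct toy_txq {
    size_t len;                /* packets waiting for the guest */
    size_t limit;              /* plays the role of tx_queue_len */
    bool stopped;              /* set once the queue has filled up */
    unsigned long fifo_errors; /* plays the role of tx_fifo_errors */
};

/* The bridge tries to forward one packet. Returns true if it was queued. */
bool toy_xmit(struct toy_txq *q)
{
    assert(q != NULL);
    if (q->stopped)
        return false;          /* stopped queue: nothing reaches the device */
    if (q->len >= q->limit) {  /* same comparison as in the tun excerpt */
        q->stopped = true;
        q->fifo_errors++;      /* the overrun is counted once, here */
        return false;          /* simplification: this packet is refused */
    }
    q->len++;
    return true;
}

/* The guest reads one packet. Returns false if the queue was empty. */
bool toy_consume(struct toy_txq *q)
{
    assert(q != NULL);
    if (q->len == 0)
        return false;
    q->len--;
    q->stopped = false;        /* consuming a packet restarts the queue */
    return true;
}
```

In this model a queue can only be in the stopped state after fifo_errors has been incremented at least once, and a single toy_consume() call is enough to let packets flow again.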

> > 
> > Stefan

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2012-11-23 11:01     ` Michael S. Tsirkin
@ 2012-11-23 11:02         ` Peter Lieven
  2013-01-22  9:04       ` Peter Lieven
  1 sibling, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2012-11-23 11:02 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, netdev


On 23.11.2012 at 12:01, Michael S. Tsirkin wrote:

> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
>> 
>> On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:
>> 
>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>>>> a bridge from sending packets to a tap device?
>>>> 
>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>>>> which is based on Linux 3.2.33.
>>>> 
>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>>>> the physical interface on the host, but they are not forwarded to the tap interface.
>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>>>> bridge) again. Unfortunately it seems not to be reproducible.
>>> 
>>> Not sure but this patch from Michael Tsirkin may help - it solves an
>>> issue with persistent tap devices:
>>> 
>>> http://patchwork.ozlabs.org/patch/198598/
>> 
>> Hi Stefan,
>> 
>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
>> with persistent taps. But maybe the taps in the kernel are not deleted directly. 
>> Can you remember what the symptoms of the above issue have been? Sorry for
>> being vague, but I currently have no clue whats going on.
>> 
>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
>> be responsible at all if the tap device is not receiving packets from the bridge.
>> 
>> If I have the following config. Lets say packets coming in via physical interface eth1.123,
>> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
>> and tap0 are member of br123. 
>> 
>> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
>> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
>> in via eth1.123, but the reply can't be seen on tap0.
>> 
>> Peter
> 
> If guest is not consuming packets, a TX queue in tap device
> will with time overrun (there's space for 1000 packets there).
> This is code from tun:

From what I remember, there were zero TX packets and no TX errors
on the device.

Could it be that this queue is somehow not cleared correctly when
the device is reassigned (although it was non-persistent in between)?

Thank you,
Peter

> 
>        if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>                          >= dev->tx_queue_len / tun->numqueues){
>                if (!(tun->flags & TUN_ONE_QUEUE)) {
>                        /* Normal queueing mode. */
>                        /* Packet scheduler handles dropping of further
>                         * packets. */
>                        netif_stop_subqueue(dev, txq);
> 
>                        /* We won't see all dropped packets
>                         * individually, so overrun
>                         * error is more appropriate. */
>                        dev->stats.tx_fifo_errors++;
> 
> 
> So you can detect that this triggered by looking at fifo errors counter in device.
> 
> Once this happens TX queue is stopped, then you hit this path:
> 
>                        if (!netif_xmit_stopped(txq)) {
>                                __this_cpu_inc(xmit_recursion);
>                                rc = dev_hard_start_xmit(skb, dev, txq);
>                                __this_cpu_dec(xmit_recursion);
>                                if (dev_xmit_complete(rc)) {
>                                        HARD_TX_UNLOCK(dev, txq);
>                                        goto out;
>                                }
>                        }
> 
> so packets are not passed to device anymore.
> It will stay this way until guest consumes some packets and
> queue is restarted.
> 
>>> 
>>> Stefan

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2012-11-23  7:02   ` Stefan Hajnoczi
@ 2012-11-29 18:58     ` Peter Lieven
  -1 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2012-11-29 18:58 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: qemu-devel, netdev, mst

On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:
> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>> a bridge from sending packets to a tap device?
>>
>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>> which is based on Linux 3.2.33.
>>
>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>> the physical interface on the host, but they are not forwarded to the tap interface.
>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>> bridge) again. Unfortunately it seems not to be reproducible.
> 
> Not sure but this patch from Michael Tsirkin may help - it solves an
> issue with persistent tap devices:
> 
> http://patchwork.ozlabs.org/patch/198598/

I have tried that patch (even though I do not use persistent taps), but it doesn't help.

What I can say now is that if a VM is not working with a tap, let's say tap10, then
it will still not work with tap10 after the VM is shut down, tap10 is set to non-persistent,
and the VM is started again and happens to be assigned tap10 again.

BUT, if I use qemu-kvm-1.0.1 for that second start, it works. I have seen that there are
a lot of changes between 1.0.1 and 1.2.0 in the tap code. Maybe a bug was introduced in the
initialization since then.

What also seems to have changed is that vlans have been removed. I was not aware of that,
so I still use vlans, which are now emulated via hubs. Maybe this is related.

Peter

> 
> Stefan
> 

* Re: tap devices not receiving packets from a bridge
  2012-11-23 11:01     ` Michael S. Tsirkin
  2012-11-23 11:02         ` Peter Lieven
@ 2013-01-22  9:04       ` Peter Lieven
  2013-01-22  9:43         ` Peter Lieven
  2013-01-23 10:03         ` [Qemu-devel] " Michael S. Tsirkin
  1 sibling, 2 replies; 44+ messages in thread
From: Peter Lieven @ 2013-01-22  9:04 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On 23.11.2012 12:01, Michael S. Tsirkin wrote:
> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
>>
>> On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:
>>
>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>>>> a bridge from sending packets to a tap device?
>>>>
>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>>>> which is based on Linux 3.2.33.
>>>>
>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>>>> the physical interface on the host, but they are not forwarded to the tap interface.
>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>>>> bridge) again. Unfortunately it seems not to be reproducible.
>>>
>>> Not sure but this patch from Michael Tsirkin may help - it solves an
>>> issue with persistent tap devices:
>>>
>>> http://patchwork.ozlabs.org/patch/198598/
>>
>> Hi Stefan,
>>
>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
>> with persistent taps. But maybe the taps in the kernel are not deleted directly.
>> Can you remember what the symptoms of the above issue have been? Sorry for
>> being vague, but I currently have no clue whats going on.
>>
>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
>> be responsible at all if the tap device is not receiving packets from the bridge.
>>
>> If I have the following config. Lets say packets coming in via physical interface eth1.123,
>> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
>> and tap0 are member of br123.
>>
>> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
>> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
>> in via eth1.123, but the reply can't be seen on tap0.
>>
>> Peter
>
> If guest is not consuming packets, a TX queue in tap device
> will with time overrun (there's space for 1000 packets there).
> This is code from tun:
>
>          if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>                            >= dev->tx_queue_len / tun->numqueues){
>                  if (!(tun->flags & TUN_ONE_QUEUE)) {
>                          /* Normal queueing mode. */
>                          /* Packet scheduler handles dropping of further
>                           * packets. */
>                          netif_stop_subqueue(dev, txq);
>
>                          /* We won't see all dropped packets
>                           * individually, so overrun
>                           * error is more appropriate. */
>                          dev->stats.tx_fifo_errors++;
>
>
> So you can detect that this triggered by looking at fifo errors counter in device.
>
> Once this happens TX queue is stopped, then you hit this path:
>
>                          if (!netif_xmit_stopped(txq)) {
>                                  __this_cpu_inc(xmit_recursion);
>                                  rc = dev_hard_start_xmit(skb, dev, txq);
>                                  __this_cpu_dec(xmit_recursion);
>                                  if (dev_xmit_complete(rc)) {
>                                          HARD_TX_UNLOCK(dev, txq);
>                                          goto out;
>                                  }
>                          }
>
> so packets are not passed to device anymore.
> It will stay this way until guest consumes some packets and
> queue is restarted.

After some time I again have a vServer in this state. It does not look as if
there are any TX errors.

# ifconfig tap10
tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
           inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
           RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
           TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
           collisions:0 txqueuelen:500
           RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)

It seems like the bridge is not forwarding any packets to the tap device anymore, although it has learnt
the MAC addresses and there are also broadcast packets coming in.
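
A minimal sketch of how the counters discussed here could be read without
parsing ifconfig output, assuming the host exposes the usual sysfs statistics
files under /sys/class/net/<iface>/statistics. The function name
read_tx_counters and the sysfs_root parameter are made up for illustration and
are not something used in this thread:

```python
import os

# Counters of interest. The TX "overruns" column printed by ifconfig
# is fed from tx_fifo_errors.
TX_COUNTERS = ("tx_packets", "tx_dropped", "tx_fifo_errors")


def read_tx_counters(iface, sysfs_root="/sys/class/net"):
    """Return the TX counters of `iface` as a dict of ints.

    `sysfs_root` is a hypothetical parameter, present only so the
    function can be pointed at a different directory for testing.
    """
    stats_dir = os.path.join(sysfs_root, iface, "statistics")
    counters = {}
    for name in TX_COUNTERS:
        with open(os.path.join(stats_dir, name)) as f:
            counters[name] = int(f.read().strip())
    return counters
```

Calling read_tx_counters("tap10") twice a few seconds apart should show
whether tx_packets is still moving and whether tx_fifo_errors has grown.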

Any more ideas where I could debug?

Peter

>
>>>
>>> Stefan

* Re: tap devices not receiving packets from a bridge
  2013-01-22  9:04       ` Peter Lieven
@ 2013-01-22  9:43         ` Peter Lieven
  2013-01-23 10:03         ` [Qemu-devel] " Michael S. Tsirkin
  1 sibling, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2013-01-22  9:43 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On 22.01.2013 10:04, Peter Lieven wrote:
> On 23.11.2012 12:01, Michael S. Tsirkin wrote:
>> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
>>>
>>> Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:
>>>
>>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>>>>> a bridge from sending pakets to a tap device?
>>>>>
>>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>>>>> which is based on Linux 3.2.33.
>>>>>
>>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>>>>> the physical interface on the host, but they are not forwarded to the tap interface.
>>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>>>>> bridge) again. Unfortunately it seems not to be reproducible.
>>>>
>>>> Not sure but this patch from Michael Tsirkin may help - it solves an
>>>> issue with persistent tap devices:
>>>>
>>>> http://patchwork.ozlabs.org/patch/198598/
>>>
>>> Hi Stefan,
>>>
>>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
>>> with persistent taps. But maybe the taps in the kernel are not deleted directly.
>>> Can you remember what the syptomps of the above issue have been? Sorry for
>>> being vague, but I currently have no clue whats going on.
>>>
>>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
>>> be responsible at all if the tap device is not receiving packets from the bridge.
>>>
>>> If I have the following config. Lets say packets coming in via physical interface eth1.123,
>>> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
>>> and tap0 are member of br123.
>>>
>>> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
>>> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
>>> in via eth1.123, but the reply can't be seen on tap0.
>>>
>>> Peter
>>
>> If guest is not consuming packets, a TX queue in tap device
>> will with time overrun (there's space for 1000 packets there).
>> This is code from tun:
>>
>>          if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>>                            >= dev->tx_queue_len / tun->numqueues){
>>                  if (!(tun->flags & TUN_ONE_QUEUE)) {
>>                          /* Normal queueing mode. */
>>                          /* Packet scheduler handles dropping of further
>>                           * packets. */
>>                          netif_stop_subqueue(dev, txq);
>>
>>                          /* We won't see all dropped packets
>>                           * individually, so overrun
>>                           * error is more appropriate. */
>>                          dev->stats.tx_fifo_errors++;
>>
>>
>> So you can detect that this triggered by looking at fifo errors counter in device.
>>
>> Once this happens TX queue is stopped, then you hit this path:
>>
>>                          if (!netif_xmit_stopped(txq)) {
>>                                  __this_cpu_inc(xmit_recursion);
>>                                  rc = dev_hard_start_xmit(skb, dev, txq);
>>                                  __this_cpu_dec(xmit_recursion);
>>                                  if (dev_xmit_complete(rc)) {
>>                                          HARD_TX_UNLOCK(dev, txq);
>>                                          goto out;
>>                                  }
>>                          }
>>
>> so packets are not passed to device anymore.
>> It will stay this way until guest consumes some packets and
>> queue is restarted.
>
> After some time I again have a vServer in this state. It seems not like there
> are no TX errors.
>
> # ifconfig tap10
> tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
>            inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
>            UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>            RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
>            collisions:0 txqueuelen:500
>            RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
>
> It seems like the bridge is not forwarding any packets to the tap device anymore altough it has learnt
> the MAC-Adresses and there are also broadcast packets coming in.
>
> Any more ideas where I could debug?

I would like to add that I see the packets in the ebtables forwarding chain, but the TX counters
of the interface are not increasing.

Peter


>
> Peter
>
>>
>>>>
>>>> Stefan
>

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-01-22  9:04       ` Peter Lieven
  2013-01-22  9:43         ` Peter Lieven
@ 2013-01-23 10:03         ` Michael S. Tsirkin
  2013-02-12  7:06           ` Peter Lieven
  1 sibling, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-01-23 10:03 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On Tue, Jan 22, 2013 at 10:04:07AM +0100, Peter Lieven wrote:
> On 23.11.2012 12:01, Michael S. Tsirkin wrote:
> >On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
> >>
> >>Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:
> >>
> >>>On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> >>>>is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> >>>>a bridge from sending pakets to a tap device?
> >>>>
> >>>>My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> >>>>which is based on Linux 3.2.33.
> >>>>
> >>>>I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> >>>>the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> >>>>the physical interface on the host, but they are not forwarded to the tap interface.
> >>>>The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> >>>>It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> >>>>It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> >>>>is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> >>>>bridge) again. Unfortunately it seems not to be reproducible.
> >>>
> >>>Not sure but this patch from Michael Tsirkin may help - it solves an
> >>>issue with persistent tap devices:
> >>>
> >>>http://patchwork.ozlabs.org/patch/198598/
> >>
> >>Hi Stefan,
> >>
> >>thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
> >>with persistent taps. But maybe the taps in the kernel are not deleted directly.
> >>Can you remember what the syptomps of the above issue have been? Sorry for
> >>being vague, but I currently have no clue whats going on.
> >>
> >>Can someone who has more internal knowledge of the bridging/tap code say if qemu can
> >>be responsible at all if the tap device is not receiving packets from the bridge.
> >>
> >>If I have the following config. Lets say packets coming in via physical interface eth1.123,
> >>and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
> >>and tap0 are member of br123.
> >>
> >>If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
> >>from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
> >>in via eth1.123, but the reply can't be seen on tap0.
> >>
> >>Peter
> >
> >If guest is not consuming packets, a TX queue in tap device
> >will with time overrun (there's space for 1000 packets there).
> >This is code from tun:
> >
> >         if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
> >                           >= dev->tx_queue_len / tun->numqueues){
> >                 if (!(tun->flags & TUN_ONE_QUEUE)) {
> >                         /* Normal queueing mode. */
> >                         /* Packet scheduler handles dropping of further
> >                          * packets. */
> >                         netif_stop_subqueue(dev, txq);
> >
> >                         /* We won't see all dropped packets
> >                          * individually, so overrun
> >                          * error is more appropriate. */
> >                         dev->stats.tx_fifo_errors++;
> >
> >
> >So you can detect that this triggered by looking at fifo errors counter in device.
> >
> >Once this happens TX queue is stopped, then you hit this path:
> >
> >                         if (!netif_xmit_stopped(txq)) {
> >                                 __this_cpu_inc(xmit_recursion);
> >                                 rc = dev_hard_start_xmit(skb, dev, txq);
> >                                 __this_cpu_dec(xmit_recursion);
> >                                 if (dev_xmit_complete(rc)) {
> >                                         HARD_TX_UNLOCK(dev, txq);
> >                                         goto out;
> >                                 }
> >                         }
> >
> >so packets are not passed to device anymore.
> >It will stay this way until guest consumes some packets and
> >queue is restarted.
> 
> After some time I again have a vServer in this state. It seems not like there
> are no TX errors.
> 
> # ifconfig tap10
> tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
>           inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>           RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
>           collisions:0 txqueuelen:500
>           RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
> 
> It seems like the bridge is not forwarding any packets to the tap device anymore altough it has learnt
> the MAC-Adresses and there are also broadcast packets coming in.
> 
> Any more ideas where I could debug?
> 
> Peter
> 
> >
> >>>
> >>>Stefan

Hmm. Two overrun errors were triggered, so it's possible that after
the second one the queue got stuck in an xoff state.
You'd have to use something like systemtap or kdb to poke at the
queue state to see whether the xoff flag is set, and/or look
at the receive queue length.
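
To make the suspected failure mode concrete, here is a toy model of the two
code paths quoted above. It is a sketch, not kernel code: the class
TapTxModel and its methods are made up for illustration, the division by
numqueues is ignored, the one-queue branch (drop the packet instead of
stopping the queue) is not quoted in this thread and is assumed here, and
wake=False stands in for a hypothetical lost wakeup.

```python
from collections import deque


class TapTxModel:
    """Toy model of the tap TX path; names are illustrative only."""

    def __init__(self, tx_queue_len=500, one_queue=False):
        self.tx_queue_len = tx_queue_len
        self.one_queue = one_queue
        self.queue = deque()      # stands in for sk_receive_queue
        self.stopped = False      # stands in for the xoff state
        self.tx_fifo_errors = 0
        self.tx_dropped = 0

    def xmit(self, pkt):
        """Bridge hands a packet to the tap. True if it was queued."""
        if self.stopped:
            # Mirrors the netif_xmit_stopped() check: the stack no
            # longer passes packets to the device.
            return False
        if len(self.queue) >= self.tx_queue_len:
            if self.one_queue:
                # Assumed one-queue behaviour: drop, never stop.
                self.tx_dropped += 1
                return False
            # Normal queueing mode: stop the queue, count an overrun,
            # and still queue this packet.
            self.stopped = True
            self.tx_fifo_errors += 1
        self.queue.append(pkt)
        return True

    def guest_read(self, wake=True):
        """Guest consumes one packet. wake=False models a lost wakeup."""
        if not self.queue:
            return None
        pkt = self.queue.popleft()
        if wake:
            self.stopped = False
        return pkt
```

In this model a stopped queue recovers as soon as the guest reads one packet.
If the wakeup is lost, the queue can be fully drained and xmit() still
refuses every packet, which would look like the state described in this
thread. With one_queue=True the queue is never stopped at all.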

For the future, we can try setting the TUN_ONE_QUEUE flag on the interface,
or try applying this patch
5d097109257c03a71845729f8db6b5770c4bbedc
to the kernel and see if this helps.
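
As a rough sketch of the first option: userspace requests one-queue mode by
passing IFF_ONE_QUEUE in the flags of the TUNSETIFF ioctl when it attaches to
the tap device. The constants below are the values from <linux/if_tun.h>.
The helper names build_ifreq and open_tap are made up for illustration, and
the 40-byte padding assumes the 64-bit layout of struct ifreq.

```python
import fcntl
import os
import struct

# Values from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_ONE_QUEUE = 0x2000

IFNAMSIZ = 16


def build_ifreq(name, one_queue=True):
    """Pack a struct ifreq carrying the tap name and the mode flags."""
    raw = name.encode()
    if len(raw) >= IFNAMSIZ:
        raise ValueError("interface name too long")
    flags = IFF_TAP | IFF_NO_PI
    if one_queue:
        flags |= IFF_ONE_QUEUE
    # 16-byte name, 16-bit flags, zero padding up to 40 bytes.
    return struct.pack("16sH22x", raw, flags)


def open_tap(name, one_queue=True):
    """Attach to tap device `name`; needs privileges on /dev/net/tun."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name, one_queue))
    except OSError:
        os.close(fd)
        raise
    return fd
```

Only build_ifreq can be exercised without privileges; open_tap("tap10")
needs access to /dev/net/tun on the host.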

-- 
MST

* Re: tap devices not receiving packets from a bridge
  2013-01-23 10:03         ` [Qemu-devel] " Michael S. Tsirkin
@ 2013-02-12  7:06           ` Peter Lieven
  2013-02-12  9:08             ` [Qemu-devel] " Michael S. Tsirkin
  2013-05-14 14:21               ` [Qemu-devel] " Nicholas Thomas
  0 siblings, 2 replies; 44+ messages in thread
From: Peter Lieven @ 2013-02-12  7:06 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On 23.01.2013 11:03, Michael S. Tsirkin wrote:
> On Tue, Jan 22, 2013 at 10:04:07AM +0100, Peter Lieven wrote:
>> On 23.11.2012 12:01, Michael S. Tsirkin wrote:
>>> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
>>>>
>>>> Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:
>>>>
>>>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>>>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>>>>>> a bridge from sending pakets to a tap device?
>>>>>>
>>>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>>>>>> which is based on Linux 3.2.33.
>>>>>>
>>>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>>>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>>>>>> the physical interface on the host, but they are not forwarded to the tap interface.
>>>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>>>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>>>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>>>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>>>>>> bridge) again. Unfortunately it seems not to be reproducible.
>>>>>
>>>>> Not sure but this patch from Michael Tsirkin may help - it solves an
>>>>> issue with persistent tap devices:
>>>>>
>>>>> http://patchwork.ozlabs.org/patch/198598/
>>>>
>>>> Hi Stefan,
>>>>
>>>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
>>>> with persistent taps. But maybe the taps in the kernel are not deleted directly.
>>>> Can you remember what the syptomps of the above issue have been? Sorry for
>>>> being vague, but I currently have no clue whats going on.
>>>>
>>>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
>>>> be responsible at all if the tap device is not receiving packets from the bridge.
>>>>
>>>> If I have the following config. Lets say packets coming in via physical interface eth1.123,
>>>> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
>>>> and tap0 are member of br123.
>>>>
>>>> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
>>>> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
>>>> in via eth1.123, but the reply can't be seen on tap0.
>>>>
>>>> Peter
>>>
>>> If guest is not consuming packets, a TX queue in tap device
>>> will with time overrun (there's space for 1000 packets there).
>>> This is code from tun:
>>>
>>>          if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>>>                            >= dev->tx_queue_len / tun->numqueues){
>>>                  if (!(tun->flags & TUN_ONE_QUEUE)) {
>>>                          /* Normal queueing mode. */
>>>                          /* Packet scheduler handles dropping of further
>>>                           * packets. */
>>>                          netif_stop_subqueue(dev, txq);
>>>
>>>                          /* We won't see all dropped packets
>>>                           * individually, so overrun
>>>                           * error is more appropriate. */
>>>                          dev->stats.tx_fifo_errors++;
>>>
>>>
>>> So you can detect that this triggered by looking at fifo errors counter in device.
>>>
>>> Once this happens TX queue is stopped, then you hit this path:
>>>
>>>                          if (!netif_xmit_stopped(txq)) {
>>>                                  __this_cpu_inc(xmit_recursion);
>>>                                  rc = dev_hard_start_xmit(skb, dev, txq);
>>>                                  __this_cpu_dec(xmit_recursion);
>>>                                  if (dev_xmit_complete(rc)) {
>>>                                          HARD_TX_UNLOCK(dev, txq);
>>>                                          goto out;
>>>                                  }
>>>                          }
>>>
>>> so packets are not passed to device anymore.
>>> It will stay this way until guest consumes some packets and
>>> queue is restarted.
>>
>> After some time I again have a vServer in this state. It seems not like there
>> are no TX errors.
>>
>> # ifconfig tap10
>> tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
>>            inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
>>            UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>            RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
>>            TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
>>            collisions:0 txqueuelen:500
>>            RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
>>
>> It seems like the bridge is not forwarding any packets to the tap device anymore altough it has learnt
>> the MAC-Adresses and there are also broadcast packets coming in.
>>
>> Any more ideas where I could debug?
>>
>> Peter
>>
>>>
>>>>>
>>>>> Stefan
>
> Hmm. So there are two overrun errors that triggered, so
> it's possible after the second one the queue got stuck in an xoff state.
> You'd have to use something like systemtap or kdb to poke at the
> queue state to see whether xoff flag is set and/or look
> at the receive queue length.
>
> For future, we can try to set TUN_ONE_QUEUE flag on the interface,
> or try applying this patch
> 5d097109257c03a71845729f8db6b5770c4bbedc
> in kernel see if this helps.
>

I have had this option set for 2 weeks now and have not seen the problem again.
How does this flag work with the recently added tap multiqueue support?

Peter

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-02-12  7:06           ` Peter Lieven
@ 2013-02-12  9:08             ` Michael S. Tsirkin
  2013-02-12  9:10               ` Peter Lieven
  2013-05-14 14:21               ` [Qemu-devel] " Nicholas Thomas
  1 sibling, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12  9:08 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On Tue, Feb 12, 2013 at 08:06:04AM +0100, Peter Lieven wrote:
> On 23.01.2013 11:03, Michael S. Tsirkin wrote:
> >On Tue, Jan 22, 2013 at 10:04:07AM +0100, Peter Lieven wrote:
> >>On 23.11.2012 12:01, Michael S. Tsirkin wrote:
> >>>On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
> >>>>
> >>>>Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:
> >>>>
> >>>>>On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> >>>>>>is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> >>>>>>a bridge from sending pakets to a tap device?
> >>>>>>
> >>>>>>My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> >>>>>>which is based on Linux 3.2.33.
> >>>>>>
> >>>>>>I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> >>>>>>the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> >>>>>>the physical interface on the host, but they are not forwarded to the tap interface.
> >>>>>>The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> >>>>>>It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> >>>>>>It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> >>>>>>is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> >>>>>>bridge) again. Unfortunately it seems not to be reproducible.
> >>>>>
> >>>>>Not sure but this patch from Michael Tsirkin may help - it solves an
> >>>>>issue with persistent tap devices:
> >>>>>
> >>>>>http://patchwork.ozlabs.org/patch/198598/
> >>>>
> >>>>Hi Stefan,
> >>>>
> >>>>thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
> >>>>with persistent taps. But maybe the taps in the kernel are not deleted directly.
> >>>>Can you remember what the syptomps of the above issue have been? Sorry for
> >>>>being vague, but I currently have no clue whats going on.
> >>>>
> >>>>Can someone who has more internal knowledge of the bridging/tap code say if qemu can
> >>>>be responsible at all if the tap device is not receiving packets from the bridge.
> >>>>
> >>>>If I have the following config. Lets say packets coming in via physical interface eth1.123,
> >>>>and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
> >>>>and tap0 are member of br123.
> >>>>
> >>>>If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
> >>>>from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
> >>>>in via eth1.123, but the reply can't be seen on tap0.
> >>>>
> >>>>Peter
> >>>
> >>>If guest is not consuming packets, a TX queue in tap device
> >>>will with time overrun (there's space for 1000 packets there).
> >>>This is code from tun:
> >>>
> >>>         if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
> >>>                           >= dev->tx_queue_len / tun->numqueues){
> >>>                 if (!(tun->flags & TUN_ONE_QUEUE)) {
> >>>                         /* Normal queueing mode. */
> >>>                         /* Packet scheduler handles dropping of further
> >>>                          * packets. */
> >>>                         netif_stop_subqueue(dev, txq);
> >>>
> >>>                         /* We won't see all dropped packets
> >>>                          * individually, so overrun
> >>>                          * error is more appropriate. */
> >>>                         dev->stats.tx_fifo_errors++;
> >>>
> >>>
> >>>So you can detect that this triggered by looking at fifo errors counter in device.
> >>>
> >>>Once this happens TX queue is stopped, then you hit this path:
> >>>
> >>>                         if (!netif_xmit_stopped(txq)) {
> >>>                                 __this_cpu_inc(xmit_recursion);
> >>>                                 rc = dev_hard_start_xmit(skb, dev, txq);
> >>>                                 __this_cpu_dec(xmit_recursion);
> >>>                                 if (dev_xmit_complete(rc)) {
> >>>                                         HARD_TX_UNLOCK(dev, txq);
> >>>                                         goto out;
> >>>                                 }
> >>>                         }
> >>>
> >>>so packets are not passed to device anymore.
> >>>It will stay this way until guest consumes some packets and
> >>>queue is restarted.
> >>
> >>After some time I again have a vServer in this state. It seems not like there
> >>are no TX errors.
> >>
> >># ifconfig tap10
> >>tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
> >>           inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
> >>           UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >>           RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
> >>           TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
> >>           collisions:0 txqueuelen:500
> >>           RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
> >>
> >>It seems like the bridge is not forwarding any packets to the tap device anymore altough it has learnt
> >>the MAC-Adresses and there are also broadcast packets coming in.
> >>
> >>Any more ideas where I could debug?
> >>
> >>Peter
> >>
> >>>
> >>>>>
> >>>>>Stefan
> >
> >Hmm. So there are two overrun errors that triggered, so
> >it's possible after the second one the queue got stuck in an xoff state.
> >You'd have to use something like systemtap or kdb to poke at the
> >queue state to see whether xoff flag is set and/or look
> >at the receive queue length.
> >
> >For future, we can try to set TUN_ONE_QUEUE flag on the interface,
> >or try applying this patch
> >5d097109257c03a71845729f8db6b5770c4bbedc
> >in kernel see if this helps.
> >
> 
> If have set this option for 2 weeks now and not seen this problem again.
> How does this flag work with the recently added tap multiqueue support?
> 
> Peter

This will be the only option in 3.8.

-- 
MST

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-02-12  9:08             ` [Qemu-devel] " Michael S. Tsirkin
@ 2013-02-12  9:10               ` Peter Lieven
  2013-02-12  9:29                 ` Michael S. Tsirkin
  2013-02-12  9:39                 ` Michael Tokarev
  0 siblings, 2 replies; 44+ messages in thread
From: Peter Lieven @ 2013-02-12  9:10 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Stefan Hajnoczi, qemu-devel, netdev


Am 12.02.2013 um 10:08 schrieb "Michael S. Tsirkin" <mst@redhat.com>:

> On Tue, Feb 12, 2013 at 08:06:04AM +0100, Peter Lieven wrote:
>> On 23.01.2013 11:03, Michael S. Tsirkin wrote:
>>> On Tue, Jan 22, 2013 at 10:04:07AM +0100, Peter Lieven wrote:
>>>> On 23.11.2012 12:01, Michael S. Tsirkin wrote:
>>>>> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
>>>>>> 
>>>>>> Am 23.11.2012 um 08:02 schrieb Stefan Hajnoczi:
>>>>>> 
>>>>>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
>>>>>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
>>>>>>>> a bridge from sending pakets to a tap device?
>>>>>>>> 
>>>>>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
>>>>>>>> which is based on Linux 3.2.33.
>>>>>>>> 
>>>>>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
>>>>>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
>>>>>>>> the physical interface on the host, but they are not forwarded to the tap interface.
>>>>>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
>>>>>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
>>>>>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
>>>>>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
>>>>>>>> bridge) again. Unfortunately it seems not to be reproducible.
>>>>>>> 
>>>>>>> Not sure but this patch from Michael Tsirkin may help - it solves an
>>>>>>> issue with persistent tap devices:
>>>>>>> 
>>>>>>> http://patchwork.ozlabs.org/patch/198598/
>>>>>> 
>>>>>> Hi Stefan,
>>>>>> 
>>>>>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
>>>>>> with persistent taps. But maybe the taps in the kernel are not deleted directly.
>>>>>> Can you remember what the syptomps of the above issue have been? Sorry for
>>>>>> being vague, but I currently have no clue whats going on.
>>>>>> 
>>>>>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
>>>>>> be responsible at all if the tap device is not receiving packets from the bridge.
>>>>>> 
>>>>>> If I have the following config. Lets say packets coming in via physical interface eth1.123,
>>>>>> and a bridge called br123.I further have a virtual machine with tap0. Both eth1.123
>>>>>> and tap0 are member of br123.
>>>>>> 
>>>>>> If the issue occurs the vServer has no network connectivity inbound. If I sent a ping
>>>>>> from the vServer I see it on tap0 and leaving on eth1.123. I see further the arp reply coming
>>>>>> in via eth1.123, but the reply can't be seen on tap0.
>>>>>> 
>>>>>> Peter
>>>>> 
>>>>> If guest is not consuming packets, a TX queue in tap device
>>>>> will with time overrun (there's space for 1000 packets there).
>>>>> This is code from tun:
>>>>> 
>>>>>        if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
>>>>>                          >= dev->tx_queue_len / tun->numqueues){
>>>>>                if (!(tun->flags & TUN_ONE_QUEUE)) {
>>>>>                        /* Normal queueing mode. */
>>>>>                        /* Packet scheduler handles dropping of further
>>>>>                         * packets. */
>>>>>                        netif_stop_subqueue(dev, txq);
>>>>>
>>>>>                        /* We won't see all dropped packets
>>>>>                         * individually, so overrun
>>>>>                         * error is more appropriate. */
>>>>>                        dev->stats.tx_fifo_errors++;
>>>>> 
>>>>> 
>>>>> So you can detect that this triggered by looking at fifo errors counter in device.
>>>>> 
>>>>> Once this happens TX queue is stopped, then you hit this path:
>>>>> 
>>>>>                        if (!netif_xmit_stopped(txq)) {
>>>>>                                __this_cpu_inc(xmit_recursion);
>>>>>                                rc = dev_hard_start_xmit(skb, dev, txq);
>>>>>                                __this_cpu_dec(xmit_recursion);
>>>>>                                if (dev_xmit_complete(rc)) {
>>>>>                                        HARD_TX_UNLOCK(dev, txq);
>>>>>                                        goto out;
>>>>>                                }
>>>>>                        }
>>>>> 
>>>>> so packets are not passed to device anymore.
>>>>> It will stay this way until guest consumes some packets and
>>>>> queue is restarted.
>>>> 
>>>> After some time I again have a vServer in this state. It seems not like there
>>>> are no TX errors.
>>>> 
>>>> # ifconfig tap10
>>>> tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
>>>>          inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
>>>>          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
>>>>          RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
>>>>          TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
>>>>          collisions:0 txqueuelen:500
>>>>          RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
>>>> 
>>>> It seems like the bridge is not forwarding any packets to the tap device anymore, although it has learnt
>>>> the MAC addresses and there are also broadcast packets coming in.
>>>> 
>>>> Any more ideas where I could debug?
>>>> 
>>>> Peter
>>>> 
>>>>> 
>>>>>>> 
>>>>>>> Stefan
>>> 
>>> Hmm. So there are two overrun errors that triggered, so
>>> it's possible after the second one the queue got stuck in an xoff state.
>>> You'd have to use something like systemtap or kdb to poke at the
>>> queue state to see whether xoff flag is set and/or look
>>> at the receive queue length.
>>> 
>>> For future, we can try to set TUN_ONE_QUEUE flag on the interface,
>>> or try applying this patch
>>> 5d097109257c03a71845729f8db6b5770c4bbedc
>>> in kernel see if this helps.
>>> 
>> 
>> I have had this option set for 2 weeks now and have not seen this problem again.
>> How does this flag work with the recently added tap multiqueue support?
>> 
>> Peter
> 
> This will be the only option in 3.8.

Ok, but wouldn't it be good to set it in qemu for kernels <3.8?

Peter

> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-02-12  9:10               ` Peter Lieven
@ 2013-02-12  9:29                 ` Michael S. Tsirkin
  2013-02-12  9:39                 ` Michael Tokarev
  1 sibling, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12  9:29 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, qemu-devel, netdev

On Tue, Feb 12, 2013 at 10:10:24AM +0100, Peter Lieven wrote:
> 
> On 12.02.2013 at 10:08, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Feb 12, 2013 at 08:06:04AM +0100, Peter Lieven wrote:
> >> On 23.01.2013 11:03, Michael S. Tsirkin wrote:
> >>> On Tue, Jan 22, 2013 at 10:04:07AM +0100, Peter Lieven wrote:
> >>>> On 23.11.2012 12:01, Michael S. Tsirkin wrote:
> >>>>> On Fri, Nov 23, 2012 at 10:41:21AM +0100, Peter Lieven wrote:
> >>>>>> 
> >>>>>> On 23.11.2012 at 08:02, Stefan Hajnoczi wrote:
> >>>>>> 
> >>>>>>> On Thu, Nov 22, 2012 at 03:29:52PM +0100, Peter Lieven wrote:
> >>>>>>>> is anyone aware of a problem with the linux network bridge that in very rare circumstances stops
> >>>>>>>> a bridge from sending packets to a tap device?
> >>>>>>>> 
> >>>>>>>> My problem occurs in conjunction with vanilla qemu-kvm-1.2.0 and Ubuntu Kernel 3.2.0-34.53
> >>>>>>>> which is based on Linux 3.2.33.
> >>>>>>>> 
> >>>>>>>> I was not yet able to reproduce the issue, it happens in really rare cases. The symptom is that
> >>>>>>>> the tap does not have any TX packets. RX is working fine. I see the packets coming in at
> >>>>>>>> the physical interface on the host, but they are not forwarded to the tap interface.
> >>>>>>>> The bridge itself has learnt the mac address of the vServer that is connected to the tap interface.
> >>>>>>>> It does not help to toggle the bridge link status,  the tap interface status or the interface in the vServer.
> >>>>>>>> It seems that problem occurs if a tap interface that has previously been used, but set to nonpersistent
> >>>>>>>> is set persistent again and then is by chance assigned to the same vServer (=same mac address on same
> >>>>>>>> bridge) again. Unfortunately it seems not to be reproducible.
> >>>>>>> 
> >>>>>>> Not sure but this patch from Michael Tsirkin may help - it solves an
> >>>>>>> issue with persistent tap devices:
> >>>>>>> 
> >>>>>>> http://patchwork.ozlabs.org/patch/198598/
> >>>>>> 
> >>>>>> Hi Stefan,
> >>>>>> 
> >>>>>> thanks for the pointer. I have seen this patch, but I have neglected it because it was dealing
> >>>>>> with persistent taps. But maybe the taps in the kernel are not deleted directly.
> >>>>>> Can you remember what the symptoms of the above issue have been? Sorry for
> >>>>>> being vague, but I currently have no clue what's going on.
> >>>>>> 
> >>>>>> Can someone who has more internal knowledge of the bridging/tap code say if qemu can
> >>>>>> be responsible at all if the tap device is not receiving packets from the bridge?
> >>>>>> 
> >>>>>> Say I have the following config: packets come in via the physical interface eth1.123,
> >>>>>> and there is a bridge called br123. I further have a virtual machine with tap0. Both eth1.123
> >>>>>> and tap0 are members of br123.
> >>>>>> 
> >>>>>> If the issue occurs, the vServer has no inbound network connectivity. If I send a ping
> >>>>>> from the vServer, I see it on tap0 and leaving on eth1.123. I further see the ARP reply coming
> >>>>>> in via eth1.123, but the reply can't be seen on tap0.
> >>>>>> 
> >>>>>> Peter
> >>>>> 
> >>>>> If guest is not consuming packets, a TX queue in tap device
> >>>>> will with time overrun (there's space for 1000 packets there).
> >>>>> This is code from tun:
> >>>>> 
> >>>>>        if (skb_queue_len(&tfile->socket.sk->sk_receive_queue)
> >>>>>            >= dev->tx_queue_len / tun->numqueues) {
> >>>>>                if (!(tun->flags & TUN_ONE_QUEUE)) {
> >>>>>                        /* Normal queueing mode. */
> >>>>>                        /* Packet scheduler handles dropping of further
> >>>>>                         * packets. */
> >>>>>                        netif_stop_subqueue(dev, txq);
> >>>>> 
> >>>>>                        /* We won't see all dropped packets
> >>>>>                         * individually, so overrun
> >>>>>                         * error is more appropriate. */
> >>>>>                        dev->stats.tx_fifo_errors++;
> >>>>> 
> >>>>> 
> >>>>> So you can detect that this triggered by looking at fifo errors counter in device.
> >>>>> 
> >>>>> Once this happens TX queue is stopped, then you hit this path:
> >>>>> 
> >>>>>                        if (!netif_xmit_stopped(txq)) {
> >>>>>                                __this_cpu_inc(xmit_recursion);
> >>>>>                                rc = dev_hard_start_xmit(skb, dev, txq);
> >>>>>                                __this_cpu_dec(xmit_recursion);
> >>>>>                                if (dev_xmit_complete(rc)) {
> >>>>>                                        HARD_TX_UNLOCK(dev, txq);
> >>>>>                                        goto out;
> >>>>>                                }
> >>>>>                        }
> >>>>> 
> >>>>> so packets are not passed to device anymore.
> >>>>> It will stay this way until guest consumes some packets and
> >>>>> queue is restarted.
> >>>> 
> >>>> After some time I again have a vServer in this state. It seems not like there
> >>>> are no TX errors.
> >>>> 
> >>>> # ifconfig tap10
> >>>> tap10     Link encap:Ethernet  HWaddr 7a:59:20:6f:e7:e5
> >>>>          inet6 addr: fe80::7859:20ff:fe6f:e7e5/64 Scope:Link
> >>>>          UP BROADCAST RUNNING PROMISC MULTICAST  MTU:1500  Metric:1
> >>>>          RX packets:197431 errors:0 dropped:0 overruns:0 frame:0
> >>>>          TX packets:264309 errors:0 dropped:0 overruns:2 carrier:0
> >>>>          collisions:0 txqueuelen:500
> >>>>          RX bytes:13842063 (13.8 MB)  TX bytes:35092821 (35.0 MB)
> >>>> 
> >>>> It seems like the bridge is not forwarding any packets to the tap device anymore, although it has learnt
> >>>> the MAC addresses and there are also broadcast packets coming in.
> >>>> 
> >>>> Any more ideas where I could debug?
> >>>> 
> >>>> Peter
> >>>> 
> >>>>> 
> >>>>>>> 
> >>>>>>> Stefan
> >>> 
> >>> Hmm. So there are two overrun errors that triggered, so
> >>> it's possible after the second one the queue got stuck in an xoff state.
> >>> You'd have to use something like systemtap or kdb to poke at the
> >>> queue state to see whether xoff flag is set and/or look
> >>> at the receive queue length.
> >>> 
> >>> For future, we can try to set TUN_ONE_QUEUE flag on the interface,
> >>> or try applying this patch
> >>> 5d097109257c03a71845729f8db6b5770c4bbedc
> >>> in kernel see if this helps.
> >>> 
> >> 
> >> I have had this option set for 2 weeks now and have not seen this problem again.
> >> How does this flag work with the recently added tap multiqueue support?
> >> 
> >> Peter
> > 
> > This will be the only option in 3.8.
> 
> Ok, but wouldn't it be good to set it in qemu for kernels <3.8?
> 
> Peter

Yes, probably a good idea. Patch?

> > 
> > -- 
> > MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: tap devices not receiving packets from a bridge
  2013-02-12  9:10               ` Peter Lieven
  2013-02-12  9:29                 ` Michael S. Tsirkin
@ 2013-02-12  9:39                 ` Michael Tokarev
  2013-02-12  9:54                   ` Michael S. Tsirkin
  1 sibling, 1 reply; 44+ messages in thread
From: Michael Tokarev @ 2013-02-12  9:39 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, netdev, qemu-devel, Michael S. Tsirkin

12.02.2013 13:10, Peter Lieven wrote:
[]

Guys, can we please trim the excessive quoting just a bit? ;)

 >>> I have had this option set for 2 weeks now and have not seen this problem again.
 >>> How does this flag work with the recently added tap multiqueue support?

>> This will be the only option in 3.8.

> Ok, but wouldn't it be good to set it in qemu for kernels <3.8?

I'd say for kernels without mq support, not for <3.8, right? :)

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: tap devices not receiving packets from a bridge
  2013-02-12  9:39                 ` Michael Tokarev
@ 2013-02-12  9:54                   ` Michael S. Tsirkin
  2013-02-12 10:11                     ` [Qemu-devel] " Peter Lieven
  0 siblings, 1 reply; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12  9:54 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: Stefan Hajnoczi, Peter Lieven, qemu-devel, netdev

On Tue, Feb 12, 2013 at 01:39:07PM +0400, Michael Tokarev wrote:
> 12.02.2013 13:10, Peter Lieven wrote:
> []
> 
> Guys, can we please trim the excessive quoting just a bit? ;)
> 
> >>> I have had this option set for 2 weeks now and have not seen this problem again.
> >>> How does this flag work with the recently added tap multiqueue support?
> 
> >>This will be the only option in 3.8.
> 
> >Ok, but wouldn't it be good to set it in qemu for kernels <3.8?
> 
> I'd say for kernels without mq support, not for <3.8, right? :)
> 
> Thanks,
> 
> /mjt

It's harmless to always set this flag, on 3.8 it does nothing.

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-02-12  9:54                   ` Michael S. Tsirkin
@ 2013-02-12 10:11                     ` Peter Lieven
  2013-02-12 10:43                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 44+ messages in thread
From: Peter Lieven @ 2013-02-12 10:11 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Michael Tokarev, Stefan Hajnoczi, qemu-devel, netdev


On 12.02.2013 at 10:54, "Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Feb 12, 2013 at 01:39:07PM +0400, Michael Tokarev wrote:
>> 12.02.2013 13:10, Peter Lieven wrote:
>> []
>> 
>> Guys, can we please trim the excessive quoting just a bit? ;)
>> 
>>>>> I have had this option set for 2 weeks now and have not seen this problem again.
>>>>> How does this flag work with the recently added tap multiqueue support?
>> 
>>>> This will be the only option in 3.8.
>> 
>>> Ok, but wouldn't it be good to set it in qemu for kernels <3.8?
>> 
>> I'd say for kernels without mq support, not for <3.8, right? :)
>> 
>> Thanks,
>> 
>> /mjt
> 
> It's harmless to always set this flag, on 3.8 it does nothing.

And kernels <3.8 do not have multi queue support?

Peter

> 
> -- 
> MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-02-12 10:11                     ` [Qemu-devel] " Peter Lieven
@ 2013-02-12 10:43                       ` Michael S. Tsirkin
  0 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-02-12 10:43 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Michael Tokarev, Stefan Hajnoczi, qemu-devel, netdev

On Tue, Feb 12, 2013 at 11:11:45AM +0100, Peter Lieven wrote:
> 
> On 12.02.2013 at 10:54, "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Tue, Feb 12, 2013 at 01:39:07PM +0400, Michael Tokarev wrote:
> >> 12.02.2013 13:10, Peter Lieven wrote:
> >> []
> >> 
> >> Guys, can we please trim the excessive quoting just a bit? ;)
> >> 
> >>>>> I have had this option set for 2 weeks now and have not seen this problem again.
> >>>>> How does this flag work with the recently added tap multiqueue support?
> >> 
> >>>> This will be the only option in 3.8.
> >> 
> >>> Ok, but wouldn't it be good to set it in qemu for kernels <3.8?
> >> 
> >> I'd say for kernels without mq support, not for <3.8, right? :)
> >> 
> >> Thanks,
> >> 
> >> /mjt
> > 
> > It's harmless to always set this flag, on 3.8 it does nothing.
> 
> And kernels <3.8 do not have multi queue support?
> 
> Peter

Let's be specific. Multiqueue support in qemu uses TUNSETQUEUE ioctl.
No kernel released by Linus so far has support for this ioctl in tun device,
but it has been merged so should be in 3.8.

> > 
> > -- 
> > MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: tap devices not receiving packets from a bridge
  2013-02-12  7:06           ` Peter Lieven
@ 2013-05-14 14:21               ` Nicholas Thomas
  2013-05-14 14:21               ` [Qemu-devel] " Nicholas Thomas
  1 sibling, 0 replies; 44+ messages in thread
From: Nicholas Thomas @ 2013-05-14 14:21 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, netdev, qemu-devel, Michael S. Tsirkin

Hi all,

On Tue, 2013-02-12 at 08:06 +0100, Peter Lieven wrote:
> On 23.01.2013 11:03, Michael S. Tsirkin wrote:
> > For future, we can try to set TUN_ONE_QUEUE flag on the interface,
> > or try applying this patch
> > 5d097109257c03a71845729f8db6b5770c4bbedc
> > in kernel see if this helps.
> >
> 
> I have had this option set for 2 weeks now and have not seen this problem again.
> How does this flag work with the recently added tap multiqueue support?
> 
> Peter

( Host systems are Linux kernel 3.2, from debian squeeze-backports, in
all cases. The guests use virtio-net, the hosts use netxen_nic )

We run QEMU like: 

qemu-system-x86_64 -enable-kvm -[...] \
  -net user,vlan=50,name=user,restrict=y
  -net nic,macaddr=fe:ff:00:00:00:00,name=t100,model=virtio,vlan=748
  -net tap,downscript=no,name=t100,script=no,vlan=748,ifname=t100 [...]

The TAP devices are created by us, by calling the appropriate ioctls,
more or less like:
fd = open("/dev/net/tun", "a+")
ioctl(fd, TUNSETIFF, "t100", IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE )
ioctl(fd, TUNSETOWNER, "t100", 20000)
ioctl(fd, TUNSETGROUP, "t100", 108)
ioctl(fd, SIOCSIFHWADDR, "t100", ARPHRD_ETHER, "fe:ff:00:00:00:00")
ioctl(fd, TUNSETPERSIST, "t100", 1)

(I'm translating ruby code here, but that's the gist of it)
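
For readers who want something runnable, the ioctl sequence above can be
sketched in Python roughly as follows. This is only a sketch under
assumptions, not the Ruby code referred to above: the helper names
build_ifreq and create_persistent_tap are made up for illustration, the
request numbers are the x86 encodings of the definitions in
<linux/if_tun.h>, and the SIOCSIFHWADDR step is left out. Creating the
device needs CAP_NET_ADMIN.

```python
import fcntl
import struct

# ioctl request numbers and flag bits from <linux/if_tun.h>, as encoded on x86.
TUNSETIFF = 0x400454CA
TUNSETPERSIST = 0x400454CB
TUNSETOWNER = 0x400454CC
TUNSETGROUP = 0x400454CE
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_ONE_QUEUE = 0x2000


def build_ifreq(name, flags):
    """Pack a struct ifreq: 16-byte name, 16-bit flags, padded to 40 bytes."""
    return struct.pack("16sH22x", name.encode("ascii"), flags)


def create_persistent_tap(name, uid, gid):
    """Hypothetical helper: create a persistent tap owned by uid/gid."""
    with open("/dev/net/tun", "r+b", buffering=0) as tun:
        flags = IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE
        fcntl.ioctl(tun, TUNSETIFF, build_ifreq(name, flags))
        fcntl.ioctl(tun, TUNSETOWNER, uid)
        fcntl.ioctl(tun, TUNSETGROUP, gid)
        # Keep the device around after this file descriptor is closed.
        fcntl.ioctl(tun, TUNSETPERSIST, 1)
```

With the values from this mail, the call would be
create_persistent_tap("t100", 20000, 108). Note that a program which later
attaches to the device issues its own TUNSETIFF, so the flags chosen here are
not necessarily the ones that end up in effect.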

We used to run QEMU 0.15.0, and didn't set IFF_ONE_QUEUE on the tap
devices we created. We never saw this bug. Last week, we began upgrading
to QEMU 1.4.1; our imager setup (netboot, download a large disc image
over HTTP, run a script in it) immediately began triggering this bug,
quite reliably. 

We changed our code to set IFF_ONE_QUEUE on the tap devices we created,
and this has reduced the frequency with which the bug is triggered, but
we still experience it from time to time. Over 5 trials, I triggered the
bug three times.

Interestingly, while the guest fails to receive packets, no TX overruns
to the tap device are initially reported on the host (by ifconfig). The
overrun counter ticks to 1 after I ping the guest a few times, like so:

Before:

t100      Link encap:Ethernet  HWaddr ae:17:96:7d:32:3f  
          inet6 addr: fe80::ac17:96ff:fe7d:323f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:58006 errors:0 dropped:0 overruns:0 frame:0
          TX packets:57992 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:3825467 (3.6 MiB)  TX bytes:87661451 (83.6 MiB)


After:

t100      Link encap:Ethernet  HWaddr ae:17:96:7d:32:3f  
          inet6 addr: fe80::ac17:96ff:fe7d:323f/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:58006 errors:0 dropped:0 overruns:0 frame:0
          TX packets:57992 errors:0 dropped:0 overruns:1 carrier:0
          collisions:0 txqueuelen:500 
          RX bytes:3825467 (3.6 MiB)  TX bytes:87661451 (83.6 MiB)
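
The before/after comparison can also be done mechanically. The following is a
small sketch that assumes the classic net-tools ifconfig layout shown above;
the names tx_counters and stalled_with_overrun are made up for illustration.

```python
import re

# Matches the TX counter line of classic net-tools ifconfig output.
TX_LINE = re.compile(
    r"TX packets:(?P<packets>\d+) errors:(?P<errors>\d+) "
    r"dropped:(?P<dropped>\d+) overruns:(?P<overruns>\d+) "
    r"carrier:(?P<carrier>\d+)"
)


def tx_counters(ifconfig_output):
    """Return the TX counters of one interface as a dict of ints."""
    match = TX_LINE.search(ifconfig_output)
    if match is None:
        raise ValueError("no TX counter line found")
    return {key: int(value) for key, value in match.groupdict().items()}


def stalled_with_overrun(before, after):
    """True if no further packets were sent while overruns went up."""
    return (after["packets"] == before["packets"]
            and after["overruns"] > before["overruns"])


before = tx_counters("TX packets:57992 errors:0 dropped:0 overruns:0 carrier:0")
after = tx_counters("TX packets:57992 errors:0 dropped:0 overruns:1 carrier:0")
print(stalled_with_overrun(before, after))
```

The two sample lines are the TX lines from the output above.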


The packets are still visible coming in on the bridge interface, and the
bridge knows the MAC address of the guest. I'm afraid I'm at a bit of a
loss on how to track this down; can anyone advise? 

/Nick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: tap devices not receiving packets from a bridge
  2013-05-14 14:21               ` [Qemu-devel] " Nicholas Thomas
@ 2013-05-14 14:28                 ` Peter Lieven
  -1 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2013-05-14 14:28 UTC (permalink / raw)
  To: Nicholas Thomas; +Cc: Stefan Hajnoczi, netdev, qemu-devel, Michael S. Tsirkin

On 14.05.2013 16:21, Nicholas Thomas wrote:
> Hi all,
>
> On Tue, 2013-02-12 at 08:06 +0100, Peter Lieven wrote:
>> On 23.01.2013 11:03, Michael S. Tsirkin wrote:
>>> For future, we can try to set TUN_ONE_QUEUE flag on the interface,
>>> or try applying this patch
>>> 5d097109257c03a71845729f8db6b5770c4bbedc
>>> in kernel see if this helps.
>>>
>>
>> I have had this option set for 2 weeks now and have not seen this problem again.
>> How does this flag work with the recently added tap multiqueue support?
>>
>> Peter
>
> ( Host systems are Linux kernel 3.2, from debian squeeze-backports, in
> all cases. The guests use virtio-net, the hosts use netxen_nic )
>
> We run QEMU like:
>
> qemu-system-x86_64 -enable-kvm -[...] \
>    -net user,vlan=50,name=user,restrict=y
>    -net nic,macaddr=fe:ff:00:00:00:00,name=t100,model=virtio,vlan=748
>    -net tap,downscript=no,name=t100,script=no,vlan=748,ifname=t100 [...]
>
> The TAP devices are created by us, by calling the appropriate ioctls,
> more or less like:
> fd = open("/dev/net/tun", "a+")
> ioctl(fd, TUNSETIFF, "t100", IFF_TAP | IFF_NO_PI | IFF_ONE_QUEUE )
> ioctl(fd, TUNSETOWNER, "t100", 20000)
> ioctl(fd, TUNSETGROUP, "t100", 108)
> ioctl(fd, SIOCSIFHWADDR, "t100", ARPHRD_ETHER, "fe:ff:00:00:00:00")
> ioctl(fd, TUNSETPERSIST, "t100", 1)
>
> (I'm translating ruby code here, but that's the gist of it)
>
> We used to run QEMU 0.15.0, and didn't set IFF_ONE_QUEUE on the tap
> devices we created. We never saw this bug. Last week, we began upgrading
> to QEMU 1.4.1; our imager setup (netboot, download a large disc image
> over HTTP, run a script in it) immediately began triggering this bug,
> quite reliably.
>
> We changed our code to set IFF_ONE_QUEUE on the tap devices we created,
> and this has reduced the frequency with which the bug is triggered, but
> we still experience it from time to time. Over 5 trials, I triggered the
> bug three times.
>
> Interestingly, while the guest fails to receive packets, no TX overruns
> to the tap device are initially reported on the host (by ifconfig). The
> overrun counter ticks to 1 after I ping the guest a few times, like so:
>
> Before:
>
> t100      Link encap:Ethernet  HWaddr ae:17:96:7d:32:3f
>            inet6 addr: fe80::ac17:96ff:fe7d:323f/64 Scope:Link
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:58006 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:57992 errors:0 dropped:0 overruns:0 carrier:0
>            collisions:0 txqueuelen:500
>            RX bytes:3825467 (3.6 MiB)  TX bytes:87661451 (83.6 MiB)
>
>
> After:
>
> t100      Link encap:Ethernet  HWaddr ae:17:96:7d:32:3f
>            inet6 addr: fe80::ac17:96ff:fe7d:323f/64 Scope:Link
>            UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>            RX packets:58006 errors:0 dropped:0 overruns:0 frame:0
>            TX packets:57992 errors:0 dropped:0 overruns:1 carrier:0
>            collisions:0 txqueuelen:500
>            RX bytes:3825467 (3.6 MiB)  TX bytes:87661451 (83.6 MiB)
>
>
> The packets are still visible coming in on the bridge interface, and the
> bridge knows the MAC address of the guest. I'm afraid I'm at a bit of a
> loss on how to track this down; can anyone advise?

Please check the tunnel mode in sysfs after your VM is started. It is likely
that qemu overwrites the settings you made in the ruby script.

Please check if the patch

tap: set IFF_ONE_QUEUE per default

is in your qemu 1.4.1 version.

Peter



>
> /Nick
>

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: tap devices not receiving packets from a bridge
  2013-05-14 14:28                 ` [Qemu-devel] " Peter Lieven
@ 2013-05-14 14:49                   ` Nicholas Thomas
  -1 siblings, 0 replies; 44+ messages in thread
From: Nicholas Thomas @ 2013-05-14 14:49 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Stefan Hajnoczi, netdev, qemu-devel, Michael S. Tsirkin

Hi,

On Tue, 2013-05-14 at 16:28 +0200, Peter Lieven wrote:

> Please check the tunnel mode in sysfs after your VM is started. It is likely
> that qemu overwrites the settings you made in the ruby script.
> 
> Please check if the patch
> 
> tap: set IFF_ONE_QUEUE per default
> 
> is in your qemu 1.4.1 version.

That didn't even cross my mind!

/sys/devices/virtual/net/t100/tun_flags is 0x5002 - so it looks like
IFF_ONE_QUEUE was indeed unset by qemu (which is lacking the patch). It
surprises me, but that's probably my fault, rather than qemu's.
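
For reference, the value can be decoded bit by bit. The sketch below uses the
flag values from <linux/if_tun.h>; the function names are made up for
illustration, and the sysfs path is the one quoted above.

```python
# Flag bits from <linux/if_tun.h>.
TUN_FLAG_NAMES = {
    0x0001: "IFF_TUN",
    0x0002: "IFF_TAP",
    0x0100: "IFF_MULTI_QUEUE",
    0x1000: "IFF_NO_PI",
    0x2000: "IFF_ONE_QUEUE",
    0x4000: "IFF_VNET_HDR",
    0x8000: "IFF_TUN_EXCL",
}


def decode_tun_flags(value):
    """Return the names of all known flag bits set in value, lowest bit first."""
    return [name for bit, name in sorted(TUN_FLAG_NAMES.items()) if value & bit]


def read_tun_flags(ifname):
    """Read the hex flags word of a tun/tap device from sysfs."""
    with open(f"/sys/devices/virtual/net/{ifname}/tun_flags") as f:
        return int(f.read().strip(), 16)


print(decode_tun_flags(0x5002))
```

0x5002 decodes to IFF_TAP, IFF_NO_PI and IFF_VNET_HDR; the IFF_ONE_QUEUE bit
(0x2000) is not set.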

Sorry for the noise, and thanks for the quick response :)

/Nick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-14 14:49                   ` [Qemu-devel] " Nicholas Thomas
@ 2013-05-15 11:00                     ` Nicholas Thomas
  -1 siblings, 0 replies; 44+ messages in thread
From: Nicholas Thomas @ 2013-05-15 11:00 UTC (permalink / raw)
  To: Peter Lieven; +Cc: Michael S. Tsirkin, Stefan Hajnoczi, qemu-devel, netdev

Hi again,

On Tue, 2013-05-14 at 15:49 +0100, Nicholas Thomas wrote:
> /sys/devices/virtual/net/t100/tun_flags is 0x5002 - so it looks like
> IFF_ONE_QUEUE was indeed unset by qemu (which is lacking the patch). It
> surprises me, but that's probably my fault, rather than qemu's.


I've rebuilt 1.4.1 with the IFF_ONE_QUEUE patch and tun_flags is now
0x7002; unfortunately, I'm still seeing this bug, twice in five trials.
Symptoms in `ifconfig t100` now differ; overruns stays at 0, and
"dropped" increases monotonically as I send packets. Those packets do
appear if I tcpdump t100 on the host, but not if I tcpdump t100 on the
guest.
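
To watch the counters without eyeballing ifconfig output, one option is to snapshot the per-interface statistics files in sysfs before and after sending a burst of packets. This is only a sketch: it assumes the usual /sys/class/net/<ifname>/statistics layout, and the function names are hypothetical.

```python
import os


def read_counters(ifname, root="/sys/class/net"):
    """Read every counter under <root>/<ifname>/statistics into a dict."""
    stats_dir = os.path.join(root, ifname, "statistics")
    counters = {}
    for name in sorted(os.listdir(stats_dir)):
        with open(os.path.join(stats_dir, name)) as f:
            counters[name] = int(f.read().strip())
    return counters


def growing(before, after):
    """Return {counter: increase} for counters that grew between two snapshots."""
    return {
        name: after[name] - value
        for name, value in before.items()
        if after.get(name, 0) > value
    }
```

Typical use would be before = read_counters("t100"), send some traffic, then growing(before, read_counters("t100")) to see which of the dropped or overrun counters moved.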

I've turned off gro in the guest, which makes no difference, and tried
changing the queue sizes (post-hoc) in both guest and host, in the hope
of causing them to be emptied out, clearing the condition; again to no
effect.

The VMs in question are bridged to a large (and busy) VLAN with no
ingress filtering to speak of; I guess what's happening is that the
transmit queue is filled up by that traffic while the guest is in ipxe,
and it never gets out of that state when it happens... so maybe there is
still an underlying problem?
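
The guess above can be pictured with a deliberately simplified toy model. It is not the kernel's tun queue logic and not qemu code; the class and its limit are invented purely to show why a bounded queue with a stalled reader turns every further packet into a drop.

```python
from collections import deque


class ToyTapQueue:
    """Bounded FIFO standing in for the tap transmit queue (toy model only)."""

    def __init__(self, limit):
        self.limit = limit
        self.queue = deque()
        self.dropped = 0

    def enqueue(self, packet):
        """Bridge side: queue a packet, or count a drop if the queue is full."""
        if len(self.queue) >= self.limit:
            self.dropped += 1
            return False
        self.queue.append(packet)
        return True

    def read(self):
        """Reader side (qemu in this picture): take one packet, if any."""
        return self.queue.popleft() if self.queue else None


q = ToyTapQueue(limit=4)
for n in range(10):          # busy VLAN traffic arrives, nobody reads
    q.enqueue(n)
print(len(q.queue), q.dropped)
```

In this model the drop counter keeps rising for as long as read() is never called again, which matches the symptom described, though it says nothing about why the reader would stop.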

/Nick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-15 11:00                     ` Nicholas Thomas
@ 2013-05-16  6:24                       ` Michael S. Tsirkin
  -1 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-05-16  6:24 UTC (permalink / raw)
  To: Nicholas Thomas; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

On Wed, May 15, 2013 at 12:00:03PM +0100, Nicholas Thomas wrote:
> Hi again,
> 
> On Tue, 2013-05-14 at 15:49 +0100, Nicholas Thomas wrote:
> > /sys/devices/virtual/net/t100/tun_flags is 0x5002 - so it looks like
> > IFF_ONE_QUEUE was indeed unset by qemu (which is lacking the patch). It
> > surprises me, but that's probably my fault, rather than qemu's.
> 
> 
> I've rebuilt 1.4.1 with the IFF_ONE_QUEUE patch and tun_flags is now
> 0x7002; unfortunately, I'm still seeing this bug, twice in five trials.
> Symptoms in `ifconfig t100` now differ; overruns stays at 0, and
> "dropped" increases monotonically as I send packets. Those packets do
> appear if I tcpdump t100 on the host, but not if I tcpdump t100 on the
> guest.
> 
> I've turned off gro in the guest, which makes no difference, and tried
> changing the queue sizes (post-hoc) in both guest and host, in the hope
> of causing them to be emptied out, clearing the condition; again to no
> effect.
> 
> The VMs in question are bridged to a large (and busy) VLAN with no
> ingress filtering to speak of; I guess what's happening is that the
> transmit queue is filled up by that traffic while the guest is in ipxe,
> and it never gets out of that state when it happens... so maybe there is
> still an underlying problem?
> 
> /Nick


Is this with or without vhost-net in host?

-- 
MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16  6:24                       ` Michael S. Tsirkin
@ 2013-05-16  6:27                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-05-16  6:27 UTC (permalink / raw)
  To: Nicholas Thomas; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
> On Wed, May 15, 2013 at 12:00:03PM +0100, Nicholas Thomas wrote:
> > Hi again,
> > 
> > On Tue, 2013-05-14 at 15:49 +0100, Nicholas Thomas wrote:
> > > /sys/devices/virtual/net/t100/tun_flags is 0x5002 - so it looks like
> > > IFF_ONE_QUEUE was indeed unset by qemu (which is lacking the patch). It
> > > surprises me, but that's probably my fault, rather than qemu's.
> > 
> > 
> > I've rebuilt 1.4.1 with the IFF_ONE_QUEUE patch and tun_flags is now
> > 0x7002; unfortunately, I'm still seeing this bug, twice in five trials.
> > Symptoms in `ifconfig t100` now differ; overruns stays at 0, and
> > "dropped" increases monotonically as I send packets. Those packets do
> > appear if I tcpdump t100 on the host, but not if I tcpdump t100 on the
> > guest.
> > 
> > I've turned off gro in the guest, which makes no difference, and tried
> > changing the queue sizes (post-hoc) in both guest and host, in the hope
> > of causing them to be emptied out, clearing the condition; again to no
> > effect.
> > 
> > The VMs in question are bridged to a large (and busy) VLAN with no
> > ingress filtering to speak of; I guess what's happening is that the
> > transmit queue is filled up by that traffic while the guest is in ipxe,
> > and it never gets out of that state when it happens... so maybe there is
> > still an underlying problem?
> > 
> > /Nick
> 
> 
> Is this with or without vhost-net in host?

never mind, I see it's without.
Try to enable vhost-net (you'll have to switch to -netdev syntax
for that to work) and see if this helps.
If it does, it's likely a qemu bug; if not, it's probably a guest bug.

> -- 
> MST

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16  6:27                         ` Michael S. Tsirkin
@ 2013-05-16  8:20                           ` Nicholas Thomas
  -1 siblings, 0 replies; 44+ messages in thread
From: Nicholas Thomas @ 2013-05-16  8:20 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

Hi,

On Thu, 2013-05-16 at 09:27 +0300, Michael S. Tsirkin wrote:
> On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
> > Is this with or without vhost-net in host?
> 
> never mind, I see it's without.
> Try to enable vhost-net (you'll have to switch to -netdev syntax
> for that to work) and see if this helps.
> If it does, it's likely a qemu bug; if not, it's probably a guest bug.

Switching to -netdev is non-trivial for me, unfortunately. Anyway, it's
definitely a qemu bug - it happens on kernels 3.2 and 3.9 with 1.4.1,
but doesn't happen with qemu 0.15.0 or 1.5.0rc1.

I'll have a dig through git to see if I can identify the patch that
resolves it. It feels like qemu sometimes stops reading from the tap
file descriptor between ipxe exiting and the linux kernel bringing up
the network interface, and never recovers from that.

/Nick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16  8:20                           ` Nicholas Thomas
@ 2013-05-16  8:40                             ` Michael S. Tsirkin
  -1 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-05-16  8:40 UTC (permalink / raw)
  To: Nicholas Thomas; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

On Thu, May 16, 2013 at 09:20:55AM +0100, Nicholas Thomas wrote:
> Hi,
> 
> On Thu, 2013-05-16 at 09:27 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
> > > Is this with or without vhost-net in host?
> > 
> > never mind, I see it's without.
> > Try to enable vhost-net (you'll have to switch to -netdev syntax
> > for that to work) and see if this helps.
> > If it does, it's likely a qemu bug; if not, it's probably a guest bug.
> 
> Switching to -netdev is non-trivial for me, unfortunately.

Interesting. Why is that?

> Anyway, it's
> definitely a qemu bug - it happens on kernels 3.2 and 3.9 with 1.4.1,
> but doesn't happen with qemu 0.15.0 or 1.5.0rc1.
> 
> I'll have a dig through git to see if I can identify the patch that
> resolves it. It feels like qemu sometimes stops reading from the tap
> file descriptor between ipxe exiting and the linux kernel bringing up
> the network interface, and never recovers from that.
> 
> /Nick

You can try to bisect, yes.
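
Two things matter for such a bisection: the search here is for the commit that fixes the problem rather than the one that introduces it, and the bug is intermittent (reported as twice in five trials), so a single clean boot proves little. The sketch below is a plain binary search plus a trial-count estimate. It assumes an ordered commit list with the bug present at the start and absent at the end, and independent trials with a fixed hit rate; both functions are hypothetical helpers, not part of git or qemu.

```python
import math


def trials_needed(hit_rate, miss_probability):
    """Smallest n such that (1 - hit_rate) ** n <= miss_probability."""
    return math.ceil(math.log(miss_probability) / math.log(1.0 - hit_rate))


def first_fixed(commits, is_fixed):
    """Binary-search an ordered list for the first commit where is_fixed() holds.

    Assumes commits[0] still shows the bug, commits[-1] does not, and the bug
    does not come back once it has been fixed.
    """
    lo, hi = 0, len(commits) - 1  # invariant: lo is broken, hi is fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(commits[mid]):
            hi = mid
        else:
            lo = mid
    return commits[hi]


# With a 2-in-5 hit rate, how many clean runs before calling a build "fixed"
# with at most a 5% chance of having simply been lucky?
print(trials_needed(0.4, 0.05))
```

Here is_fixed() would build the given commit and run that many boots, returning True only if none of them shows the hang.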

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16  8:40                             ` Michael S. Tsirkin
@ 2013-05-16  8:47                               ` Peter Lieven
  -1 siblings, 0 replies; 44+ messages in thread
From: Peter Lieven @ 2013-05-16  8:47 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Nicholas Thomas, Stefan Hajnoczi, qemu-devel, netdev

Am 16.05.2013 10:40, schrieb Michael S. Tsirkin:
> On Thu, May 16, 2013 at 09:20:55AM +0100, Nicholas Thomas wrote:
>> Hi,
>>
>> On Thu, 2013-05-16 at 09:27 +0300, Michael S. Tsirkin wrote:
>>> On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
>>>> Is this with or without vhost-net in host?
>>>
>>> never mind, I see it's without.
>>> Try to enable vhost-net (you'll have to switch to -netdev syntax
>>> for that to work) and see if this helps.
>>> If it does, it's likely a qemu bug; if not, it's probably a guest bug.
>>
>> Switching to -netdev is non-trivial for me, unfortunately.
> 
> Interesting. Why is that?
> 
>> Anyway, it's
>> definitely a qemu bug - it happens on kernels 3.2 and 3.9 with 1.4.1,
>> but doesn't happen with qemu 0.15.0 or 1.5.0rc1.
>>
>> I'll have a dig through git to see if I can identify the patch that
>> resolves it. It feels like qemu sometimes stops reading from the tap
>> file descriptor between ipxe exiting and the linux kernel bringing up
>> the network interface, and never recovers from that.
>>
>> /Nick
> 
> You can try to bisect, yes.
> 

It would be good to bisect this; I would appreciate it. I have a similar
problem with rtl8139 (without vhost-net), but I have not been able to
reproduce it yet.

Thanks,
Peter

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16  8:40                             ` Michael S. Tsirkin
@ 2013-05-16 11:27                               ` Nicholas Thomas
  -1 siblings, 0 replies; 44+ messages in thread
From: Nicholas Thomas @ 2013-05-16 11:27 UTC (permalink / raw)
  To: Michael S. Tsirkin; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

On Thu, 2013-05-16 at 11:40 +0300, Michael S. Tsirkin wrote:
> On Thu, May 16, 2013 at 09:20:55AM +0100, Nicholas Thomas wrote:
> > Hi,
> > 
> > On Thu, 2013-05-16 at 09:27 +0300, Michael S. Tsirkin wrote:
> > > On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
> > > > Is this with or without vhost-net in host?
> > > 
> > > never mind, I see it's without.
> > > Try to enable vhost-net (you'll have to switch to -netdev syntax
> > > for that to work) and see if this helps.
> > > If it does, it's likely a qemu bug; if not, it's probably a guest bug.
> > 
> > Switching to -netdev is non-trivial for me, unfortunately.
> 
> Interesting. Why is that?

Our setup is bond0 <-> vlanX <-> bridgeX <-> [ tap devices ] and we do
all that outside of qemu at the moment, specifying -net tap,ifname=... -
we also run some processes on the TAP interface and insert a bunch of
ebtables rules between creating it and starting qemu. Duplicating that
with -net bridge seemed close to impossible, and -netdev tap was
throwing EBUSY from /dev/net/tun. I guess our external magic should be
using ,fd= instead.
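
For what the ,fd= route would involve: an external program opens /dev/net/tun, attaches to the existing tap with the TUNSETIFF ioctl, and hands the descriptor to qemu. The sketch below assumes the ioctl number and flag values from <linux/if_tun.h> and a 40-byte struct ifreq as on 64-bit Linux; the flag combination simply mirrors the 0x5002 seen earlier and may not be what a given setup needs. The function names are hypothetical.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454CA  # _IOW('T', 202, int) from <linux/if_tun.h>
IFF_TAP = 0x0002
IFF_NO_PI = 0x1000
IFF_VNET_HDR = 0x4000


def build_ifreq(ifname, flags):
    """Pack a struct ifreq: 16-byte name, 16-bit flags, zero padding to 40 bytes."""
    return struct.pack("16sH22x", ifname.encode("ascii"), flags)


def open_tap(ifname, flags=IFF_TAP | IFF_NO_PI | IFF_VNET_HDR):
    """Attach to an existing tap device and return the descriptor.

    Needs sufficient privileges; the ioctl fails if the kernel refuses
    the attachment.
    """
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(ifname, flags))
    except OSError:
        os.close(fd)
        raise
    return fd
```

The descriptor then has to survive into the qemu process, for example through the pass_fds argument of subprocess.Popen, with its number given in the fd= option.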

> > Anyway, it's
> > definitely a qemu bug - it happens on kernels 3.2 and 3.9 with 1.4.1,
> > but doesn't happen with qemu 0.15.0 or 1.5.0rc1.
> > 
> > I'll have a dig through git to see if I can identify the patch that
> > resolves it. It feels like qemu sometimes stops reading from the tap
> > file descriptor between ipxe exiting and the linux kernel bringing up
> > the network interface, and never recovers from that.
> > 
> > /Nick
> 
> You can try to bisect, yes.

Work have decided to accept 1.5.0 when it arrives instead, so I'm afraid
I won't be working on this after all. 

/Nick

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [Qemu-devel] tap devices not receiving packets from a bridge
  2013-05-16 11:27                               ` Nicholas Thomas
@ 2013-05-16 12:09                                 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 44+ messages in thread
From: Michael S. Tsirkin @ 2013-05-16 12:09 UTC (permalink / raw)
  To: Nicholas Thomas; +Cc: Peter Lieven, Stefan Hajnoczi, qemu-devel, netdev

On Thu, May 16, 2013 at 12:27:52PM +0100, Nicholas Thomas wrote:
> On Thu, 2013-05-16 at 11:40 +0300, Michael S. Tsirkin wrote:
> > On Thu, May 16, 2013 at 09:20:55AM +0100, Nicholas Thomas wrote:
> > > Hi,
> > > 
> > > On Thu, 2013-05-16 at 09:27 +0300, Michael S. Tsirkin wrote:
> > > > On Thu, May 16, 2013 at 09:24:05AM +0300, Michael S. Tsirkin wrote:
> > > > > Is this with or without vhost-net in host?
> > > > 
> > > > never mind, I see it's without.
> > > > Try to enable vhost-net (you'll have to switch to -netdev syntax
> > > > for that to work) and see if this helps.
> > > > If it does, it's likely a qemu bug; if not, it's probably a guest bug.
> > > 
> > > Switching to -netdev is non-trivial for me, unfortunately.
> > 
> > Interesting. Why is that?
> 
> Our setup is bond0 <-> vlanX <-> bridgeX <-> [ tap devices ] and we do
> all that outside of qemu at the moment, specifying -net tap,ifname=... -
> we also run some processes on the TAP interface and insert a bunch of
> ebtables rules between creating it and starting qemu. Duplicating that
> with -net bridge seemed close to impossible, and -netdev tap was
> throwing EBUSY from /dev/net/tun. I guess our external magic should be
> using ,fd= instead.

I'm not sure what's wrong with -netdev tap.
You don't have to use fd=, you can specify ifname= with netdev as well.
Here's what I use:

-net nic,model=virtio,netdev=foo
-netdev tap,id=foo,ifname=msttap0,script=/home/mst/ifup,downscript=no,vhost=on

The netdev/id pair above is almost the same as vlan=20 in your example,
except that in the netdev case there is always exactly one frontend and
one backend, whereas vlans let you connect more than two devices.
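
For a setup where the tap is created and configured before qemu starts, the same pairing could be generated by a small helper. This is a sketch only: it follows the two options shown above, replaces the ifup script with script=no on the assumption that no script should run against a pre-made tap, and the function name and defaults are invented for illustration.

```python
def netdev_tap_args(netdev_id, ifname, model="virtio", vhost=True):
    """Build qemu arguments pairing one NIC frontend with one tap backend."""
    # script=no/downscript=no: assume the tap was set up outside qemu.
    backend = "tap,id=%s,ifname=%s,script=no,downscript=no" % (netdev_id, ifname)
    if vhost:
        backend += ",vhost=on"
    return [
        "-net", "nic,model=%s,netdev=%s" % (model, netdev_id),
        "-netdev", backend,
    ]


print(" ".join(netdev_tap_args("foo", "t100")))
```

The returned list can be appended to the rest of the qemu argument vector; the id is what ties the NIC to its backend, so it must be the same string in both options.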

> > > Anyway, it's
> > > definitely a qemu bug - it happens on kernels 3.2 and 3.9 with 1.4.1,
> > > but doesn't happen with qemu 0.15.0 or 1.5.0rc1.
> > > 
> > > I'll have a dig through git to see if I can identify the patch that
> > > resolves it. It feels like qemu sometimes stops reading from the tap
> > > file descriptor between ipxe exiting and the linux kernel bringing up
> > > the network interface, and never recovers from that.
> > > 
> > > /Nick
> > 
> > You can try to bisect, yes.
> 
> Work have decided to accept 1.5.0 when it arrives instead, so I'm afraid
> I won't be working on this after all. 
> 
> /Nick
> 
> 

^ permalink raw reply	[flat|nested] 44+ messages in thread

Thread overview: 44+ messages
2012-11-22 14:29 tap devices not receiving packets from a bridge Peter Lieven
2012-11-22 14:29 ` [Qemu-devel] " Peter Lieven
2012-11-23  7:02 ` Stefan Hajnoczi
2012-11-23  7:02   ` Stefan Hajnoczi
2012-11-23  9:41   ` Peter Lieven
2012-11-23  9:41     ` Peter Lieven
2012-11-23 11:01     ` Michael S. Tsirkin
2012-11-23 11:02       ` Peter Lieven
2012-11-23 11:02         ` Peter Lieven
2013-01-22  9:04       ` Peter Lieven
2013-01-22  9:43         ` Peter Lieven
2013-01-23 10:03         ` [Qemu-devel] " Michael S. Tsirkin
2013-02-12  7:06           ` Peter Lieven
2013-02-12  9:08             ` [Qemu-devel] " Michael S. Tsirkin
2013-02-12  9:10               ` Peter Lieven
2013-02-12  9:29                 ` Michael S. Tsirkin
2013-02-12  9:39                 ` Michael Tokarev
2013-02-12  9:54                   ` Michael S. Tsirkin
2013-02-12 10:11                     ` [Qemu-devel] " Peter Lieven
2013-02-12 10:43                       ` Michael S. Tsirkin
2013-05-14 14:21             ` Nicholas Thomas
2013-05-14 14:21               ` [Qemu-devel] " Nicholas Thomas
2013-05-14 14:28               ` Peter Lieven
2013-05-14 14:28                 ` [Qemu-devel] " Peter Lieven
2013-05-14 14:49                 ` Nicholas Thomas
2013-05-14 14:49                   ` [Qemu-devel] " Nicholas Thomas
2013-05-15 11:00                   ` Nicholas Thomas
2013-05-15 11:00                     ` Nicholas Thomas
2013-05-16  6:24                     ` Michael S. Tsirkin
2013-05-16  6:24                       ` Michael S. Tsirkin
2013-05-16  6:27                       ` Michael S. Tsirkin
2013-05-16  6:27                         ` Michael S. Tsirkin
2013-05-16  8:20                         ` Nicholas Thomas
2013-05-16  8:20                           ` Nicholas Thomas
2013-05-16  8:40                           ` Michael S. Tsirkin
2013-05-16  8:40                             ` Michael S. Tsirkin
2013-05-16  8:47                             ` Peter Lieven
2013-05-16  8:47                               ` Peter Lieven
2013-05-16 11:27                             ` Nicholas Thomas
2013-05-16 11:27                               ` Nicholas Thomas
2013-05-16 12:09                               ` Michael S. Tsirkin
2013-05-16 12:09                                 ` Michael S. Tsirkin
2012-11-29 18:58   ` Peter Lieven
2012-11-29 18:58     ` Peter Lieven
