* [PATCH] macvlan/macvtap: Fix vlan tagging on user read
@ 2012-04-18 18:34 Basil Gor
2012-04-18 18:54 ` Eric W. Biederman
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-04-18 18:34 UTC (permalink / raw)
To: netdev; +Cc: David S. Miller, Eric W. Biederman, Basil Gor
Vlan tag is restored during buffer transmit to a network device (bridge
port) in bridging code in case of tun/tap driver. In case of macvtap it
has to be done explicitly. Otherwise vlan_tci is ignored and user always
gets untagged packets.
Scenario tested:
kvm guests (that use vlans) migration from bridged network to macvtap
revealed that packets delivered to guests are always untagged. Dumping
and comparing sk_buff in case of tap and macvtap driver showed that
macvtap does not restore vlan_tci.
With current patch applied I was able to get working network, kvm guests
get correctly tagged packets and can reach each other when macvtap in
bridge mode (both with no vlans and through vlan interfaces).
Signed-off-by: Basil Gor <basilgor@gmail.com>
---
drivers/net/macvtap.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0427c65..a6802b9 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1,6 +1,7 @@
#include <linux/etherdevice.h>
#include <linux/if_macvlan.h>
#include <linux/interrupt.h>
+#include <linux/if_vlan.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
#include <linux/if_tun.h>
@@ -254,6 +255,14 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
goto drop;
+ if (vlan_tx_tag_present(skb)) {
+ skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
+ if (unlikely(!skb))
+ return NET_RX_DROP;
+
+ skb->vlan_tci = 0;
+ }
+
skb_queue_tail(&q->sk.sk_receive_queue, skb);
wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
return NET_RX_SUCCESS;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH] macvlan/macvtap: Fix vlan tagging on user read
2012-04-18 18:34 [PATCH] macvlan/macvtap: Fix vlan tagging on user read Basil Gor
@ 2012-04-18 18:54 ` Eric W. Biederman
2012-04-18 19:33 ` Basil Gor
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-04-18 18:54 UTC (permalink / raw)
To: Basil Gor; +Cc: netdev, David S. Miller, Basil Gor
Basil Gor <basil.gor@gmail.com> writes:
> Vlan tag is restored during buffer transmit to a network device (bridge
> port) in bridging code in case of tun/tap driver. In case of macvtap it
> has to be done explicitly. Otherwise vlan_tci is ignored and user always
> gets untagged packets.
>
> Scenario tested:
> kvm guests (that use vlans) migration from bridged network to macvtap
> revealed that packets delivered to guests are always untagged. Dumping
> and comparing sk_buff in case of tap and macvtap driver showed that
> macvtap does not restore vlan_tci.
>
> With current patch applied I was able to get working network, kvm guests
> get correctly tagged packets and can reach each other when macvtap in
> bridge mode (both with no vlans and through vlan interfaces).
My first impression is that this is the wrong place to add a vlan
header back.
You need to keep the vlan information in vlan_tci until just
before the packet is delivered to userspace. Which would suggest
the best place for these games is macvtap_put_user.
Elsewhere vlan headers should not be explicitly stored in the packet.
At least that was the rule last I looked.
Eric
> Signed-off-by: Basil Gor <basilgor@gmail.com>
> ---
> drivers/net/macvtap.c | 9 +++++++++
> 1 files changed, 9 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 0427c65..a6802b9 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1,6 +1,7 @@
> #include <linux/etherdevice.h>
> #include <linux/if_macvlan.h>
> #include <linux/interrupt.h>
> +#include <linux/if_vlan.h>
> #include <linux/nsproxy.h>
> #include <linux/compat.h>
> #include <linux/if_tun.h>
> @@ -254,6 +255,14 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
> if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
> goto drop;
>
> + if (vlan_tx_tag_present(skb)) {
> + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> + if (unlikely(!skb))
> + return NET_RX_DROP;
> +
> + skb->vlan_tci = 0;
> + }
> +
> skb_queue_tail(&q->sk.sk_receive_queue, skb);
> wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
> return NET_RX_SUCCESS;
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] macvlan/macvtap: Fix vlan tagging on user read
2012-04-18 18:54 ` Eric W. Biederman
@ 2012-04-18 19:33 ` Basil Gor
2012-04-20 23:11 ` Basil Gor
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-04-18 19:33 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, David S. Miller
On Wed, Apr 18, 2012 at 11:54:52AM -0700, Eric W. Biederman wrote:
> Basil Gor <basil.gor@gmail.com> writes:
>
> > Vlan tag is restored during buffer transmit to a network device (bridge
> > port) in bridging code in case of tun/tap driver. In case of macvtap it
> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> > gets untagged packets.
> >
> > Scenario tested:
> > kvm guests (that use vlans) migration from bridged network to macvtap
> > revealed that packets delivered to guests are always untagged. Dumping
> > and comparing sk_buff in case of tap and macvtap driver showed that
> > macvtap does not restore vlan_tci.
> >
> > With current patch applied I was able to get working network, kvm guests
> > get correctly tagged packets and can reach each other when macvtap in
> > bridge mode (both with no vlans and through vlan interfaces).
>
> My first impression is that this is the wrong place to add a vlan
> header back.
>
> You need to keep the vlan information in vlan_tci until just
> before the packet is delivered to userspace. Which would suggest
> the best place for these games is macvtap_put_user.
>
> Elsewhere vlan headers should not be explicitly stored in the packet.
>
> At least that was the rule last I looked.
>
> Eric
>
>
This sounds right, and macvtap_put_user was the first place where I
put vlan header adding. But qemu-kvm does smth like get pending data
size and then read, and when I put code in macvtap_put_user qemu
supplied buffer 4 bytes smaller then needed and packets were
truncated. On the other hand tun/tap driver never keeps vlan info in
vlan_tci because you can't do any vlan operations on it I think. So I
decided to restore vlan header just before adding it to macvtap queue.
But I'll try to look deeper in it.
Thanks
> > Signed-off-by: Basil Gor <basilgor@gmail.com>
> > ---
> > drivers/net/macvtap.c | 9 +++++++++
> > 1 files changed, 9 insertions(+), 0 deletions(-)
> >
> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > index 0427c65..a6802b9 100644
> > --- a/drivers/net/macvtap.c
> > +++ b/drivers/net/macvtap.c
> > @@ -1,6 +1,7 @@
> > #include <linux/etherdevice.h>
> > #include <linux/if_macvlan.h>
> > #include <linux/interrupt.h>
> > +#include <linux/if_vlan.h>
> > #include <linux/nsproxy.h>
> > #include <linux/compat.h>
> > #include <linux/if_tun.h>
> > @@ -254,6 +255,14 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
> > if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
> > goto drop;
> >
> > + if (vlan_tx_tag_present(skb)) {
> > + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> > + if (unlikely(!skb))
> > + return NET_RX_DROP;
> > +
> > + skb->vlan_tci = 0;
> > + }
> > +
> > skb_queue_tail(&q->sk.sk_receive_queue, skb);
> > wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
> > return NET_RX_SUCCESS;
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] macvlan/macvtap: Fix vlan tagging on user read
2012-04-18 19:33 ` Basil Gor
@ 2012-04-20 23:11 ` Basil Gor
2012-04-21 1:49 ` Eric W. Biederman
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-04-20 23:11 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: netdev, David S. Miller
I did some additional code review, and it's easier to show on stack traces and
by comparing macvtap with tun/tap driver.
tun/tap device does not need to care about vlan tag stuff, as it gets skb with
vlan id in the header and vlan_tci is not used.
[97493.070321] tun_net_xmit devname vnet0 vlan_tci 0 vlan 0 proto 8100 len 64
[97493.070327] Pid: 0, comm: swapper/2 Tainted: G O 3.3.1-3.fc16.x86_64 #1
[97493.070331] Call Trace:
[97493.070334] <IRQ> [<ffffffffa02c6827>] tun_net_xmit+0x47/0x260 [tun]
[97493.070347] [<ffffffff814e8072>] dev_hard_start_xmit+0x332/0x6d0 <------ __vlan_put_tag is called
[97493.070355] [<ffffffff81503f5a>] sch_direct_xmit+0xfa/0x1d0
[97493.070364] [<ffffffff814e85b5>] dev_queue_xmit+0x1a5/0x640
[97493.070377] [<ffffffffa057c180>] ? br_flood+0xc0/0xc0 [bridge]
[97493.070395] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
[97493.070409] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
[97493.070423] [<ffffffffa057c1ec>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
[97493.070438] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
[97493.070457] [<ffffffffa057c242>] br_forward_finish+0x22/0x60 [bridge]
[97493.070471] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
[97493.070485] [<ffffffffa057c3dd>] __br_forward+0x5d/0xb0 [bridge]
[97493.070495] [<ffffffff814dafe4>] ? skb_clone+0x54/0xb0
[97493.070508] [<ffffffffa057bf1e>] deliver_clone+0x3e/0x60 [bridge]
[97493.070523] [<ffffffffa057c143>] br_flood+0x83/0xc0 [bridge]
[97493.070534] [<ffffffffa057c525>] br_flood_forward+0x15/0x20 [bridge]
[97493.070544] [<ffffffffa057d256>] br_handle_frame_finish+0x246/0x2a0 [bridge]
[97493.070555] [<ffffffffa057d444>] br_handle_frame+0x194/0x260 [bridge]
[97493.070567] [<ffffffffa057d2b0>] ? br_handle_frame_finish+0x2a0/0x2a0 [bridge]
[97493.070581] [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0 <------ vlan_untag is called
[97493.070594] [<ffffffff81088ba2>] ? default_wake_function+0x12/0x20
[97493.070604] [<ffffffff814e5ef1>] process_backlog+0xb1/0x170
[97493.070613] [<ffffffff814e718b>] net_rx_action+0x12b/0x270
[97493.070623] [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
[97493.070633] [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
[97493.070644] [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
[97493.070654] [<ffffffff815fd49c>] call_softirq+0x1c/0x30
[97493.070662] [<ffffffff81016455>] do_softirq+0x65/0xa0
[97493.070668] [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
[97493.070675] [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
[97493.070682] [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
[97493.070686] <EOI> [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
[97493.070697] [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
[97493.070705] [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
[97493.070713] [<ffffffff8101322f>] cpu_idle+0xcf/0x120
[97493.070720] [<ffffffff815e3b1f>] start_secondary+0x282/0x284
but macvtap device gets skb with vlan tag extacted in vlan_tci, and as original driver
code was mostly based on tun/tap driver vlan thing was missed.
[98143.863560] macvtap_receive devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
[98143.863570] macvtap_forward devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
[98143.863578] Pid: 0, comm: swapper/2 Tainted: G O 3.3.1-3.fc16.x86_64 #1
[98143.863583] Call Trace:
[98143.863587] <IRQ> [<ffffffffa026819c>] macvtap_forward+0x8c/0x1b0 [macvtap]
[98143.863606] [<ffffffffa0268314>] macvtap_receive+0x54/0x60 [macvtap]
[98143.863623] [<ffffffffa02c05db>] macvlan_handle_frame+0xbb/0x2c0 [macvlan]
[98143.863635] [<ffffffffa02c0520>] ? macvlan_broadcast+0x160/0x160 [macvlan]
[98143.863646] [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0 <------ vlan_untag is called
[98143.863653] [<ffffffff814e67e3>] netif_receive_skb+0x23/0x90
[98143.863660] [<ffffffff814e6c09>] ? dev_gro_receive+0x1b9/0x2b0
[98143.863667] [<ffffffff814e6950>] napi_skb_finish+0x50/0x70
[98143.863673] [<ffffffff814e6f45>] napi_gro_receive+0xf5/0x140
[98143.863697] [<ffffffffa0241fab>] e1000_receive_skb+0x5b/0x70 [e1000e]
[98143.863718] [<ffffffffa0244b21>] e1000_clean_rx_irq+0x2f1/0x400 [e1000e]
[98143.863737] [<ffffffffa02432e8>] e1000_clean+0x78/0x2c0 [e1000e]
[98143.863745] [<ffffffff814e718b>] net_rx_action+0x12b/0x270
[98143.863752] [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
[98143.863759] [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
[98143.863767] [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
[98143.863775] [<ffffffff815fd49c>] call_softirq+0x1c/0x30
[98143.863782] [<ffffffff81016455>] do_softirq+0x65/0xa0
[98143.863788] [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
[98143.863796] [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
[98143.863803] [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
[98143.863807] <EOI> [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
[98143.863818] [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
[98143.863826] [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
[98143.863834] [<ffffffff8101322f>] cpu_idle+0xcf/0x120
[98143.863841] [<ffffffff815e3b1f>] start_secondary+0x282/0x284
and as Eric Biederman noted, why not add vlan header back at the last moment? in
macvtap_put_user. And it would work for user space applications which read
/dev/tapX, but in kvm case actual reading is done by vhost_net driver. And this
driver actually does skb_peek on macvtap queue to get next packet size before
reading (in handle_rx).
[98143.863878] vhost peek_head_len devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
so, it gets skb len without vlan tag and then performs read with buffer smaller then needed
[98143.863885] macvtap_do_read buflen 102 <--- 90 + (vnet_hdr_sz 12 bytes)
[98143.863889] devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
__vlan_put_tag is called here
[98143.863894] macvtap_do_read reallen 106 <--- 90 + 4 + (vnet_hdr_sz 12)
[98143.863898] devname macvtap0 vlan_tci 0 vlan 0 proto 8100 len 94
[98143.863904] Pid: 7289, comm: vhost-7236 Tainted: G O 3.3.1-3.fc16.x86_64 #1
[98143.863935] Call Trace:
[98143.863944] [<ffffffffa0268593>] macvtap_do_read+0x243/0x420 [macvtap]
[98143.863954] [<ffffffff81088b90>] ? try_to_wake_up+0x2b0/0x2b0
[98143.863962] [<ffffffffa02687ba>] macvtap_recvmsg+0x4a/0x70 [macvtap]
[98143.863971] [<ffffffffa02e149e>] handle_rx+0x39e/0x6e0 [vhost_net]
[98143.863983] [<ffffffffa02e17f5>] handle_rx_net+0x15/0x20 [vhost_net]
[98143.863996] [<ffffffffa02de84c>] vhost_worker+0xcc/0x150 [vhost_net]
[98143.864008] [<ffffffffa02de780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
[98143.864020] [<ffffffff81079af3>] kthread+0x93/0xa0
[98143.864032] [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
[98143.864044] [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
[98143.864056] [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
things get more interesting when we take another case in account. When one kvm guest sends
packet on the same macvlan to another guest macvtap gets skb with vlan id in the header
and vlan_tci is not used.
[99564.523943] macvtap_forward devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
[99564.523946] Pid: 8849, comm: vhost-8797 Tainted: G O 3.3.1-3.fc16.x86_64 #1
[99564.523947] Call Trace:
[99564.523952] [<ffffffffa02de19c>] macvtap_forward+0x8c/0x1b0 [macvtap]
[99564.523963] [<ffffffffa02c0502>] macvlan_broadcast+0x142/0x160 [macvlan]
[99564.523967] [<ffffffffa02c146d>] macvlan_start_xmit+0x14d/0x178 [macvlan]
[99564.523969] [<ffffffffa02df378>] macvtap_get_user+0x388/0x420 [macvtap]
[99564.523971] [<ffffffffa02df43b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
[99564.523973] [<ffffffffa026bb3d>] handle_tx+0x2dd/0x620 [vhost_net]
[99564.523976] [<ffffffffa026beb5>] handle_tx_kick+0x15/0x20 [vhost_net]
[99564.523978] [<ffffffffa026884c>] vhost_worker+0xcc/0x150 [vhost_net]
[99564.523980] [<ffffffffa0268780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
[99564.523984] [<ffffffff81079af3>] kthread+0x93/0xa0
[99564.523987] [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
[99564.523989] [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
[99564.523991] [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
[99564.523999] vhost peek_head_len devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
[99564.524003] macvtap_do_read buflen 106
[99564.524004] macvtap_do_read reallen 106
And we definitely want to have common rules for all cases. So we either
1. restore vlan headers from vlan_tci for any packets coming outside of macvlan in
macvtap_receive and we don't need to fix vhost_net and we preserve same vlan id
policy that tun/tap driver have. (my original patch)
or
2. we extract vlan ids for packets coming from the same macvlan, fixing vhost_net to
take vlan_tci into account and restoring vlan headers on macvtap_put_user
or please propose another solution.
Basil Gor
On Wed, Apr 18, 2012 at 11:33:12PM +0400, Basil Gor wrote:
> On Wed, Apr 18, 2012 at 11:54:52AM -0700, Eric W. Biederman wrote:
> > Basil Gor <basilgor@gmail.com> writes:
> >
> > > Vlan tag is restored during buffer transmit to a network device (bridge
> > > port) in bridging code in case of tun/tap driver. In case of macvtap it
> > > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> > > gets untagged packets.
> > >
> > > Scenario tested:
> > > kvm guests (that use vlans) migration from bridged network to macvtap
> > > revealed that packets delivered to guests are always untagged. Dumping
> > > and comparing sk_buff in case of tap and macvtap driver showed that
> > > macvtap does not restore vlan_tci.
> > >
> > > With current patch applied I was able to get working network, kvm guests
> > > get correctly tagged packets and can reach each other when macvtap in
> > > bridge mode (both with no vlans and through vlan interfaces).
> >
> > My first impression is that this is the wrong place to add a vlan
> > header back.
> >
> > You need to keep the vlan information in vlan_tci until just
> > before the packet is delivered to userspace. Which would suggest
> > the best place for these games is macvtap_put_user.
> >
> > Elsewhere vlan headers should not be explicitly stored in the packet.
> >
> > At least that was the rule last I looked.
> >
> > Eric
> >
> >
> This sounds right, and macvtap_put_user was the first place where I
> put vlan header adding. But qemu-kvm does smth like get pending data
> size and then read, and when I put code in macvtap_put_user qemu
> supplied buffer 4 bytes smaller then needed and packets were
> truncated. On the other hand tun/tap driver never keeps vlan info in
> vlan_tci because you can't do any vlan operations on it I think. So I
> decided to restore vlan header just before adding it to macvtap queue.
>
> But I'll try to look deeper in it.
> Thanks
> > > Signed-off-by: Basil Gor <basilgor@gmail.com>
> > > ---
> > > drivers/net/macvtap.c | 9 +++++++++
> > > 1 files changed, 9 insertions(+), 0 deletions(-)
> > >
> > > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > > index 0427c65..a6802b9 100644
> > > --- a/drivers/net/macvtap.c
> > > +++ b/drivers/net/macvtap.c
> > > @@ -1,6 +1,7 @@
> > > #include <linux/etherdevice.h>
> > > #include <linux/if_macvlan.h>
> > > #include <linux/interrupt.h>
> > > +#include <linux/if_vlan.h>
> > > #include <linux/nsproxy.h>
> > > #include <linux/compat.h>
> > > #include <linux/if_tun.h>
> > > @@ -254,6 +255,14 @@ static int macvtap_forward(struct net_device *dev, struct sk_buff *skb)
> > > if (skb_queue_len(&q->sk.sk_receive_queue) >= dev->tx_queue_len)
> > > goto drop;
> > >
> > > + if (vlan_tx_tag_present(skb)) {
> > > + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> > > + if (unlikely(!skb))
> > > + return NET_RX_DROP;
> > > +
> > > + skb->vlan_tci = 0;
> > > + }
> > > +
> > > skb_queue_tail(&q->sk.sk_receive_queue, skb);
> > > wake_up_interruptible_poll(sk_sleep(&q->sk), POLLIN | POLLRDNORM | POLLRDBAND);
> > > return NET_RX_SUCCESS;
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] macvlan/macvtap: Fix vlan tagging on user read
2012-04-20 23:11 ` Basil Gor
@ 2012-04-21 1:49 ` Eric W. Biederman
2012-04-25 17:01 ` [PATCH v3 1/2] vhost-net: fix handle_rx buffer size Basil Gor
2012-04-25 17:01 ` [PATCH v3 2/2] macvtap: restore vlan header on user read Basil Gor
0 siblings, 2 replies; 20+ messages in thread
From: Eric W. Biederman @ 2012-04-21 1:49 UTC (permalink / raw)
To: Basil Gor; +Cc: netdev, David S. Miller
Basil Gor <basil.gor@gmail.com> writes:
> I did some additional code review, and it's easier to show on stack traces and
> by comparing macvtap with tun/tap driver.
>
> tun/tap device does not need to care about vlan tag stuff, as it gets skb with
> vlan id in the header and vlan_tci is not used.
>
> [97493.070321] tun_net_xmit devname vnet0 vlan_tci 0 vlan 0 proto 8100 len 64
> [97493.070327] Pid: 0, comm: swapper/2 Tainted: G O 3.3.1-3.fc16.x86_64 #1
> [97493.070331] Call Trace:
> [97493.070334] <IRQ> [<ffffffffa02c6827>] tun_net_xmit+0x47/0x260 [tun]
> [97493.070347] [<ffffffff814e8072>] dev_hard_start_xmit+0x332/0x6d0 <------ __vlan_put_tag is called
> [97493.070355] [<ffffffff81503f5a>] sch_direct_xmit+0xfa/0x1d0
> [97493.070364] [<ffffffff814e85b5>] dev_queue_xmit+0x1a5/0x640
> [97493.070377] [<ffffffffa057c180>] ? br_flood+0xc0/0xc0 [bridge]
> [97493.070395] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070409] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070423] [<ffffffffa057c1ec>] br_dev_queue_push_xmit+0x6c/0xa0 [bridge]
> [97493.070438] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070457] [<ffffffffa057c242>] br_forward_finish+0x22/0x60 [bridge]
> [97493.070471] [<ffffffffa057c380>] ? __br_deliver+0x100/0x100 [bridge]
> [97493.070485] [<ffffffffa057c3dd>] __br_forward+0x5d/0xb0 [bridge]
> [97493.070495] [<ffffffff814dafe4>] ? skb_clone+0x54/0xb0
> [97493.070508] [<ffffffffa057bf1e>] deliver_clone+0x3e/0x60 [bridge]
> [97493.070523] [<ffffffffa057c143>] br_flood+0x83/0xc0 [bridge]
> [97493.070534] [<ffffffffa057c525>] br_flood_forward+0x15/0x20 [bridge]
> [97493.070544] [<ffffffffa057d256>] br_handle_frame_finish+0x246/0x2a0 [bridge]
> [97493.070555] [<ffffffffa057d444>] br_handle_frame+0x194/0x260 [bridge]
> [97493.070567] [<ffffffffa057d2b0>] ? br_handle_frame_finish+0x2a0/0x2a0 [bridge]
> [97493.070581] [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0 <------ vlan_untag is called
> [97493.070594] [<ffffffff81088ba2>] ? default_wake_function+0x12/0x20
> [97493.070604] [<ffffffff814e5ef1>] process_backlog+0xb1/0x170
> [97493.070613] [<ffffffff814e718b>] net_rx_action+0x12b/0x270
> [97493.070623] [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
> [97493.070633] [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
> [97493.070644] [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
> [97493.070654] [<ffffffff815fd49c>] call_softirq+0x1c/0x30
> [97493.070662] [<ffffffff81016455>] do_softirq+0x65/0xa0
> [97493.070668] [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
> [97493.070675] [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
> [97493.070682] [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
> [97493.070686] <EOI> [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
> [97493.070697] [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
> [97493.070705] [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
> [97493.070713] [<ffffffff8101322f>] cpu_idle+0xcf/0x120
> [97493.070720] [<ffffffff815e3b1f>] start_secondary+0x282/0x284
>
> but macvtap device gets skb with vlan tag extacted in vlan_tci, and as original driver
> code was mostly based on tun/tap driver vlan thing was missed.
>
> [98143.863560] macvtap_receive devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> [98143.863570] macvtap_forward devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> [98143.863578] Pid: 0, comm: swapper/2 Tainted: G O 3.3.1-3.fc16.x86_64 #1
> [98143.863583] Call Trace:
> [98143.863587] <IRQ> [<ffffffffa026819c>] macvtap_forward+0x8c/0x1b0 [macvtap]
> [98143.863606] [<ffffffffa0268314>] macvtap_receive+0x54/0x60 [macvtap]
> [98143.863623] [<ffffffffa02c05db>] macvlan_handle_frame+0xbb/0x2c0 [macvlan]
> [98143.863635] [<ffffffffa02c0520>] ? macvlan_broadcast+0x160/0x160 [macvlan]
> [98143.863646] [<ffffffff814e56de>] __netif_receive_skb+0x1be/0x5c0 <------ vlan_untag is called
> [98143.863653] [<ffffffff814e67e3>] netif_receive_skb+0x23/0x90
> [98143.863660] [<ffffffff814e6c09>] ? dev_gro_receive+0x1b9/0x2b0
> [98143.863667] [<ffffffff814e6950>] napi_skb_finish+0x50/0x70
> [98143.863673] [<ffffffff814e6f45>] napi_gro_receive+0xf5/0x140
> [98143.863697] [<ffffffffa0241fab>] e1000_receive_skb+0x5b/0x70 [e1000e]
Actually __vlan_hwaccel_put_tag is called here not vlan_untag
> [98143.863718] [<ffffffffa0244b21>] e1000_clean_rx_irq+0x2f1/0x400 [e1000e]
> [98143.863737] [<ffffffffa02432e8>] e1000_clean+0x78/0x2c0 [e1000e]
> [98143.863745] [<ffffffff814e718b>] net_rx_action+0x12b/0x270
> [98143.863752] [<ffffffff8108daed>] ? sched_clock_cpu+0xbd/0x110
> [98143.863759] [<ffffffff8105efb8>] __do_softirq+0xb8/0x230
> [98143.863767] [<ffffffff810e3c30>] ? handle_irq_event+0x50/0x70
> [98143.863775] [<ffffffff815fd49c>] call_softirq+0x1c/0x30
> [98143.863782] [<ffffffff81016455>] do_softirq+0x65/0xa0
> [98143.863788] [<ffffffff8105f3ce>] irq_exit+0x9e/0xc0
> [98143.863796] [<ffffffff815fdd03>] do_IRQ+0x63/0xe0
> [98143.863803] [<ffffffff815f42ae>] common_interrupt+0x6e/0x6e
> [98143.863807] <EOI> [<ffffffff8131b236>] ? intel_idle+0xe6/0x150
> [98143.863818] [<ffffffff8131b218>] ? intel_idle+0xc8/0x150
> [98143.863826] [<ffffffff814a4071>] cpuidle_idle_call+0xc1/0x280
> [98143.863834] [<ffffffff8101322f>] cpu_idle+0xcf/0x120
> [98143.863841] [<ffffffff815e3b1f>] start_secondary+0x282/0x284
>
> and as Eric Biederman noted, why not add vlan header back at the last moment? in
> macvtap_put_user. And it would work for user space applications which read
> /dev/tapX, but in kvm case actual reading is done by vhost_net driver. And this
> driver actually does skb_peek on macvtap queue to get next packet size before
> reading (in handle_rx).
>
> [98143.863878] vhost peek_head_len devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
>
> so, it gets skb len without vlan tag and then performs read with buffer smaller then needed
>
> [98143.863885] macvtap_do_read buflen 102 <--- 90 + (vnet_hdr_sz 12 bytes)
> [98143.863889] devname macvtap0 vlan_tci 100a vlan 10 proto 0800 len 90
> __vlan_put_tag is called here
> [98143.863894] macvtap_do_read reallen 106 <--- 90 + 4 + (vnet_hdr_sz 12)
> [98143.863898] devname macvtap0 vlan_tci 0 vlan 0 proto 8100 len 94
> [98143.863904] Pid: 7289, comm: vhost-7236 Tainted: G O 3.3.1-3.fc16.x86_64 #1
> [98143.863935] Call Trace:
> [98143.863944] [<ffffffffa0268593>] macvtap_do_read+0x243/0x420 [macvtap]
> [98143.863954] [<ffffffff81088b90>] ? try_to_wake_up+0x2b0/0x2b0
> [98143.863962] [<ffffffffa02687ba>] macvtap_recvmsg+0x4a/0x70 [macvtap]
> [98143.863971] [<ffffffffa02e149e>] handle_rx+0x39e/0x6e0 [vhost_net]
> [98143.863983] [<ffffffffa02e17f5>] handle_rx_net+0x15/0x20 [vhost_net]
> [98143.863996] [<ffffffffa02de84c>] vhost_worker+0xcc/0x150 [vhost_net]
> [98143.864008] [<ffffffffa02de780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
> [98143.864020] [<ffffffff81079af3>] kthread+0x93/0xa0
> [98143.864032] [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
> [98143.864044] [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
> [98143.864056] [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
>
> things get more interesting when we take another case in account. When one kvm guest sends
> packet on the same macvlan to another guest macvtap gets skb with vlan id in the header
> and vlan_tci is not used.
>
> [99564.523943] macvtap_forward devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
> [99564.523946] Pid: 8849, comm: vhost-8797 Tainted: G O 3.3.1-3.fc16.x86_64 #1
> [99564.523947] Call Trace:
> [99564.523952] [<ffffffffa02de19c>] macvtap_forward+0x8c/0x1b0 [macvtap]
> [99564.523963] [<ffffffffa02c0502>] macvlan_broadcast+0x142/0x160 [macvlan]
> [99564.523967] [<ffffffffa02c146d>] macvlan_start_xmit+0x14d/0x178 [macvlan]
> [99564.523969] [<ffffffffa02df378>] macvtap_get_user+0x388/0x420 [macvtap]
> [99564.523971] [<ffffffffa02df43b>] macvtap_sendmsg+0x2b/0x30 [macvtap]
> [99564.523973] [<ffffffffa026bb3d>] handle_tx+0x2dd/0x620 [vhost_net]
> [99564.523976] [<ffffffffa026beb5>] handle_tx_kick+0x15/0x20 [vhost_net]
> [99564.523978] [<ffffffffa026884c>] vhost_worker+0xcc/0x150 [vhost_net]
> [99564.523980] [<ffffffffa0268780>] ? __vhost_add_used_n+0x110/0x110 [vhost_net]
> [99564.523984] [<ffffffff81079af3>] kthread+0x93/0xa0
> [99564.523987] [<ffffffff815fd3a4>] kernel_thread_helper+0x4/0x10
> [99564.523989] [<ffffffff81079a60>] ? kthread_freezable_should_stop+0x70/0x70
> [99564.523991] [<ffffffff815fd3a0>] ? gs_change+0x13/0x13
> [99564.523999] vhost peek_head_len devname (null) vlan_tci 0 vlan 0 proto 8100 len 94
> [99564.524003] macvtap_do_read buflen 106
> [99564.524004] macvtap_do_read reallen 106
>
> And we definitely want to have common rules for all cases. So we either
> 1. restore vlan headers from vlan_tci for any packets coming outside of macvlan in
> macvtap_receive and we don't need to fix vhost_net and we preserve same vlan id
> policy that tun/tap driver have. (my original patch)
> or
> 2. we extract vlan ids for packets coming from the same macvlan, fixing vhost_net to
> take vlan_tci into account and restoring vlan headers on
> macvtap_put_user
And 2 seems to be the right answer.
Not long ago we went a few rounds with this with the core parts of the
networking stack and the rule that evolved was that we always strip the
vlan tag. Which is why we have the vlan_untag call in __netif_receive_skb()
to handle the small handful of networking devices that don't.
vhost/net.c is just buggy. PF_PACKET sockets have been returning
the vlan_id in ancillary data from year hardware like the e1000 since
at least 2005.
So I agree that both macvtap and vhost/net.c both need to be fixed.
Eric
^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH v3 1/2] vhost-net: fix handle_rx buffer size
2012-04-21 1:49 ` Eric W. Biederman
@ 2012-04-25 17:01 ` Basil Gor
2012-04-26 5:30 ` Eric W. Biederman
2012-05-03 13:16 ` Michael S. Tsirkin
2012-04-25 17:01 ` [PATCH v3 2/2] macvtap: restore vlan header on user read Basil Gor
1 sibling, 2 replies; 20+ messages in thread
From: Basil Gor @ 2012-04-25 17:01 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: David S. Miller, netdev
Take vlan header length into account, when vlan id is stored as
vlan_tci. Otherwise tagged packets comming from macvtap will be
truncated.
Signed-off-by: Basil Gor <basil.gor@gmail.com>
---
drivers/vhost/net.c | 7 ++++++-
1 files changed, 6 insertions(+), 1 deletions(-)
diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 1f21d2a..5c17010 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -24,6 +24,7 @@
#include <linux/if_arp.h>
#include <linux/if_tun.h>
#include <linux/if_macvlan.h>
+#include <linux/if_vlan.h>
#include <net/sock.h>
@@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
head = skb_peek(&sk->sk_receive_queue);
- if (likely(head))
+ if (likely(head)) {
len = head->len;
+ if (vlan_tx_tag_present(head))
+ len += VLAN_HLEN;
+ }
+
spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
return len;
}
--
1.7.6.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-04-21 1:49 ` Eric W. Biederman
2012-04-25 17:01 ` [PATCH v3 1/2] vhost-net: fix handle_rx buffer size Basil Gor
@ 2012-04-25 17:01 ` Basil Gor
2012-04-26 5:31 ` Eric W. Biederman
1 sibling, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-04-25 17:01 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: David S. Miller, netdev
Vlan tag is restored during buffer transmit to a network device (bridge
port) in bridging code in case of tun/tap driver. In case of macvtap it
has to be done explicitly. Otherwise vlan_tci is ignored and user always
gets untagged packets.
Signed-off-by: Basil Gor <basil.gor@gmail.com>
---
drivers/net/macvtap.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0427c65..28d2678 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1,6 +1,7 @@
#include <linux/etherdevice.h>
#include <linux/if_macvlan.h>
#include <linux/interrupt.h>
+#include <linux/if_vlan.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
#include <linux/if_tun.h>
@@ -753,13 +754,21 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
/* Put packet to the user space buffer */
static ssize_t macvtap_put_user(struct macvtap_queue *q,
- const struct sk_buff *skb,
+ struct sk_buff *skb,
const struct iovec *iv, int len)
{
struct macvlan_dev *vlan;
int ret;
int vnet_hdr_len = 0;
+ if (vlan_tx_tag_present(skb)) {
+ skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
+ if (unlikely(!skb))
+ return -ENOMEM;
+
+ skb->vlan_tci = 0;
+ }
+
if (q->flags & IFF_VNET_HDR) {
struct virtio_net_hdr vnet_hdr;
vnet_hdr_len = q->vnet_hdr_sz;
--
1.7.6.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/2] vhost-net: fix handle_rx buffer size
2012-04-25 17:01 ` [PATCH v3 1/2] vhost-net: fix handle_rx buffer size Basil Gor
@ 2012-04-26 5:30 ` Eric W. Biederman
2012-05-03 12:39 ` Michael S. Tsirkin
2012-05-03 13:16 ` Michael S. Tsirkin
1 sibling, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-04-26 5:30 UTC (permalink / raw)
To: Basil Gor; +Cc: David S. Miller, netdev
Basil Gor <basil.gor@gmail.com> writes:
> Take vlan header length into account, when vlan id is stored as
> vlan_tci. Otherwise tagged packets comming from macvtap will be
> truncated.
Better.
In recvmsg you are still missing handling of PACKET_AUXDATA
that includes aux.tp_vlan_tci === vlan_id and
aux.tp_status === TP_STATUS_VLAN_VALID.
That is the way pf_packet sockets report the vlan. Not
in the actual packet itself.
Eric
> Signed-off-by: Basil Gor <basil.gor@gmail.com>
> ---
> drivers/vhost/net.c | 7 ++++++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 1f21d2a..5c17010 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -24,6 +24,7 @@
> #include <linux/if_arp.h>
> #include <linux/if_tun.h>
> #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
>
> #include <net/sock.h>
>
> @@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
>
> spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> head = skb_peek(&sk->sk_receive_queue);
> - if (likely(head))
> + if (likely(head)) {
> len = head->len;
> + if (vlan_tx_tag_present(head))
> + len += VLAN_HLEN;
> + }
> +
> spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> return len;
> }
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-04-25 17:01 ` [PATCH v3 2/2] macvtap: restore vlan header on user read Basil Gor
@ 2012-04-26 5:31 ` Eric W. Biederman
2012-05-03 13:07 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-04-26 5:31 UTC (permalink / raw)
To: Basil Gor; +Cc: David S. Miller, netdev
Basil Gor <basil.gor@gmail.com> writes:
> Vlan tag is restored during buffer transmit to a network device (bridge
> port) in bridging code in case of tun/tap driver. In case of macvtap it
> has to be done explicitly. Otherwise vlan_tci is ignored and user always
> gets untagged packets.
We could quibble about efficiencies but this looks good except for
macvtap_recvmsg which isn't setting the auxdata for the vlan header.
Eric
> Signed-off-by: Basil Gor <basil.gor@gmail.com>
> ---
> drivers/net/macvtap.c | 11 ++++++++++-
> 1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 0427c65..28d2678 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1,6 +1,7 @@
> #include <linux/etherdevice.h>
> #include <linux/if_macvlan.h>
> #include <linux/interrupt.h>
> +#include <linux/if_vlan.h>
> #include <linux/nsproxy.h>
> #include <linux/compat.h>
> #include <linux/if_tun.h>
> @@ -753,13 +754,21 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
>
> /* Put packet to the user space buffer */
> static ssize_t macvtap_put_user(struct macvtap_queue *q,
> - const struct sk_buff *skb,
> + struct sk_buff *skb,
> const struct iovec *iv, int len)
> {
> struct macvlan_dev *vlan;
> int ret;
> int vnet_hdr_len = 0;
>
> + if (vlan_tx_tag_present(skb)) {
> + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> + if (unlikely(!skb))
> + return -ENOMEM;
> +
> + skb->vlan_tci = 0;
> + }
> +
> if (q->flags & IFF_VNET_HDR) {
> struct virtio_net_hdr vnet_hdr;
> vnet_hdr_len = q->vnet_hdr_sz;
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/2] vhost-net: fix handle_rx buffer size
2012-04-26 5:30 ` Eric W. Biederman
@ 2012-05-03 12:39 ` Michael S. Tsirkin
0 siblings, 0 replies; 20+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03 12:39 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Basil Gor, David S. Miller, netdev
On Wed, Apr 25, 2012 at 10:30:10PM -0700, Eric W. Biederman wrote:
> Basil Gor <basil.gor@gmail.com> writes:
>
> > Take vlan header length into account, when vlan id is stored as
> > vlan_tci. Otherwise tagged packets comming from macvtap will be
> > truncated.
>
> Better.
>
> In recvmsg you are still missing handling of PACKET_AUXDATA
> that includes aux.tp_vlan_tci === vlan_id and
> aux.tp_status === TP_STATUS_VLAN_VALID.
>
> That is the way pf_packet sockets report the vlan. Not
> in the actual packet itself.
>
> Eric
And vhost can't handle this without userspace changes,
because put_cmsg expects a userspace pointer.
So maybe, until someone says they care about this bug,
for now it's enough to check
msg->msg_flags & MSG_CTRUNC
and error out in vhost.
Either this, or add a flag to put_cmsg that will
cause it to do memcpy instead of copy to user.
> > Signed-off-by: Basil Gor <basil.gor@gmail.com>
> > ---
> > drivers/vhost/net.c | 7 ++++++-
> > 1 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 1f21d2a..5c17010 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -24,6 +24,7 @@
> > #include <linux/if_arp.h>
> > #include <linux/if_tun.h>
> > #include <linux/if_macvlan.h>
> > +#include <linux/if_vlan.h>
> >
> > #include <net/sock.h>
> >
> > @@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
> >
> > spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> > head = skb_peek(&sk->sk_receive_queue);
> > - if (likely(head))
> > + if (likely(head)) {
> > len = head->len;
> > + if (vlan_tx_tag_present(head))
> > + len += VLAN_HLEN;
> > + }
> > +
> > spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> > return len;
> > }
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-04-26 5:31 ` Eric W. Biederman
@ 2012-05-03 13:07 ` Michael S. Tsirkin
2012-05-03 13:37 ` Eric W. Biederman
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03 13:07 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Basil Gor, David S. Miller, netdev
On Wed, Apr 25, 2012 at 10:31:25PM -0700, Eric W. Biederman wrote:
> Basil Gor <basil.gor@gmail.com> writes:
>
> > Vlan tag is restored during buffer transmit to a network device (bridge
> > port) in bridging code in case of tun/tap driver. In case of macvtap it
> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> > gets untagged packets.
>
> We could quibble about efficiencies but this looks good except for
> macvtap_recvmsg which isn't setting the auxdata for the vlan header.
>
> Eric
Right. I'm guessing we need to support old userspace
so if there's auxdata, put vlan there but if not,
put the vlan in the packet like this patch does.
> > Signed-off-by: Basil Gor <basil.gor@gmail.com>
> > ---
> > drivers/net/macvtap.c | 11 ++++++++++-
> > 1 files changed, 10 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > index 0427c65..28d2678 100644
> > --- a/drivers/net/macvtap.c
> > +++ b/drivers/net/macvtap.c
> > @@ -1,6 +1,7 @@
> > #include <linux/etherdevice.h>
> > #include <linux/if_macvlan.h>
> > #include <linux/interrupt.h>
> > +#include <linux/if_vlan.h>
> > #include <linux/nsproxy.h>
> > #include <linux/compat.h>
> > #include <linux/if_tun.h>
> > @@ -753,13 +754,21 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
> >
> > /* Put packet to the user space buffer */
> > static ssize_t macvtap_put_user(struct macvtap_queue *q,
> > - const struct sk_buff *skb,
> > + struct sk_buff *skb,
> > const struct iovec *iv, int len)
> > {
> > struct macvlan_dev *vlan;
> > int ret;
> > int vnet_hdr_len = 0;
> >
> > + if (vlan_tx_tag_present(skb)) {
> > + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
> > + if (unlikely(!skb))
> > + return -ENOMEM;
> > +
> > + skb->vlan_tci = 0;
> > + }
> > +
> > if (q->flags & IFF_VNET_HDR) {
> > struct virtio_net_hdr vnet_hdr;
> > vnet_hdr_len = q->vnet_hdr_sz;
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/2] vhost-net: fix handle_rx buffer size
2012-04-25 17:01 ` [PATCH v3 1/2] vhost-net: fix handle_rx buffer size Basil Gor
2012-04-26 5:30 ` Eric W. Biederman
@ 2012-05-03 13:16 ` Michael S. Tsirkin
2012-05-03 14:43 ` Basil Gor
1 sibling, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03 13:16 UTC (permalink / raw)
To: Basil Gor; +Cc: Eric W. Biederman, David S. Miller, netdev
On Wed, Apr 25, 2012 at 09:01:15PM +0400, Basil Gor wrote:
> Take vlan header length into account, when vlan id is stored as
> vlan_tci. Otherwise tagged packets comming from macvtap will be
> truncated.
>
> Signed-off-by: Basil Gor <basil.gor@gmail.com>
So I'm inclined to apply these two patches, we
this doesn't fix packet socket backend
but could be fixed by a follow-up patch.
> ---
> drivers/vhost/net.c | 7 ++++++-
> 1 files changed, 6 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> index 1f21d2a..5c17010 100644
> --- a/drivers/vhost/net.c
> +++ b/drivers/vhost/net.c
> @@ -24,6 +24,7 @@
> #include <linux/if_arp.h>
> #include <linux/if_tun.h>
> #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
>
> #include <net/sock.h>
>
> @@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
>
> spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> head = skb_peek(&sk->sk_receive_queue);
> - if (likely(head))
> + if (likely(head)) {
> len = head->len;
> + if (vlan_tx_tag_present(head))
> + len += VLAN_HLEN;
> + }
> +
> spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> return len;
> }
> --
> 1.7.6.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 13:07 ` Michael S. Tsirkin
@ 2012-05-03 13:37 ` Eric W. Biederman
2012-05-03 14:31 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Eric W. Biederman @ 2012-05-03 13:37 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Basil Gor, David S. Miller, netdev
"Michael S. Tsirkin" <mst@redhat.com> writes:
> On Wed, Apr 25, 2012 at 10:31:25PM -0700, Eric W. Biederman wrote:
>> Basil Gor <basil.gor@gmail.com> writes:
>>
>> > Vlan tag is restored during buffer transmit to a network device (bridge
>> > port) in bridging code in case of tun/tap driver. In case of macvtap it
>> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
>> > gets untagged packets.
>>
>> We could quibble about efficiencies but this looks good except for
>> macvtap_recvmsg which isn't setting the auxdata for the vlan header.
>>
>> Eric
>
> Right. I'm guessing we need to support old userspace
> so if there's auxdata, put vlan there but if not,
> put the vlan in the packet like this patch does.
This patch isn't horrible.
Still why copy the skb when you can just split the copy to userspace
into a couple of pieces?
We don't need to change the skb and changing the skb looks like
it is likely to confuse things and cause bugs because we are
not working with a consistent model of how vlan information
is encoded.
Still something needs to happen and this works in more cases even if it
isn't perfect.
Eric
>> > Signed-off-by: Basil Gor <basil.gor@gmail.com>
>> > ---
>> > drivers/net/macvtap.c | 11 ++++++++++-
>> > 1 files changed, 10 insertions(+), 1 deletions(-)
>> >
>> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
>> > index 0427c65..28d2678 100644
>> > --- a/drivers/net/macvtap.c
>> > +++ b/drivers/net/macvtap.c
>> > @@ -1,6 +1,7 @@
>> > #include <linux/etherdevice.h>
>> > #include <linux/if_macvlan.h>
>> > #include <linux/interrupt.h>
>> > +#include <linux/if_vlan.h>
>> > #include <linux/nsproxy.h>
>> > #include <linux/compat.h>
>> > #include <linux/if_tun.h>
>> > @@ -753,13 +754,21 @@ static ssize_t macvtap_aio_write(struct kiocb *iocb, const struct iovec *iv,
>> >
>> > /* Put packet to the user space buffer */
>> > static ssize_t macvtap_put_user(struct macvtap_queue *q,
>> > - const struct sk_buff *skb,
>> > + struct sk_buff *skb,
>> > const struct iovec *iv, int len)
>> > {
>> > struct macvlan_dev *vlan;
>> > int ret;
>> > int vnet_hdr_len = 0;
>> >
>> > + if (vlan_tx_tag_present(skb)) {
>> > + skb = __vlan_put_tag(skb, vlan_tx_tag_get(skb));
>> > + if (unlikely(!skb))
>> > + return -ENOMEM;
>> > +
>> > + skb->vlan_tci = 0;
>> > + }
>> > +
>> > if (q->flags & IFF_VNET_HDR) {
>> > struct virtio_net_hdr vnet_hdr;
>> > vnet_hdr_len = q->vnet_hdr_sz;
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 13:37 ` Eric W. Biederman
@ 2012-05-03 14:31 ` Michael S. Tsirkin
2012-05-03 15:22 ` Basil Gor
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03 14:31 UTC (permalink / raw)
To: Eric W. Biederman; +Cc: Basil Gor, David S. Miller, netdev
On Thu, May 03, 2012 at 06:37:46AM -0700, Eric W. Biederman wrote:
> "Michael S. Tsirkin" <mst@redhat.com> writes:
>
> > On Wed, Apr 25, 2012 at 10:31:25PM -0700, Eric W. Biederman wrote:
> >> Basil Gor <basil.gor@gmail.com> writes:
> >>
> >> > Vlan tag is restored during buffer transmit to a network device (bridge
> >> > port) in bridging code in case of tun/tap driver. In case of macvtap it
> >> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> >> > gets untagged packets.
> >>
> >> We could quibble about efficiencies but this looks good except for
> >> macvtap_recvmsg which isn't setting the auxdata for the vlan header.
> >>
> >> Eric
> >
> > Right. I'm guessing we need to support old userspace
> > so if there's auxdata, put vlan there but if not,
> > put the vlan in the packet like this patch does.
>
> This patch isn't horrible.
>
> Still why copy the skb when you can just split the copy to userspace
> into a couple of pieces?
>
> We don't need to change the skb and changing the skb looks like
> it is likely to confuse things and cause bugs because we are
> not working with a consistent model of how vlan information
> is encoded.
>
> Still something needs to happen and this works in more cases even if it
> isn't perfect.
>
> Eric
Absolutely. And it's easier than I thought.
So we can do something like the below (warning: compiled only).
Basil - want to take a look?
My only concern if we put this logic in an out of way
driver like macvtap will people remember to update it?
Maybe better to update skb_copy_datagram_const_iovec which is in core?
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0427c65..5a1724c 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1,5 +1,6 @@
#include <linux/etherdevice.h>
#include <linux/if_macvlan.h>
+#include <linux/if_vlan.h>
#include <linux/interrupt.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
@@ -759,6 +760,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
struct macvlan_dev *vlan;
int ret;
int vnet_hdr_len = 0;
+ int vlan_offset = 0;
if (q->flags & IFF_VNET_HDR) {
struct virtio_net_hdr vnet_hdr;
@@ -776,8 +778,29 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
len = min_t(int, skb->len, len);
- ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
+ if (vlan_tx_tag_present(skb)) {
+ struct {
+ __be16 h_vlan_proto;
+ __be16 h_vlan_TCI;
+ } veth;
+ veth.h_vlan_proto = htons(ETH_P_8021Q);
+ veth.h_vlan_TCI = vlan_tx_tag_get(skb);
+
+ vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
+ ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len,
+ vlan_offset);
+ if (ret)
+ goto done;
+ ret = memcpy_toiovecend(iv, (unsigned char *)&veth, vlan_offset,
+ sizeof veth);
+ if (ret)
+ goto done;
+ vlan_offset += sizeof veth;
+ }
+ ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, vnet_hdr_len,
+ len);
+done:
rcu_read_lock_bh();
vlan = rcu_dereference_bh(q->vlan);
if (vlan)
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 1/2] vhost-net: fix handle_rx buffer size
2012-05-03 13:16 ` Michael S. Tsirkin
@ 2012-05-03 14:43 ` Basil Gor
0 siblings, 0 replies; 20+ messages in thread
From: Basil Gor @ 2012-05-03 14:43 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Eric W. Biederman, David S. Miller, netdev
[-- Attachment #1: Type: text/plain, Size: 1949 bytes --]
On Thu, May 03, 2012 at 04:16:24PM +0300, Michael S. Tsirkin wrote:
> On Wed, Apr 25, 2012 at 09:01:15PM +0400, Basil Gor wrote:
> > Take vlan header length into account, when vlan id is stored as
> > vlan_tci. Otherwise tagged packets comming from macvtap will be
> > truncated.
> >
> > Signed-off-by: Basil Gor <basil.gor@gmail.com>
>
> So I'm inclined to apply these two patches, we
> this doesn't fix packet socket backend
> but could be fixed by a follow-up patch.
>
That's what I'm going to do.
While testing packet socket I noticed that tcpdump doesn't work
on macvtap0, since there is no dev_hard_start_xmit like in
tun/tap0 case I think (lines 120-144 in trace attached). And I
have no clear picture how to fix this gracefully.
Also I think there are issues with macvtap on top of bonding, that
I'm also going to verify and debug.
> > ---
> > drivers/vhost/net.c | 7 ++++++-
> > 1 files changed, 6 insertions(+), 1 deletions(-)
> >
> > diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
> > index 1f21d2a..5c17010 100644
> > --- a/drivers/vhost/net.c
> > +++ b/drivers/vhost/net.c
> > @@ -24,6 +24,7 @@
> > #include <linux/if_arp.h>
> > #include <linux/if_tun.h>
> > #include <linux/if_macvlan.h>
> > +#include <linux/if_vlan.h>
> >
> > #include <net/sock.h>
> >
> > @@ -283,8 +284,12 @@ static int peek_head_len(struct sock *sk)
> >
> > spin_lock_irqsave(&sk->sk_receive_queue.lock, flags);
> > head = skb_peek(&sk->sk_receive_queue);
> > - if (likely(head))
> > + if (likely(head)) {
> > len = head->len;
> > + if (vlan_tx_tag_present(head))
> > + len += VLAN_HLEN;
> > + }
> > +
> > spin_unlock_irqrestore(&sk->sk_receive_queue.lock, flags);
> > return len;
> > }
> > --
> > 1.7.6.5
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
[-- Attachment #2: tap_trace.log --]
[-- Type: text/plain, Size: 21099 bytes --]
single arp packet receive
br0
^ \
| +->tap0 <-- tapread + tcpdump
->wlan0-+->macvlan0
\->macvtap0 <-- macvtapread + tcpdump
001 0 irq/28-b43(25823): -> netpoll_trap()
002 0xffffffff815209c0 : netpoll_trap+0x0/0x20 [kernel]
003 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
004 0xffffffffa0260a8a [mac80211]
005 0xffffffffa0260ac0 [mac80211]
006 0xffffffffa0318b02 [b43]
007 0xffffffff810e4b60 : irq_thread_fn+0x0/0x50 [kernel]
008 0xffffffffa0313330 [b43]
009 0xffffffffa02f71d6 [b43]
010 0xffffffffa02f7486 [b43]
011 0xffffffff810e4b89 : irq_thread_fn+0x29/0x50 [kernel]
012 0xffffffff810e4ae0 : irq_thread+0x1a0/0x220 [kernel]
013 0xffffffff810e4940 : irq_thread+0x0/0x220 [kernel]
014 0xffffffff81079da3 : kthread+0x93/0xa0 [kernel]
015 0xffffffff81620f24 : kernel_thread_helper+0x4/0x10 [kernel]
016 0xffffffff81079d10 : kthread+0x0/0xa0 [kernel]
017 0xffffffff81620f20 : kernel_thread_helper+0x0/0x10 [kernel]
018 480 irq/28-b43(25823): <- netpoll_trap(): return=0x0
019 0 irq/28-b43(25823): -> netif_receive_skb(skb=0xffff8800a8183300)
020 skb_dump:dev:wlan0 proto:8100 len:32 vlan_tci:{prio:0 cfi:0 vid:0}
021 0xffffffff8150b360 : netif_receive_skb+0x0/0x90 [kernel]
022 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
023 0xffffffffa02584d6 [mac80211]
024 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
025 0xffffffffa02598f6 [mac80211]
026 0xffffffffa025a56e [mac80211]
027 0xffffffffa0312fd4 [b43]
028 0xffffffffa0318d8a [b43]
029 0xffffffff810e4b60 : irq_thread_fn+0x0/0x50 [kernel]
030 0xffffffffa02f71fd [b43]
031 0xffffffffa02f7486 [b43]
032 0xffffffff810e4b89 : irq_thread_fn+0x29/0x50 [kernel]
033 0xffffffff810e4ae0 : irq_thread+0x1a0/0x220 [kernel]
034 0xffffffff810e4940 : irq_thread+0x0/0x220 [kernel]
035 0xffffffff81079da3 : kthread+0x93/0xa0 [kernel]
036 0xffffffff81620f24 : kernel_thread_helper+0x4/0x10 [kernel]
037 0xffffffff81079d10 : kthread+0x0/0xa0 [kernel]
038 0xffffffff81620f20 : kernel_thread_helper+0x0/0x10 [kernel]
039 551 irq/28-b43(25823): -> __netif_receive_skb(skb=0xffff8800a8183300 ptype=? pt_prev=? rx_handler=? orig_dev=? null_or_dev=? deliver_exact=? ret=? type=?)
040 skb_dump:dev:wlan0 proto:8100 len:32 vlan_tci:{prio:0 cfi:0 vid:0}
041 587 irq/28-b43(25823): -> vlan_untag(skb=0xffff8800a8183300 vhdr=? vlan_tci=?)
042 skb_dump:dev:wlan0 proto:8100 len:32 vlan_tci:{prio:0 cfi:0 vid:0}
043 616 irq/28-b43(25823): <- vlan_untag(): return=0xffff8800a8183300
044 639 irq/28-b43(25823): -> packet_rcv(skb=0xffff8800a8183300 dev=0xffff8800aa24b000 pt=0xffff8800276f5cc0 orig_dev=0xffff8800aa24b000 sk=? sll=? po=? skb_head=? skb_len=? snaplen=? res=?)
045 skb_dump:dev:wlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
046 dev:name:wlan0
047 orig_dev:name:wlan0
048 690 irq/28-b43(25823): <- packet_rcv(): return=0x0
049 705 irq/28-b43(25823): -> vlan_do_receive(skbp=0xffff880045c1fa08 last_handler=0x0 skb=? vlan_id=? vlan_dev=0xffff8800aa24b000 rx_stats=?)
050 skbp_dump:dev:wlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
051 738 irq/28-b43(25823): <- vlan_do_receive(): return=0x0
052 755 irq/28-b43(25823): -> macvlan_handle_frame(pskb=0xffff880045c1fa08 port=? skb=? eth=? vlan=? src=0xffff8800aa24b000 dev=? len=? ret=?)
053 pskb_dump:dev:wlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
054 794 irq/28-b43(25823): -> macvlan_broadcast(skb=0xffff8800a8183300 port=0xffff8800a9466000 src=0x0 mode=0xf eth=0xffff880045c1f9f0 vlan=? n=? nskb=? i=? err=0x0)
055 skb_dump:dev:wlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
056 843 irq/28-b43(25823): -> netif_rx(skb=0xffff8800a7012e00 ret=0xffffffffa7012e00)
057 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
058 877 irq/28-b43(25823): -> enqueue_to_backlog(skb=0xffff8800a7012e00 cpu=0x0 qtail=0xffff880045c1f930 sd=? flags=?)
059 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
060 930 irq/28-b43(25823): <- enqueue_to_backlog(): return=0x0
061 942 irq/28-b43(25823): <- netif_rx(): return=0x0
062 962 irq/28-b43(25823): -> macvtap_receive(skb=0xffff8800136a7200)
063 skb_dump:dev:macvtap0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
064 992 irq/28-b43(25823): -> macvtap_forward(dev=0xffff8800a9467000 skb=0xffff8800136a7200 q=0x0)
065 skb_dump:dev:macvtap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
066 dev:name:macvtap0
067 1030 irq/28-b43(25823): -> __skb_get_rxhash(skb=0xffff8800136a7200 keys={...} hash=?)
068 skb_dump:dev:macvtap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
069 1058 irq/28-b43(25823): <- __skb_get_rxhash():
070 1080 irq/28-b43(25823): <- macvtap_forward(): return=0x0
071 1092 irq/28-b43(25823): <- macvtap_receive(): return=0x0
072 1105 irq/28-b43(25823): <- macvlan_broadcast():
073 1117 irq/28-b43(25823): <- macvlan_handle_frame(): return=0x3
074 1149 irq/28-b43(25823): <- __netif_receive_skb(): return=0x0
075 1161 irq/28-b43(25823): <- netif_receive_skb(): return=0x0
076 0 irq/28-b43(25823): -> net_rx_action(h=0xffffffff81c04098 sd=? time_limit=? budget=? have=0x3)
077 0xffffffff8150bc00 : net_rx_action+0x0/0x270 [kernel]
078 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
079 0xffffffff8162101c : call_softirq+0x1c/0x30 [kernel]
080 0xffffffff81016455 : do_softirq+0x65/0xa0 [kernel]
081 0xffffffff8105e654 : local_bh_enable+0x94/0xa0 [kernel]
082 0xffffffffa0312fd9 [b43]
083 0xffffffffa0318d8a [b43]
084 0xffffffff810e4b60 : irq_thread_fn+0x0/0x50 [kernel]
085 0xffffffffa02f71fd [b43]
086 0xffffffffa02f7486 [b43]
087 0xffffffff810e4b89 : irq_thread_fn+0x29/0x50 [kernel]
088 0xffffffff810e4ae0 : irq_thread+0x1a0/0x220 [kernel]
089 0xffffffff810e4940 : irq_thread+0x0/0x220 [kernel]
090 0xffffffff81079da3 : kthread+0x93/0xa0 [kernel]
091 0xffffffff81620f24 : kernel_thread_helper+0x4/0x10 [kernel]
092 0xffffffff81079d10 : kthread+0x0/0xa0 [kernel]
093 0xffffffff81620f20 : kernel_thread_helper+0x0/0x10 [kernel]
094 506 irq/28-b43(25823): -> process_backlog(napi=0xffff8800afc14118 quota=0x40 work=? sd=?)
095 530 irq/28-b43(25823): -> __netif_receive_skb(skb=0xffff8800a7012e00 ptype=? pt_prev=? rx_handler=? orig_dev=? null_or_dev=? deliver_exact=? ret=? type=?)
096 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
097 561 irq/28-b43(25823): -> vlan_do_receive(skbp=0xffff8800afc03e30 last_handler=0x0 skb=? vlan_id=? vlan_dev=0xffff880088528000 rx_stats=?)
098 skbp_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
099 591 irq/28-b43(25823): <- vlan_do_receive(): return=0x0
100 626 irq/28-b43(25823): -> br_flood_forward(br=0xffff8800a995e780 skb=0xffff8800a7012e00 skb2=0xffff8800a7012e00)
101 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
102 667 irq/28-b43(25823): -> br_flood(br=0xffff8800a995e780 skb=0xffff8800a7012e00 skb0=0xffff8800a7012e00 __packet_hook=0xffffffffa040f380 p=? prev=?)
103 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
104 711 irq/28-b43(25823): -> maybe_deliver(prev=0x0 p=0xffff8800a85bec00 skb=0xffff8800a7012e00 __packet_hook=0xffffffffa040f380 err=?)
105 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
106 766 irq/28-b43(25823): <- maybe_deliver(): return=0xffff8800a85bec00
107 784 irq/28-b43(25823): -> maybe_deliver(prev=0xffff8800a85bec00 p=0xffff8800a7972800 skb=0xffff8800a7012e00 __packet_hook=0xffffffffa040f380 err=?)
108 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
109 820 irq/28-b43(25823): <- maybe_deliver(): return=0xffff8800a85bec00
110 839 irq/28-b43(25823): -> deliver_clone(prev=0xffff8800a85bec00 skb=0xffff8800a7012e00 __packet_hook=0xffffffffa040f380 dev=?)
111 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
112 879 irq/28-b43(25823): -> __br_forward(to=0xffff8800a85bec00 skb=0xffff8800a8183300 indev=?)
113 skb_dump:dev:macvlan0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
114 914 irq/28-b43(25823): -> br_forward_finish(skb=0xffff8800a8183300)
115 skb_dump:dev:tap0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
116 943 irq/28-b43(25823): -> br_dev_queue_push_xmit(skb=0xffff8800a8183300)
117 skb_dump:dev:tap0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
118 973 irq/28-b43(25823): -> dev_queue_xmit(skb=0xffff8800a8183300 dev=? txq=0xffffffffa040f380 q=? rc=?)
119 skb_dump:dev:tap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
120 1010 irq/28-b43(25823): -> dev_hard_start_xmit(skb=0xffff8800a8183300 dev=0xffff8800a9866000 txq=0xffff88008864fa00 ops=? rc=? skb_len=?)
121 skb_dump:dev:tap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
122 dev:name:tap0
123 1056 irq/28-b43(25823): -> tpacket_rcv(skb=0xffff8800a9849600 dev=0xffff8800a9866000 pt=0xffff8800662994c0 orig_dev=0xffff8800a9866000 sk=? po=? sll=? h={...} skb_head=? skb_len=? snaplen=? res=? status=? macoff=? netoff=? hdrlen=? copy_skb=? tv={...} ts={...} shhwtstamps=?)
124 skb_dump:dev:tap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
125 dev:name:tap0
126 orig_dev:name:tap0
127 1109 irq/28-b43(25823): -> packet_lookup_frame(po=0xffff880066299000 rb=0xffff8800662992a0 position=0x6 status=0x0 pg_vec_pos=? frame_offset=? h={...})
128 1137 irq/28-b43(25823): -> __packet_get_status(po=0xffff880066299000 frame=0xffff880019b60000 h={...})
129 1159 irq/28-b43(25823): <- __packet_get_status(): return=0x0
130 1171 irq/28-b43(25823): <- packet_lookup_frame(): return=0xffff880019b60000
131 1192 irq/28-b43(25823): -> __packet_set_status(po=0xffff880066299000 frame=0xffff880019b60000 status=0x11 h={...})
132 1217 irq/28-b43(25823): <- __packet_set_status():
133 1240 irq/28-b43(25823): <- tpacket_rcv(): return=0x0
134 1255 irq/28-b43(25823): -> netif_skb_features(skb=0xffff8800a8183300 protocol=? features=?)
135 skb_dump:dev:tap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
136 1286 irq/28-b43(25823): -> harmonize_features(skb=0xffff8800a8183300 protocol=0x608 features=0x0)
137 skb_dump:dev:tap0 proto:0806 len:42 vlan_tci:{prio:0 cfi:4096 vid:50}
138 1314 irq/28-b43(25823): <- harmonize_features(): return=0x0
139 1326 irq/28-b43(25823): <- netif_skb_features(): return=0x0
140 1348 irq/28-b43(25823): -> tun_net_xmit(skb=0xffff8800a8183300 dev=0xffff8800a9866000 tun=?)
141 skb_dump:dev:tap0 proto:8100 len:46 vlan_tci:{prio:0 cfi:0 vid:0}
142 dev:name:tap0
143 1388 irq/28-b43(25823): <- tun_net_xmit(): return=0x0
144 1401 irq/28-b43(25823): <- dev_hard_start_xmit(): return=0x0
145 1414 irq/28-b43(25823): <- dev_queue_xmit(): return=0x0
146 1426 irq/28-b43(25823): <- br_dev_queue_push_xmit(): return=0x0
147 1438 irq/28-b43(25823): <- br_forward_finish(): return=0x0
148 1450 irq/28-b43(25823): <- __br_forward():
149 1461 irq/28-b43(25823): <- deliver_clone(): return=0x0
150 1473 irq/28-b43(25823): <- br_flood():
151 1484 irq/28-b43(25823): <- br_flood_forward():
152 1499 irq/28-b43(25823): -> netif_receive_skb(skb=0xffff8800a7012e00)
153 skb_dump:dev:br0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
154 1524 irq/28-b43(25823): -> __netif_receive_skb(skb=0xffff8800a7012e00 ptype=? pt_prev=? rx_handler=? orig_dev=? null_or_dev=? deliver_exact=? ret=? type=?)
155 skb_dump:dev:br0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
156 1553 irq/28-b43(25823): -> vlan_do_receive(skbp=0xffff8800afc03d20 last_handler=0x1 skb=? vlan_id=? vlan_dev=0xffff8800a995e000 rx_stats=?)
157 skbp_dump:dev:br0 proto:0806 len:28 vlan_tci:{prio:0 cfi:4096 vid:50}
158 1583 irq/28-b43(25823): <- vlan_do_receive(): return=0x0
159 1596 irq/28-b43(25823): <- __netif_receive_skb(): return=0x0
160 1608 irq/28-b43(25823): <- netif_receive_skb(): return=0x0
161 1619 irq/28-b43(25823): <- __netif_receive_skb(): return=0x1
162 1631 irq/28-b43(25823): <- process_backlog(): return=0x1
163 1646 irq/28-b43(25823): -> net_rps_action_and_irq_enable(sd=0xffff8800afc14040 remsd=?)
164 1666 irq/28-b43(25823): <- net_rps_action_and_irq_enable():
165 1678 irq/28-b43(25823): <- net_rx_action():
166 0 macvtapread(13215): -> macvtap_poll(file=0xffff88006bb35c00 wait=0xffff8800a86a9ab8 q=? mask=?)
167 0xffffffffa03f6000 : macvtap_poll+0x0/0xa0 [macvtap]
168 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
169 0xffffffff8119561c : core_sys_select+0x1ec/0x370 [kernel]
170 0xffffffff81195860 : sys_select+0xc0/0x100 [kernel]
171 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
172 206 macvtapread(13215): <- macvtap_poll(): return=0x145
173 0 macvtapread(13215): -> macvtap_aio_read(iocb=0xffff8800a86a9e00 iv=0xffff8800a86a9ed8 count=0x1 pos=0x0 file=? q=? len=? ret=?)
174 0xffffffffa03f65e0 : macvtap_aio_read+0x0/0x80 [macvtap]
175 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
176 0xffffffff81182305 : vfs_read+0x165/0x180 [kernel]
177 0xffffffff8118236a : sys_read+0x4a/0x90 [kernel]
178 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
179 193 macvtapread(13215): -> macvtap_do_read(q=0xffff88001a62f800 iocb=0xffff8800a86a9e00 iv=0xffff8800a86a9ed8 len=0x5de noblock=0x0 wait={...} skb=? ret=?)
180 233 macvtapread(13215): <- macvtap_do_read(): return=0x34
181 246 macvtapread(13215): <- macvtap_aio_read(): return=0x34
182 0 macvtapread(13215): -> macvtap_poll(file=0xffff88006bb35c00 wait=0xffff8800a86a9ab8 q=? mask=?)
183 0xffffffffa03f6000 : macvtap_poll+0x0/0xa0 [macvtap]
184 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
185 0xffffffff8119561c : core_sys_select+0x1ec/0x370 [kernel]
186 0xffffffff81195860 : sys_select+0xc0/0x100 [kernel]
187 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
188 190 macvtapread(13215): <- macvtap_poll(): return=0x104
189 0 tcpdump(tap0): -> packet_poll(file=0xffff8800885f0e00 sock=0xffff88001904a000 wait=0xffff880017601b88 sk=? po=? mask=?)
190 0xffffffff815e51f0 : packet_poll+0x0/0x130 [kernel]
191 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
192 0xffffffff81195ceb : do_sys_poll+0x25b/0x4c0 [kernel]
193 0xffffffff8119602b : sys_poll+0x6b/0x100 [kernel]
194 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
195 223 tcpdump(tap0): -> packet_lookup_frame(po=0xffff880066299000 rb=0xffff8800662992a0 position=0x6 status=0x0 pg_vec_pos=? frame_offset=? h={...})
196 249 tcpdump(tap0): -> __packet_get_status(po=0xffff880066299000 frame=0xffff880019b60000 h={...})
197 269 tcpdump(tap0): <- __packet_get_status(): return=0x11
198 282 tcpdump(tap0): <- packet_lookup_frame(): return=0x0
199 294 tcpdump(tap0): <- packet_poll(): return=0x345
200 0 tcpdump(tap0): -> packet_poll(file=0xffff8800885f0e00 sock=0xffff88001904a000 wait=0xffff880017601b88 sk=? po=? mask=?)
201 0xffffffff815e51f0 : packet_poll+0x0/0x130 [kernel]
202 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
203 0xffffffff81195ceb : do_sys_poll+0x25b/0x4c0 [kernel]
204 0xffffffff8119602b : sys_poll+0x6b/0x100 [kernel]
205 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
206 204 tcpdump(tap0): -> packet_lookup_frame(po=0xffff880066299000 rb=0xffff8800662992a0 position=0x6 status=0x0 pg_vec_pos=? frame_offset=? h={...})
207 229 tcpdump(tap0): -> __packet_get_status(po=0xffff880066299000 frame=0xffff880019b60000 h={...})
208 250 tcpdump(tap0): <- __packet_get_status(): return=0x0
209 262 tcpdump(tap0): <- packet_lookup_frame(): return=0xffff880019b60000
210 277 tcpdump(tap0): <- packet_poll(): return=0x304
211 0 tapread(13310): -> tun_chr_poll(file=0xffff880002451f00 wait=0xffff8800a7435ab8 tfile=? tun=? sk=? mask=?)
212 0xffffffffa042f4a0 : tun_chr_poll+0x0/0x110 [tun]
213 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
214 0xffffffff8119561c : core_sys_select+0x1ec/0x370 [kernel]
215 0xffffffff81195860 : sys_select+0xc0/0x100 [kernel]
216 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
217 196 tapread(13310): -> __tun_get(tfile=0xffff88000456afc0 tun=?)
218 215 tapread(13310): <- __tun_get(): return=0xffff8800a9866780
219 234 tapread(13310): -> tun_put(tun=0xffff8800a9866780 tfile=?)
220 252 tapread(13310): <- tun_put():
221 262 tapread(13310): <- tun_chr_poll(): return=0x145
222 0 tapread(13310): -> tun_chr_aio_read(iocb=0xffff8800a7435e00 iv=0xffff8800a7435ed8 count=0x1 pos=0x0 file=? tfile=? tun=? len=? ret=?)
223 0xffffffffa042f6a0 : tun_chr_aio_read+0x0/0xe0 [tun]
224 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
225 0xffffffff81182250 : vfs_read+0xb0/0x180 [kernel]
226 0xffffffff8118236a : sys_read+0x4a/0x90 [kernel]
227 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
228 213 tapread(13310): -> __tun_get(tfile=0xffff88000456afc0 tun=?)
229 230 tapread(13310): <- __tun_get(): return=0xffff8800a9866780
230 248 tapread(13310): -> tun_do_read(tun=0xffff8800a9866780 iocb=0xffff8800a7435e00 iv=0xffff8800a7435ed8 len=0x5de noblock=0x0 wait={...} skb=? ret=?)
231 284 tapread(13310): -> netpoll_trap()
232 295 tapread(13310): <- netpoll_trap(): return=0x0
233 312 tapread(13310): <- tun_do_read(): return=0x2e
234 325 tapread(13310): -> tun_put(tun=0xffff8800a9866780 tfile=?)
235 340 tapread(13310): <- tun_put():
236 350 tapread(13310): <- tun_chr_aio_read(): return=0x2e
237 0 tapread(13310): -> tun_chr_poll(file=0xffff880002451f00 wait=0xffff8800a7435ab8 tfile=? tun=? sk=? mask=?)
238 0xffffffffa042f4a0 : tun_chr_poll+0x0/0x110 [tun]
239 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
240 0xffffffff8119561c : core_sys_select+0x1ec/0x370 [kernel]
241 0xffffffff81195860 : sys_select+0xc0/0x100 [kernel]
242 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
243 192 tapread(13310): -> __tun_get(tfile=0xffff88000456afc0 tun=?)
244 209 tapread(13310): <- __tun_get(): return=0xffff8800a9866780
245 225 tapread(13310): -> tun_put(tun=0xffff8800a9866780 tfile=?)
246 241 tapread(13310): <- tun_put():
247 251 tapread(13310): <- tun_chr_poll(): return=0x104
248 0 tcpdump(macvtap0): -> packet_poll(file=0xffff880027793a00 sock=0xffff8800a925c500 wait=0xffff8800a9a99b88 sk=? po=? mask=?)
249 0xffffffff815e51f0 : packet_poll+0x0/0x130 [kernel]
250 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
251 0xffffffff81195ceb : do_sys_poll+0x25b/0x4c0 [kernel]
252 0xffffffff8119602b : sys_poll+0x6b/0x100 [kernel]
253 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
254 242 tcpdump(macvtap0): -> packet_lookup_frame(po=0xffff88003dd2b000 rb=0xffff88003dd2b2a0 position=0x1e status=0x0 pg_vec_pos=? frame_offset=? h={...})
255 271 tcpdump(macvtap0): -> __packet_get_status(po=0xffff88003dd2b000 frame=0xffff88003c180000 h={...})
256 294 tcpdump(macvtap0): <- __packet_get_status(): return=0x0
257 308 tcpdump(macvtap0): <- packet_lookup_frame(): return=0xffff88003c180000
258 324 tcpdump(macvtap0): <- packet_poll(): return=0x304
259 0 tcpdump(macvtap0): -> packet_poll(file=0xffff880027793a00 sock=0xffff8800a925c500 wait=0xffff8800a9a99b88 sk=? po=? mask=?)
260 0xffffffff815e51f0 : packet_poll+0x0/0x130 [kernel]
261 0xffffffff8161a189 : kretprobe_trampoline+0x0/0x57 [kernel]
262 0xffffffff81195ceb : do_sys_poll+0x25b/0x4c0 [kernel]
263 0xffffffff8119602b : sys_poll+0x6b/0x100 [kernel]
264 0xffffffff8161fb69 : system_call_fastpath+0x16/0x1b [kernel]
265 193 tcpdump(macvtap0): -> packet_lookup_frame(po=0xffff88003dd2b000 rb=0xffff88003dd2b2a0 position=0x1e status=0x0 pg_vec_pos=? frame_offset=? h={...})
266 217 tcpdump(macvtap0): -> __packet_get_status(po=0xffff88003dd2b000 frame=0xffff88003c180000 h={...})
267 237 tcpdump(macvtap0): <- __packet_get_status(): return=0x0
268 248 tcpdump(macvtap0): <- packet_lookup_frame(): return=0xffff88003c180000
269 264 tcpdump(macvtap0): <- packet_poll(): return=0x304
[-- Attachment #3: tun-trace.stp --]
[-- Type: text/plain, Size: 2747 bytes --]
%{
#include <linux/skbuff.h>
#include <linux/if_vlan.h>
%}
function skb_dump:string(skb_ptr:long) %{
struct sk_buff *skb = (void*)THIS->skb_ptr;
if (!skb) {
snprintf(THIS->__retvalue, MAXSTRINGLEN, "skb=NULL");
return;
}
snprintf(THIS->__retvalue, MAXSTRINGLEN, "dev:%s proto:%04x len:%u vlan_tci:{prio:%d cfi:%d vid:%d}",
skb->dev ? skb->dev->name : "NULL",
htons(skb->protocol),
skb->len,
skb->vlan_tci & VLAN_PRIO_MASK,
skb->vlan_tci & VLAN_CFI_MASK,
skb->vlan_tci & VLAN_VID_MASK);
%}
function dev_dump:string(dev_ptr:long) %{
struct net_device *dev = (void*)THIS->dev_ptr;
if (!dev) {
snprintf(THIS->__retvalue, MAXSTRINGLEN, "dev=NULL");
return;
}
snprintf(THIS->__retvalue, MAXSTRINGLEN, "name:%s", dev->name);
%}
function deref_unsafe:long(pskb_ptr:long) %{
struct sk_buff **pskb = (void*)THIS->pskb_ptr;
if (!pskb || !(*pskb))
THIS->__retvalue = 0;
else
THIS->__retvalue = (long)*pskb;
%}
probe begin {
printf ("started\n")
}
global nesting = 0
probe probepoints.call = module("tun").function("*@drivers/net/tun.c").call,
module("macvlan").function("*@drivers/net/macvlan.c").call,
module("macvtap").function("*@drivers/net/macvtap.c").call,
module("bridge").function("*@net/bridge/br_device.c").call,
module("bridge").function("*@net/bridge/br_forward.c").call,
kernel.function("vlan_*").call,
kernel.function("__vlan_*").call,
kernel.function("*@net/packet/af_packet.c").call,
kernel.function("*@net/core/netpoll.c").call,
kernel.function("*@net/core/dev.c").call {
nesting++
}
probe probepoints.return = module("tun").function("*@drivers/net/tun.c").return,
module("macvlan").function("*@drivers/net/macvlan.c").return,
module("macvtap").function("*@drivers/net/macvtap.c").return,
module("bridge").function("*@net/bridge/br_device.c").return,
module("bridge").function("*@net/bridge/br_forward.c").return,
kernel.function("vlan_*").return,
kernel.function("__vlan_*").return,
kernel.function("*@net/packet/af_packet.c").return,
kernel.function("*@net/core/netpoll.c").return,
kernel.function("*@net/core/dev.c").return {
nesting--
}
probe probepoints.call {
printf ("%s -> %s(%s)\n", thread_indent(1), probefunc(), $$vars)
if (@defined($skb))
printf("\tskb_dump:%s\n", skb_dump($skb))
if (@defined($pskb))
printf("\tpskb_dump:%s\n", skb_dump(deref_unsafe($pskb)))
if (@defined($skbp))
printf("\tskbp_dump:%s\n", skb_dump(deref_unsafe($skbp)))
if (@defined($dev))
printf("\tdev:%s\n", dev_dump($dev))
if (@defined($orig_dev))
printf("\torig_dev:%s\n", dev_dump($dev))
if (nesting == 1)
print_stack(backtrace())
}
probe probepoints.return {
printf ("%s <- %s(): %s\n", thread_indent(-1), probefunc(), $$return)
}
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 14:31 ` Michael S. Tsirkin
@ 2012-05-03 15:22 ` Basil Gor
2012-05-03 23:11 ` Basil Gor
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-05-03 15:22 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Eric W. Biederman, David S. Miller, netdev
On Thu, May 03, 2012 at 05:31:10PM +0300, Michael S. Tsirkin wrote:
> On Thu, May 03, 2012 at 06:37:46AM -0700, Eric W. Biederman wrote:
> > "Michael S. Tsirkin" <mst@redhat.com> writes:
> >
> > > On Wed, Apr 25, 2012 at 10:31:25PM -0700, Eric W. Biederman wrote:
> > >> Basil Gor <basil.gor@gmail.com> writes:
> > >>
> > >> > Vlan tag is restored during buffer transmit to a network device (bridge
> > >> > port) in bridging code in case of tun/tap driver. In case of macvtap it
> > >> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> > >> > gets untagged packets.
> > >>
> > >> We could quibble about efficiencies but this looks good except for
> > >> macvtap_recvmsg which isn't setting the auxdata for the vlan header.
> > >>
> > >> Eric
> > >
> > > Right. I'm guessing we need to support old userspace
> > > so if there's auxdata, put vlan there but if not,
> > > put the vlan in the packet like this patch does.
> >
> > This patch isn't horrible.
> >
> > Still why copy the skb when you can just split the copy to userspace
> > into a couple of pieces?
> >
> > We don't need to change the skb and changing the skb looks like
> > it is likely to confuse things and cause bugs because we are
> > not working with a consistent model of how vlan information
> > is encoded.
> >
> > Still something needs to happen and this works in more cases even if it
> > isn't perfect.
> >
> > Eric
>
> Absolutely. And it's easier than I thought.
> So we can do something like the below (warning: compiled only).
> Basil - want to take a look?
Sure, I'll give it a try.
Thanks
Basil Gor
> My only concern if we put this logic in an out of way
> driver like macvtap will people remember to update it?
> Maybe better to update skb_copy_datagram_const_iovec which is in core?
>
>
> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 0427c65..5a1724c 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1,5 +1,6 @@
> #include <linux/etherdevice.h>
> #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
> #include <linux/interrupt.h>
> #include <linux/nsproxy.h>
> #include <linux/compat.h>
> @@ -759,6 +760,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> struct macvlan_dev *vlan;
> int ret;
> int vnet_hdr_len = 0;
> + int vlan_offset = 0;
>
> if (q->flags & IFF_VNET_HDR) {
> struct virtio_net_hdr vnet_hdr;
> @@ -776,8 +778,29 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
>
> len = min_t(int, skb->len, len);
>
> - ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
> + if (vlan_tx_tag_present(skb)) {
> + struct {
> + __be16 h_vlan_proto;
> + __be16 h_vlan_TCI;
> + } veth;
> + veth.h_vlan_proto = htons(ETH_P_8021Q);
> + veth.h_vlan_TCI = vlan_tx_tag_get(skb);
> +
> + vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
> + ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len,
> + vlan_offset);
> + if (ret)
> + goto done;
> + ret = memcpy_toiovecend(iv, (unsigned char *)&veth, vlan_offset,
> + sizeof veth);
> + if (ret)
> + goto done;
> + vlan_offset += sizeof veth;
> + }
> + ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, vnet_hdr_len,
> + len);
>
> +done:
> rcu_read_lock_bh();
> vlan = rcu_dereference_bh(q->vlan);
> if (vlan)
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 15:22 ` Basil Gor
@ 2012-05-03 23:11 ` Basil Gor
2012-05-03 23:30 ` Michael S. Tsirkin
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-05-03 23:11 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Eric W. Biederman, David S. Miller, netdev
On Thu, May 03, 2012 at 07:22:25PM +0400, Basil Gor wrote:
> On Thu, May 03, 2012 at 05:31:10PM +0300, Michael S. Tsirkin wrote:
> > On Thu, May 03, 2012 at 06:37:46AM -0700, Eric W. Biederman wrote:
> > > "Michael S. Tsirkin" <mst@redhat.com> writes:
> > >
> > > > On Wed, Apr 25, 2012 at 10:31:25PM -0700, Eric W. Biederman wrote:
> > > >> Basil Gor <basil.gor@gmail.com> writes:
> > > >>
> > > >> > Vlan tag is restored during buffer transmit to a network device (bridge
> > > >> > port) in bridging code in case of tun/tap driver. In case of macvtap it
> > > >> > has to be done explicitly. Otherwise vlan_tci is ignored and user always
> > > >> > gets untagged packets.
> > > >>
> > > >> We could quibble about efficiencies but this looks good except for
> > > >> macvtap_recvmsg which isn't setting the auxdata for the vlan header.
> > > >>
> > > >> Eric
> > > >
> > > > Right. I'm guessing we need to support old userspace
> > > > so if there's auxdata, put vlan there but if not,
> > > > put the vlan in the packet like this patch does.
> > >
> > > This patch isn't horrible.
> > >
> > > Still why copy the skb when you can just split the copy to userspace
> > > into a couple of pieces?
> > >
> > > We don't need to change the skb and changing the skb looks like
> > > it is likely to confuse things and cause bugs because we are
> > > not working with a consistent model of how vlan information
> > > is encoded.
> > >
> > > Still something needs to happen and this works in more cases even if it
> > > isn't perfect.
> > >
> > > Eric
> >
> > Absolutely. And it's easier than I thought.
> > So we can do something like the below (warning: compiled only).
> > Basil - want to take a look?
>
> Sure, I'll give it a try.
> Thanks
>
> Basil Gor
>
> > My only concern if we put this logic in an out of way
> > driver like macvtap will people remember to update it?
> > Maybe better to update skb_copy_datagram_const_iovec which is in core?
> >
> >
> > Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
> >
> > diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> > index 0427c65..5a1724c 100644
> > --- a/drivers/net/macvtap.c
> > +++ b/drivers/net/macvtap.c
> > @@ -1,5 +1,6 @@
> > #include <linux/etherdevice.h>
> > #include <linux/if_macvlan.h>
> > +#include <linux/if_vlan.h>
> > #include <linux/interrupt.h>
> > #include <linux/nsproxy.h>
> > #include <linux/compat.h>
> > @@ -759,6 +760,7 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> > struct macvlan_dev *vlan;
> > int ret;
> > int vnet_hdr_len = 0;
> > + int vlan_offset = 0;
> >
> > if (q->flags & IFF_VNET_HDR) {
> > struct virtio_net_hdr vnet_hdr;
> > @@ -776,8 +778,29 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> >
> > len = min_t(int, skb->len, len);
skb->len + VLAN_HLEN if vlan tag present
> >
> > - ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
> > + if (vlan_tx_tag_present(skb)) {
> > + struct {
> > + __be16 h_vlan_proto;
> > + __be16 h_vlan_TCI;
> > + } veth;
> > + veth.h_vlan_proto = htons(ETH_P_8021Q);
> > + veth.h_vlan_TCI = vlan_tx_tag_get(skb);
htons(vlan_tx_tag_get(skb))
> > +
> > + vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
> > + ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len,
> > + vlan_offset);
do we need to count how much we copy carefully?
> > + if (ret)
> > + goto done;
> > + ret = memcpy_toiovecend(iv, (unsigned char *)&veth, vlan_offset,
> > + sizeof veth);
offset has to be vnet_hdr_len + vlan_offset
> > + if (ret)
> > + goto done;
> > + vlan_offset += sizeof veth;
offset to use in next copy: vnet_hdr_len + vlan_offset + sizeof(veth)
> > + }
> > + ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, vnet_hdr_len,
> > + len);
> >
> > +done:
> > rcu_read_lock_bh();
> > vlan = rcu_dereference_bh(q->vlan);
> > if (vlan)
Below is a bit reworked version I've tested.
---
drivers/net/macvtap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
1 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0427c65..cb8fd50 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1,5 +1,6 @@
#include <linux/etherdevice.h>
#include <linux/if_macvlan.h>
+#include <linux/if_vlan.h>
#include <linux/interrupt.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
@@ -759,6 +760,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
struct macvlan_dev *vlan;
int ret;
int vnet_hdr_len = 0;
+ int vlan_offset = 0;
+ int copied;
if (q->flags & IFF_VNET_HDR) {
struct virtio_net_hdr vnet_hdr;
@@ -773,18 +776,48 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
}
+ copied = vnet_hdr_len;
+
+ if (!vlan_tx_tag_present(skb))
+ len = min_t(int, skb->len, len);
+ else {
+ int copy;
+ struct {
+ __be16 h_vlan_proto;
+ __be16 h_vlan_TCI;
+ } veth;
+ veth.h_vlan_proto = htons(ETH_P_8021Q);
+ veth.h_vlan_TCI = htons(vlan_tx_tag_get(skb));
+
+ vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
+ len = min_t(int, skb->len + VLAN_HLEN, len);
+
+ copy = min_t(int, vlan_offset, len);
+ ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
+ len -= copy;
+ copied += copy;
+ if (ret || !len)
+ goto done;
+
+ copy = min_t(int, sizeof(veth), len);
+ ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
+ len -= copy;
+ copied += copy;
+ if (ret || !len)
+ goto done;
+ }
- len = min_t(int, skb->len, len);
-
- ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
+ ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
+ copied += len;
+done:
rcu_read_lock_bh();
vlan = rcu_dereference_bh(q->vlan);
if (vlan)
- macvlan_count_rx(vlan, len, ret == 0, 0);
+ macvlan_count_rx(vlan, copied - vnet_hdr_len, ret == 0, 0);
rcu_read_unlock_bh();
- return ret ? ret : (len + vnet_hdr_len);
+ return ret ? ret : copied;
}
static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
--
1.7.6.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 23:11 ` Basil Gor
@ 2012-05-03 23:30 ` Michael S. Tsirkin
2012-05-03 23:47 ` Basil Gor
0 siblings, 1 reply; 20+ messages in thread
From: Michael S. Tsirkin @ 2012-05-03 23:30 UTC (permalink / raw)
To: Basil Gor; +Cc: Eric W. Biederman, David S. Miller, netdev
On Fri, May 04, 2012 at 03:11:52AM +0400, Basil Gor wrote:
> Below is a bit reworked version I've tested.
Will review thanks.
Can you add the signature according to the rules pls?
> ---
> drivers/net/macvtap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
> 1 files changed, 38 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
> index 0427c65..cb8fd50 100644
> --- a/drivers/net/macvtap.c
> +++ b/drivers/net/macvtap.c
> @@ -1,5 +1,6 @@
> #include <linux/etherdevice.h>
> #include <linux/if_macvlan.h>
> +#include <linux/if_vlan.h>
> #include <linux/interrupt.h>
> #include <linux/nsproxy.h>
> #include <linux/compat.h>
> @@ -759,6 +760,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> struct macvlan_dev *vlan;
> int ret;
> int vnet_hdr_len = 0;
> + int vlan_offset = 0;
> + int copied;
>
> if (q->flags & IFF_VNET_HDR) {
> struct virtio_net_hdr vnet_hdr;
> @@ -773,18 +776,48 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
> if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
> return -EFAULT;
> }
> + copied = vnet_hdr_len;
> +
> + if (!vlan_tx_tag_present(skb))
> + len = min_t(int, skb->len, len);
> + else {
> + int copy;
> + struct {
> + __be16 h_vlan_proto;
> + __be16 h_vlan_TCI;
> + } veth;
> + veth.h_vlan_proto = htons(ETH_P_8021Q);
> + veth.h_vlan_TCI = htons(vlan_tx_tag_get(skb));
> +
> + vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
> + len = min_t(int, skb->len + VLAN_HLEN, len);
> +
> + copy = min_t(int, vlan_offset, len);
> + ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
> + len -= copy;
> + copied += copy;
> + if (ret || !len)
> + goto done;
> +
> + copy = min_t(int, sizeof(veth), len);
> + ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
> + len -= copy;
> + copied += copy;
> + if (ret || !len)
> + goto done;
> + }
>
> - len = min_t(int, skb->len, len);
> -
> - ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
> + ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
> + copied += len;
>
> +done:
> rcu_read_lock_bh();
> vlan = rcu_dereference_bh(q->vlan);
> if (vlan)
> - macvlan_count_rx(vlan, len, ret == 0, 0);
> + macvlan_count_rx(vlan, copied - vnet_hdr_len, ret == 0, 0);
> rcu_read_unlock_bh();
>
> - return ret ? ret : (len + vnet_hdr_len);
> + return ret ? ret : copied;
> }
>
> static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
> --
> 1.7.6.5
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 23:30 ` Michael S. Tsirkin
@ 2012-05-03 23:47 ` Basil Gor
2012-05-04 1:06 ` David Miller
0 siblings, 1 reply; 20+ messages in thread
From: Basil Gor @ 2012-05-03 23:47 UTC (permalink / raw)
To: Michael S. Tsirkin; +Cc: Eric W. Biederman, David S. Miller, netdev
On Fri, May 04, 2012 at 02:30:29AM +0300, Michael S. Tsirkin wrote:
> On Fri, May 04, 2012 at 03:11:52AM +0400, Basil Gor wrote:
> > Below is a bit reworked version I've tested.
>
> Will review thanks.
>
> Can you add the signature according to the rules pls?
Copy vlan header to user when vlan id is stored in skb->vlan_tci
Signed-off-by: Basil Gor <basil.gor@gmail.com>
---
drivers/net/macvtap.c | 43 ++++++++++++++++++++++++++++++++++++++-----
1 files changed, 38 insertions(+), 5 deletions(-)
diff --git a/drivers/net/macvtap.c b/drivers/net/macvtap.c
index 0427c65..cb8fd50 100644
--- a/drivers/net/macvtap.c
+++ b/drivers/net/macvtap.c
@@ -1,5 +1,6 @@
#include <linux/etherdevice.h>
#include <linux/if_macvlan.h>
+#include <linux/if_vlan.h>
#include <linux/interrupt.h>
#include <linux/nsproxy.h>
#include <linux/compat.h>
@@ -759,6 +760,8 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
struct macvlan_dev *vlan;
int ret;
int vnet_hdr_len = 0;
+ int vlan_offset = 0;
+ int copied;
if (q->flags & IFF_VNET_HDR) {
struct virtio_net_hdr vnet_hdr;
@@ -773,18 +776,48 @@ static ssize_t macvtap_put_user(struct macvtap_queue *q,
if (memcpy_toiovecend(iv, (void *)&vnet_hdr, 0, sizeof(vnet_hdr)))
return -EFAULT;
}
+ copied = vnet_hdr_len;
+
+ if (!vlan_tx_tag_present(skb))
+ len = min_t(int, skb->len, len);
+ else {
+ int copy;
+ struct {
+ __be16 h_vlan_proto;
+ __be16 h_vlan_TCI;
+ } veth;
+ veth.h_vlan_proto = htons(ETH_P_8021Q);
+ veth.h_vlan_TCI = htons(vlan_tx_tag_get(skb));
+
+ vlan_offset = offsetof(struct vlan_ethhdr, h_vlan_proto);
+ len = min_t(int, skb->len + VLAN_HLEN, len);
+
+ copy = min_t(int, vlan_offset, len);
+ ret = skb_copy_datagram_const_iovec(skb, 0, iv, copied, copy);
+ len -= copy;
+ copied += copy;
+ if (ret || !len)
+ goto done;
+
+ copy = min_t(int, sizeof(veth), len);
+ ret = memcpy_toiovecend(iv, (void *)&veth, copied, copy);
+ len -= copy;
+ copied += copy;
+ if (ret || !len)
+ goto done;
+ }
- len = min_t(int, skb->len, len);
-
- ret = skb_copy_datagram_const_iovec(skb, 0, iv, vnet_hdr_len, len);
+ ret = skb_copy_datagram_const_iovec(skb, vlan_offset, iv, copied, len);
+ copied += len;
+done:
rcu_read_lock_bh();
vlan = rcu_dereference_bh(q->vlan);
if (vlan)
- macvlan_count_rx(vlan, len, ret == 0, 0);
+ macvlan_count_rx(vlan, copied - vnet_hdr_len, ret == 0, 0);
rcu_read_unlock_bh();
- return ret ? ret : (len + vnet_hdr_len);
+ return ret ? ret : copied;
}
static ssize_t macvtap_do_read(struct macvtap_queue *q, struct kiocb *iocb,
--
1.7.6.5
^ permalink raw reply related [flat|nested] 20+ messages in thread
* Re: [PATCH v3 2/2] macvtap: restore vlan header on user read
2012-05-03 23:47 ` Basil Gor
@ 2012-05-04 1:06 ` David Miller
0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2012-05-04 1:06 UTC (permalink / raw)
To: basil.gor; +Cc: mst, ebiederm, netdev
From: Basil Gor <basil.gor@gmail.com>
Date: Fri, 4 May 2012 03:47:33 +0400
> On Fri, May 04, 2012 at 02:30:29AM +0300, Michael S. Tsirkin wrote:
>> On Fri, May 04, 2012 at 03:11:52AM +0400, Basil Gor wrote:
>> > Below is a bit reworked version I've tested.
>>
>> Will review thanks.
>>
>> Can you add the signature according to the rules pls?
>
> Copy vlan header to user when vlan id is stored in skb->vlan_tci
>
> Signed-off-by: Basil Gor <basil.gor@gmail.com>
This doesn't work, you have to make the patch submission a
completely fresh one. With a full, complete, commit message.
Not as a reply to another email posting.
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2012-05-04 1:07 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-04-18 18:34 [PATCH] macvlan/macvtap: Fix vlan tagging on user read Basil Gor
2012-04-18 18:54 ` Eric W. Biederman
2012-04-18 19:33 ` Basil Gor
2012-04-20 23:11 ` Basil Gor
2012-04-21 1:49 ` Eric W. Biederman
2012-04-25 17:01 ` [PATCH v3 1/2] vhost-net: fix handle_rx buffer size Basil Gor
2012-04-26 5:30 ` Eric W. Biederman
2012-05-03 12:39 ` Michael S. Tsirkin
2012-05-03 13:16 ` Michael S. Tsirkin
2012-05-03 14:43 ` Basil Gor
2012-04-25 17:01 ` [PATCH v3 2/2] macvtap: restore vlan header on user read Basil Gor
2012-04-26 5:31 ` Eric W. Biederman
2012-05-03 13:07 ` Michael S. Tsirkin
2012-05-03 13:37 ` Eric W. Biederman
2012-05-03 14:31 ` Michael S. Tsirkin
2012-05-03 15:22 ` Basil Gor
2012-05-03 23:11 ` Basil Gor
2012-05-03 23:30 ` Michael S. Tsirkin
2012-05-03 23:47 ` Basil Gor
2012-05-04 1:06 ` David Miller
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.