netdev.vger.kernel.org archive mirror
* How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
@ 2013-07-30 13:07 Ronny Meeus
  2013-07-30 14:09 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Ronny Meeus @ 2013-07-30 13:07 UTC
  To: netdev

Hello

I have ported a legacy application that processes several packet
streams based on protocol and VLAN.
Internally, the application dispatches on the VLAN/protocol field of
the Ethernet packets.

To receive the packets I use an AF_PACKET socket on a plain Ethernet
interface (not VLAN aware).
A BPF filter is attached to the socket to drop packets I'm not
interested in as early as possible in the processing path.

This setup worked well until I switched to a 3.4 kernel (I was using
2.6.32 before).
With the 3.4 kernel I see that the VLAN information is stripped from
the packets I receive from the socket.

After some searching on Google and browsing the Linux code I found that
the VLAN tag is stripped from the packet very early in the receive path.
This is the commit responsible:

commit bcc6d47903612c3861201cc3a866fb604f26b8b2
Author: Jiri Pirko <jpirko@redhat.com>
Date:   Thu Apr 7 19:48:33 2011 +0000

    net: vlan: make non-hw-accel rx path similar to hw-accel

    Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
    enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
    vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.

    For non-rx-vlan-hw-accel however, tagged skb goes thru whole
    __netif_receive_skb, it's untagged in ptype_base hander and reinjected

    This incosistency is fixed by this patch. Vlan untagging happens early in
    __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
    see the skb like it was untagged by hw.


Now the question is: What is the correct solution to handle this?

One option I found is to use the pcap library, since it uses the
auxiliary data returned by the recvmsg call to reconstruct the VLAN
headers. But this would mean that first of all I have to adapt my
application(s) and, more importantly, that I lose the BPF filter
feature, since that is implemented in the kernel.
Another disadvantage is that this requires more processing, since the
MAC header needs to be moved within the packet to make room for the
VLAN tag.
So cycles are first spent in the kernel to strip the info and, a bit
later, the packet has to be reconstructed again.

Is there any other way to proceed?

A side question: if I switch to the libpcap approach, I assume the
application can work on both the 2.6 and 3.4 versions of the kernel,
but is there a guarantee that it will also work on future versions?

Best regards,
Ronny
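
For reference, the user-space reconstruction step described above (moving
the MAC addresses to make room for the tag) could be sketched roughly as
below. This is only an illustration under stated assumptions, not libpcap's
actual code: the name reinsert_vlan_tag is made up, the TPID is assumed to
be 0x8100, the caller is assumed to have left at least 4 bytes of headroom
in front of the frame, and the TCI is assumed to come from the tp_vlan_tci
field of the struct tpacket_auxdata control message that recvmsg() delivers
once PACKET_AUXDATA is enabled on the socket.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ADDRS_LEN 12  /* destination MAC + source MAC */
#define VLAN_TAG_LEN   4  /* TPID (2 bytes) + TCI (2 bytes) */

/*
 * Hypothetical helper: put an 802.1Q tag back into an untagged frame.
 * 'frame' points at the destination MAC and must be preceded by at least
 * VLAN_TAG_LEN bytes of writable headroom. Only the 12 address bytes are
 * moved; the rest of the packet stays where it is.
 * Returns the new start of the (now tagged) frame.
 */
static uint8_t *reinsert_vlan_tag(uint8_t *frame, uint16_t tci)
{
    assert(frame != NULL);

    uint8_t *tagged = frame - VLAN_TAG_LEN;

    /* Slide the two MAC addresses 4 bytes towards the headroom. */
    memmove(tagged, frame, ETH_ADDRS_LEN);

    /* Fill the gap with TPID 0x8100 (assumed) and the TCI, big-endian. */
    tagged[ETH_ADDRS_LEN + 0] = 0x81;
    tagged[ETH_ADDRS_LEN + 1] = 0x00;
    tagged[ETH_ADDRS_LEN + 2] = (uint8_t)(tci >> 8);
    tagged[ETH_ADDRS_LEN + 3] = (uint8_t)(tci & 0xff);

    return tagged;
}
```

The frame grows by 4 bytes, so the caller would also have to add
VLAN_TAG_LEN to the length it passes on.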


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-30 13:07 How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel? Ronny Meeus
@ 2013-07-30 14:09 ` Eric Dumazet
  2013-07-31 12:51   ` Ronny Meeus
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2013-07-30 14:09 UTC
  To: Ronny Meeus; +Cc: netdev

On Tue, 2013-07-30 at 15:07 +0200, Ronny Meeus wrote:
> Hello
> 
> I have ported a legacy application that is processing several packet
> streams based on protocol and vlan.
> Internally in the application a dispatching is done based on the
> VLAN/Protocol field in the Ethernet packets.
> 
> To receive the packets I use a AF_PACKET socket on a pure Ethernet
> interface (not vlan aware).
> A BPF filter is attached to the socket to drop packets I'm not
> interested in as soon as possible in the processing path.
> 
> This setup worked well until I switched to a 3.4 kernel (I was using
> 2.6.32 before).
> In the 3.4 kernel I see that the vlan information is stripped from the
> packets I receive from the socket.
> 
> After some searches on Google and browsing the Linux code I found that
> the Vlan is stripped from the packet very early in the receive path.
> This is the info of the commit:
> 
> commit bcc6d47903612c3861201cc3a866fb604f26b8b2
> Author: Jiri Pirko <jpirko@redhat.com>
> Date:   Thu Apr 7 19:48:33 2011 +0000
> 
>     net: vlan: make non-hw-accel rx path similar to hw-accel
> 
>     Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>     enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>     vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
> 
>     For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>     __netif_receive_skb, it's untagged in ptype_base hander and reinjected
> 
>     This incosistency is fixed by this patch. Vlan untagging happens early in
>     __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>     see the skb like it was untagged by hw.
> 
> 
> Now the question is: What is the correct solution to handle this?
> 
> One option I found is using the pcap library since this uses the
> auxillary data received from the recvmsg call to reconstruct the vlan
> headers, but this would mean that first of all I have to adapt my
> application(s) and more importantly that I loose the BPF filter
> feature since this is implemented in the kernel.
> Another disadvantage is that this requires more processing since the
> mac header needs to be moved the packet to make room to store the VLAN
> tags.
> So first cycles are lost in the kernel to strip the info and a bit
> later, the packet to be reconstructed again.
> 
> Is there any other way to proceed?
> 
> A side question: If I would switch to the libpcap approach, I assume
> the application can work on both the 2.6 and 3.4 version of the
> kernel, but is there a guarantee that this will also work on future
> versions?


If you use a BPF filter, it can access the VLAN tag (skb->vlan_tci)
since Linux 3.8:

commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1
Author: Eric Dumazet <edumazet@google.com>
Date:   Sat Oct 27 02:26:17 2012 +0000

    net: filter: add vlan tag access
    
    BPF filters lack ability to access skb->vlan_tci
    
    This patch adds two new ancillary accessors :
    
    SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)
    
    SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)
    
    This allows libpcap/tcpdump to use a kernel filter instead of
    having to fallback to accept all packets, then filter them in
    user space.
    
    Signed-off-by: Eric Dumazet <edumazet@google.com>
    Suggested-by: Ani Sinha <ani@aristanetworks.com>
    Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>


You can update your BPF filter to use these new features, and get
support for both old kernels and new ones.


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-30 14:09 ` Eric Dumazet
@ 2013-07-31 12:51   ` Ronny Meeus
  2013-07-31 12:54     ` Daniel Borkmann
  2013-07-31 14:16     ` Eric Dumazet
  0 siblings, 2 replies; 13+ messages in thread
From: Ronny Meeus @ 2013-07-31 12:51 UTC
  To: Eric Dumazet; +Cc: netdev

On Tue, Jul 30, 2013 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Tue, 2013-07-30 at 15:07 +0200, Ronny Meeus wrote:
>> Hello
>>
>> I have ported a legacy application that is processing several packet
>> streams based on protocol and vlan.
>> Internally in the application a dispatching is done based on the
>> VLAN/Protocol field in the Ethernet packets.
>>
>> To receive the packets I use a AF_PACKET socket on a pure Ethernet
>> interface (not vlan aware).
>> A BPF filter is attached to the socket to drop packets I'm not
>> interested in as soon as possible in the processing path.
>>
>> This setup worked well until I switched to a 3.4 kernel (I was using
>> 2.6.32 before).
>> In the 3.4 kernel I see that the vlan information is stripped from the
>> packets I receive from the socket.
>>
>> After some searches on Google and browsing the Linux code I found that
>> the Vlan is stripped from the packet very early in the receive path.
>> This is the info of the commit:
>>
>> commit bcc6d47903612c3861201cc3a866fb604f26b8b2
>> Author: Jiri Pirko <jpirko@redhat.com>
>> Date:   Thu Apr 7 19:48:33 2011 +0000
>>
>>     net: vlan: make non-hw-accel rx path similar to hw-accel
>>
>>     Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>>     enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>>     vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>
>>     For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>>     __netif_receive_skb, it's untagged in ptype_base hander and reinjected
>>
>>     This incosistency is fixed by this patch. Vlan untagging happens early in
>>     __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>>     see the skb like it was untagged by hw.
>>
>>
>> Now the question is: What is the correct solution to handle this?
>>
>> One option I found is using the pcap library since this uses the
>> auxillary data received from the recvmsg call to reconstruct the vlan
>> headers, but this would mean that first of all I have to adapt my
>> application(s) and more importantly that I loose the BPF filter
>> feature since this is implemented in the kernel.
>> Another disadvantage is that this requires more processing since the
>> mac header needs to be moved the packet to make room to store the VLAN
>> tags.
>> So first cycles are lost in the kernel to strip the info and a bit
>> later, the packet to be reconstructed again.
>>
>> Is there any other way to proceed?
>>
>> A side question: If I would switch to the libpcap approach, I assume
>> the application can work on both the 2.6 and 3.4 version of the
>> kernel, but is there a guarantee that this will also work on future
>> versions?
>
>
> If you use a BPF, it can access vlan tag (skb->vlan_tci) since linux-3.8
>
> commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1
> Author: Eric Dumazet <edumazet@google.com>
> Date:   Sat Oct 27 02:26:17 2012 +0000
>
>     net: filter: add vlan tag access
>
>     BPF filters lack ability to access skb->vlan_tci
>
>     This patch adds two new ancillary accessors :
>
>     SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)
>
>     SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)
>
>     This allows libpcap/tcpdump to use a kernel filter instead of
>     having to fallback to accept all packets, then filter them in
>     user space.
>
>     Signed-off-by: Eric Dumazet <edumazet@google.com>
>     Suggested-by: Ani Sinha <ani@aristanetworks.com>
>     Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
>     Signed-off-by: David S. Miller <davem@davemloft.net>
>
>
> You can update your BPF to use these new features, and get support for
> both old kernels and new ones.


Thanks for the feedback. At a high level it is almost clear.

At the implementation level I do not understand how it is supposed to work.
If I use tcpdump to generate a filter, for example on VLAN 4094, I see
no reference at all to the newly added instructions to get the VLAN tag.

~ # tcpdump -i eth-ntb vlan 4094 -d
tcpdump: WARNING: eth-ntb: no IPv4 address assigned
(000) ldh      [12]
(001) jeq      #0x8100          jt 3    jf 2
(002) jeq      #0x9100          jt 3    jf 7
(003) ldh      [14]
(004) and      #0xfff
(005) jeq      #0xffe           jt 6    jf 7
(006) ret      #65535
(007) ret      #0

To me it looks like the code above is just checking the bytes in the
raw Ethernet packet at offsets 12 and 14.
Since the command above seems to work, it looks to me as if the
filtering is done in the tcpdump application instead of in the kernel.

If I run tcpdump under strace, I see that the SO_ATTACH_FILTER sockopt
is passed to the kernel:

<snip>
setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\1\0\0\20\f\366\340", 8) = 0
fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
recvfrom(3, 0x7f6f6630, 1, 32, 0, 0)    = -1 EAGAIN (Resource temporarily unavailable)
fcntl64(3, F_SETFL, O_RDWR)             = 0
setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\10\0\0\20>\210@", 8) = 0
<snip>

So I'm confused. I would expect to see some instructions that read
the VLAN field from the ancillary data and compare it to the VLAN
(4094) I want to filter on.


Best regards,
Ronny
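
As a reading aid, the eight-instruction program printed by tcpdump -d above
corresponds roughly to the C function below. It is a sketch only: the name
inline_vlan_match is made up, and the function operates on a plain byte
buffer rather than on an skb. It shows that this filter can only match when
the 802.1Q tag is still present in the frame bytes handed to it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical C rendering of the classic "vlan <vid>" filter:
 * returns 1 (accept) if the frame carries an in-line tag with the
 * given VLAN ID, 0 (drop) otherwise.
 */
static int inline_vlan_match(const uint8_t *frame, size_t len, uint16_t vid)
{
    assert(frame != NULL);

    /* A BPF load beyond the end of the packet makes the filter return 0. */
    if (len < 16)
        return 0;

    /* (000) ldh [12]: EtherType / TPID */
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);

    /* (001)-(002): accept only TPID 0x8100 or 0x9100 */
    if (tpid != 0x8100 && tpid != 0x9100)
        return 0;

    /* (003) ldh [14]: TCI; (004) and #0xfff: keep the 12-bit VLAN ID */
    uint16_t tci = (uint16_t)((frame[14] << 8) | frame[15]);

    /* (005) jeq: compare against the requested VLAN */
    return (tci & 0x0fff) == vid;
}
```

For a frame whose tag has already been removed, bytes 12 and 13 hold the
inner EtherType, so the function returns 0 whatever VLAN the packet
originally belonged to.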


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 12:51   ` Ronny Meeus
@ 2013-07-31 12:54     ` Daniel Borkmann
  2013-07-31 14:16     ` Eric Dumazet
  1 sibling, 0 replies; 13+ messages in thread
From: Daniel Borkmann @ 2013-07-31 12:54 UTC
  To: Ronny Meeus; +Cc: Eric Dumazet, netdev

On 07/31/2013 02:51 PM, Ronny Meeus wrote:
> On Tue, Jul 30, 2013 at 4:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Tue, 2013-07-30 at 15:07 +0200, Ronny Meeus wrote:
>>> Hello
>>>
>>> I have ported a legacy application that is processing several packet
>>> streams based on protocol and vlan.
>>> Internally in the application a dispatching is done based on the
>>> VLAN/Protocol field in the Ethernet packets.
>>>
>>> To receive the packets I use a AF_PACKET socket on a pure Ethernet
>>> interface (not vlan aware).
>>> A BPF filter is attached to the socket to drop packets I'm not
>>> interested in as soon as possible in the processing path.
>>>
>>> This setup worked well until I switched to a 3.4 kernel (I was using
>>> 2.6.32 before).
>>> In the 3.4 kernel I see that the vlan information is stripped from the
>>> packets I receive from the socket.
>>>
>>> After some searches on Google and browsing the Linux code I found that
>>> the Vlan is stripped from the packet very early in the receive path.
>>> This is the info of the commit:
>>>
>>> commit bcc6d47903612c3861201cc3a866fb604f26b8b2
>>> Author: Jiri Pirko <jpirko@redhat.com>
>>> Date:   Thu Apr 7 19:48:33 2011 +0000
>>>
>>>      net: vlan: make non-hw-accel rx path similar to hw-accel
>>>
>>>      Now there are 2 paths for rx vlan frames. When rx-vlan-hw-accel is
>>>      enabled, skb is untagged by NIC, vlan_tci is set and the skb gets into
>>>      vlan code in __netif_receive_skb - vlan_hwaccel_do_receive.
>>>
>>>      For non-rx-vlan-hw-accel however, tagged skb goes thru whole
>>>      __netif_receive_skb, it's untagged in ptype_base hander and reinjected
>>>
>>>      This incosistency is fixed by this patch. Vlan untagging happens early in
>>>      __netif_receive_skb so the rest of code (ptype_all handlers, rx_handlers)
>>>      see the skb like it was untagged by hw.
>>>
>>>
>>> Now the question is: What is the correct solution to handle this?
>>>
>>> One option I found is using the pcap library since this uses the
>>> auxillary data received from the recvmsg call to reconstruct the vlan
>>> headers, but this would mean that first of all I have to adapt my
>>> application(s) and more importantly that I loose the BPF filter
>>> feature since this is implemented in the kernel.
>>> Another disadvantage is that this requires more processing since the
>>> mac header needs to be moved the packet to make room to store the VLAN
>>> tags.
>>> So first cycles are lost in the kernel to strip the info and a bit
>>> later, the packet to be reconstructed again.
>>>
>>> Is there any other way to proceed?
>>>
>>> A side question: If I would switch to the libpcap approach, I assume
>>> the application can work on both the 2.6 and 3.4 version of the
>>> kernel, but is there a guarantee that this will also work on future
>>> versions?
>>
>>
>> If you use a BPF, it can access vlan tag (skb->vlan_tci) since linux-3.8
>>
>> commit f3335031b9452baebfe49b8b5e55d3fe0c4677d1
>> Author: Eric Dumazet <edumazet@google.com>
>> Date:   Sat Oct 27 02:26:17 2012 +0000
>>
>>      net: filter: add vlan tag access
>>
>>      BPF filters lack ability to access skb->vlan_tci
>>
>>      This patch adds two new ancillary accessors :
>>
>>      SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)
>>
>>      SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)
>>
>>      This allows libpcap/tcpdump to use a kernel filter instead of
>>      having to fallback to accept all packets, then filter them in
>>      user space.
>>
>>      Signed-off-by: Eric Dumazet <edumazet@google.com>
>>      Suggested-by: Ani Sinha <ani@aristanetworks.com>
>>      Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
>>      Signed-off-by: David S. Miller <davem@davemloft.net>
>>
>>
>> You can update your BPF to use these new features, and get support for
>> both old kernels and new ones.
>
> Thanks for the feedback. High level it is almost clear.
>
> At implementation level I do not understand how it is supposed to work.
> If I use tcpdump to generate a filter for example on vlan 4094 I see
> no reference at all to the newly added instructions to get the VLAN.
>
> ~ # tcpdump -i eth-ntb vlan 4094 -d
> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
> (000) ldh      [12]
> (001) jeq      #0x8100          jt 3    jf 2
> (002) jeq      #0x9100          jt 3    jf 7
> (003) ldh      [14]
> (004) and      #0xfff
> (005) jeq      #0xffe           jt 6    jf 7
> (006) ret      #65535
> (007) ret      #0

I assume that's because the libpcap BPF compiler has not implemented it so far.

Therefore, tcpdump doesn't make use of it either.


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 12:51   ` Ronny Meeus
  2013-07-31 12:54     ` Daniel Borkmann
@ 2013-07-31 14:16     ` Eric Dumazet
  2013-07-31 14:36       ` Ronny Meeus
  1 sibling, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2013-07-31 14:16 UTC
  To: Ronny Meeus; +Cc: netdev

On Wed, 2013-07-31 at 14:51 +0200, Ronny Meeus wrote:

> Thanks for the feedback. High level it is almost clear.
> 
> At implementation level I do not understand how it is supposed to work.
> If I use tcpdump to generate a filter for example on vlan 4094 I see
> no reference at all to the newly added instructions to get the VLAN.
> 
> ~ # tcpdump -i eth-ntb vlan 4094 -d
> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
> (000) ldh      [12]
> (001) jeq      #0x8100          jt 3    jf 2
> (002) jeq      #0x9100          jt 3    jf 7
> (003) ldh      [14]
> (004) and      #0xfff
> (005) jeq      #0xffe           jt 6    jf 7
> (006) ret      #65535
> (007) ret      #0
> 
> To me it looks like to code above is just checking the bytes in the
> raw Ethernet packet at offset 12 and 14.
> Since the command above seems to work it looks to me that the
> filtering is done in the tcpdump application instead of in the kernel.
> 
> If I use the strace command while starting tcpdump I see that the
> SO_ATTACH_FILTER sockopt is passed to the kernel:
> 
> <snip>
> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\1\0\0\20\f\366\340", 8) = 0
> fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
> recvfrom(3, 0x7f6f6630, 1, 32, 0, 0)    = -1 EAGAIN (Resource
> temporarily unavailable)
> fcntl64(3, F_SETFL, O_RDWR)             = 0
> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\10\0\0\20>\210@", 8) = 0
> <snip>
> 
> So I'm confused. I would expect to see some commands to read access
> the VLAN field in the additional data and compare it to the VLAN
> (4094) I want to filter.
> 

I assumed from your initial mail that you were using a BPF filter
directly, not libpcap, which presumably doesn't use these new
'instructions'.

Adapting the BPF filter generated by libpcap is a matter of adding 3 or
4 instructions. In your case, 2 instructions actually:

One to load the tag ID into A.
One to compare A against the immediate value 4094, with a conditional jump.


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 14:16     ` Eric Dumazet
@ 2013-07-31 14:36       ` Ronny Meeus
  2013-07-31 14:42         ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Ronny Meeus @ 2013-07-31 14:36 UTC
  To: Eric Dumazet; +Cc: netdev

On Wed, Jul 31, 2013 at 4:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-07-31 at 14:51 +0200, Ronny Meeus wrote:
>
>> Thanks for the feedback. High level it is almost clear.
>>
>> At implementation level I do not understand how it is supposed to work.
>> If I use tcpdump to generate a filter for example on vlan 4094 I see
>> no reference at all to the newly added instructions to get the VLAN.
>>
>> ~ # tcpdump -i eth-ntb vlan 4094 -d
>> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
>> (000) ldh      [12]
>> (001) jeq      #0x8100          jt 3    jf 2
>> (002) jeq      #0x9100          jt 3    jf 7
>> (003) ldh      [14]
>> (004) and      #0xfff
>> (005) jeq      #0xffe           jt 6    jf 7
>> (006) ret      #65535
>> (007) ret      #0
>>
>> To me it looks like to code above is just checking the bytes in the
>> raw Ethernet packet at offset 12 and 14.
>> Since the command above seems to work it looks to me that the
>> filtering is done in the tcpdump application instead of in the kernel.
>>
>> If I use the strace command while starting tcpdump I see that the
>> SO_ATTACH_FILTER sockopt is passed to the kernel:
>>
>> <snip>
>> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\1\0\0\20\f\366\340", 8) = 0
>> fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
>> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
>> recvfrom(3, 0x7f6f6630, 1, 32, 0, 0)    = -1 EAGAIN (Resource
>> temporarily unavailable)
>> fcntl64(3, F_SETFL, O_RDWR)             = 0
>> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\10\0\0\20>\210@", 8) = 0
>> <snip>
>>
>> So I'm confused. I would expect to see some commands to read access
>> the VLAN field in the additional data and compare it to the VLAN
>> (4094) I want to filter.
>>
>
> I assumed from you initial mail you were using a BPF filter, not
> libpcap, which presumably doesnt use these new 'instructions'
>

I used the tcpdump tool to generate the filter I need to use in my application.

> Adapting the BPF filter generated by libpcap is a matter of adding 3 or
> 4 instructions. In your case 2 instructions actually
>
> One to load tag id into A
> One to compare A against immediate value 4094 and conditional jump.
>

Can you give a real example of a filter that passes all packets
tagged with VLAN 4094 and drops all others?
Thanks


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 14:36       ` Ronny Meeus
@ 2013-07-31 14:42         ` Daniel Borkmann
  2013-07-31 15:09           ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2013-07-31 14:42 UTC
  To: Ronny Meeus; +Cc: Eric Dumazet, netdev

On 07/31/2013 04:36 PM, Ronny Meeus wrote:
> On Wed, Jul 31, 2013 at 4:16 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2013-07-31 at 14:51 +0200, Ronny Meeus wrote:
>>
>>> Thanks for the feedback. High level it is almost clear.
>>>
>>> At implementation level I do not understand how it is supposed to work.
>>> If I use tcpdump to generate a filter for example on vlan 4094 I see
>>> no reference at all to the newly added instructions to get the VLAN.
>>>
>>> ~ # tcpdump -i eth-ntb vlan 4094 -d
>>> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
>>> (000) ldh      [12]
>>> (001) jeq      #0x8100          jt 3    jf 2
>>> (002) jeq      #0x9100          jt 3    jf 7
>>> (003) ldh      [14]
>>> (004) and      #0xfff
>>> (005) jeq      #0xffe           jt 6    jf 7
>>> (006) ret      #65535
>>> (007) ret      #0
>>>
>>> To me it looks like to code above is just checking the bytes in the
>>> raw Ethernet packet at offset 12 and 14.
>>> Since the command above seems to work it looks to me that the
>>> filtering is done in the tcpdump application instead of in the kernel.
>>>
>>> If I use the strace command while starting tcpdump I see that the
>>> SO_ATTACH_FILTER sockopt is passed to the kernel:
>>>
>>> <snip>
>>> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\1\0\0\20\f\366\340", 8) = 0
>>> fcntl64(3, F_GETFL)                     = 0x2 (flags O_RDWR)
>>> fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK)  = 0
>>> recvfrom(3, 0x7f6f6630, 1, 32, 0, 0)    = -1 EAGAIN (Resource
>>> temporarily unavailable)
>>> fcntl64(3, F_SETFL, O_RDWR)             = 0
>>> setsockopt(3, SOL_SOCKET, SO_ATTACH_FILTER, "\0\10\0\0\20>\210@", 8) = 0
>>> <snip>
>>>
>>> So I'm confused. I would expect to see some commands to read access
>>> the VLAN field in the additional data and compare it to the VLAN
>>> (4094) I want to filter.
>>
>> I assumed from you initial mail you were using a BPF filter, not
>> libpcap, which presumably doesnt use these new 'instructions'
>
> I used the tcpdump tool to generate the filter I need to use in my application.
>
>> Adapting the BPF filter generated by libpcap is a matter of adding 3 or
>> 4 instructions. In your case 2 instructions actually
>>
>> One to load tag id into A
>> One to compare A against immediate value 4094 and conditional jump.
>
> Can you give an real example of a filter that passes all packets that
> have a VLAN 4094 attached and drops all others?

You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
an extensive man page. That should probably do it:

$ cat foo
ld vlant
jneq #4094, drop
ret #-1
drop: ret #0

$ bpfc foo
{ 0x20, 0, 0, 0xfffff02c },
{ 0x15, 0, 1, 0x00000ffe },
{ 0x6, 0, 0, 0xffffffff },
{ 0x6, 0, 0, 0x00000000 },


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 14:42         ` Daniel Borkmann
@ 2013-07-31 15:09           ` Eric Dumazet
  2013-07-31 20:01             ` Ronny Meeus
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2013-07-31 15:09 UTC
  To: Daniel Borkmann; +Cc: Ronny Meeus, netdev

On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:

> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
> an extensive man page. That should probably do it:
> 
> $ cat foo
> ld vlant
> jneq #4094, drop
> ret #-1
> drop: ret #0
> 
> $ bpfc foo
> { 0x20, 0, 0, 0xfffff02c },
> { 0x15, 0, 1, 0x00000ffe },
> { 0x6, 0, 0, 0xffffffff },
> { 0x6, 0, 0, 0x00000000 },

Thanks Daniel.

If loading this BPF filter fails (because it's an old kernel), then load
your old filter.


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 15:09           ` Eric Dumazet
@ 2013-07-31 20:01             ` Ronny Meeus
  2013-07-31 20:47               ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Ronny Meeus @ 2013-07-31 20:01 UTC
  To: Eric Dumazet; +Cc: Daniel Borkmann, netdev

On Wed, Jul 31, 2013 at 5:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:
>
>> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
>> an extensive man page. That should probably do it:
>>
>> $ cat foo
>> ld vlant
>> jneq #4094, drop
>> ret #-1
>> drop: ret #0
>>
>> $ bpfc foo
>> { 0x20, 0, 0, 0xfffff02c },
>> { 0x15, 0, 1, 0x00000ffe },
>> { 0x6, 0, 0, 0xffffffff },
>> { 0x6, 0, 0, 0x00000000 },
>

Thanks Daniel, this is very useful information.
I have cloned the repo and compiled the tool myself. It will be very
useful in the future.

> Thanks Daniel.
>
> If the load of this BPF fails (because its an old kernel), then load
> your old filter.
>

I created a small test application after I backported the filter code
to the 3.4 kernel.
Since it did not work initially, I instrumented the kernel with a
printk at the point where vlan_tx_tag_get is called, to see the actual
value of the VLAN tag.

These are the packets displayed by tcpdump:

tcpdump: WARNING: eth-ntb: no IPv4 address assigned
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth-ntb, link-type EN10MB (Ethernet), capture size 65535 bytes
00:18:49.233283 06:00:00:00:00:80 > f7:00:00:00:ff:ff, ethertype 802.1Q (0x8100), length 64:
        0x0000:  f700 0000 ffff 0600 0000 0080 8100 affe
        0x0010:  08ab 0014 0000 0000 0f00 0001 0096 6000
        0x0020:  0096 0000 0001 0000 000d 0000 0000 0000
        0x0030:  0000 0000 0000 0000 0000 0000 0000 0000

So the VLAN ID is 0xffe and the priority/CFI nibble is 0xA.
Apparently the value I need to use in the filter is 0xaffe to make it
work. Is this normal or is this a bug in the kernel?

This is the filter I used:
{ 0x20, 0, 0, 0xfffff02c }
{ 0x15, 0, 1, 0x0000affe }
{ 0x06, 0, 0, 0x00000800 }
{ 0x06, 0, 0, 0x00000000 }

And this is the trace of the kernel and my application:

[12529.357172] BPF_S_ANC_VLAN_TAG: affe
packets received:          1
[12533.020743] BPF_S_ANC_VLAN_TAG: affe
packets received:          2
[12536.667159] BPF_S_ANC_VLAN_TAG: affe
packets received:          3
[12540.343857] BPF_S_ANC_VLAN_TAG: affe
packets received:          4


Best regards,
Ronny


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 20:01             ` Ronny Meeus
@ 2013-07-31 20:47               ` Eric Dumazet
  2013-08-01  9:24                 ` Ronny Meeus
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2013-07-31 20:47 UTC
  To: Ronny Meeus; +Cc: Daniel Borkmann, netdev

On Wed, 2013-07-31 at 22:01 +0200, Ronny Meeus wrote:
> On Wed, Jul 31, 2013 at 5:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> > On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:
> >
> >> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
> >> an extensive man page. That should probably do it:
> >>
> >> $ cat foo
> >> ld vlant
> >> jneq #4094, drop
> >> ret #-1
> >> drop: ret #0
> >>
> >> $ bpfc foo
> >> { 0x20, 0, 0, 0xfffff02c },
> >> { 0x15, 0, 1, 0x00000ffe },
> >> { 0x6, 0, 0, 0xffffffff },
> >> { 0x6, 0, 0, 0x00000000 },
> >
> 
> Thanks Daniel, this is very useful information.
> I have cloned the repo and compiled the tool myself. It will be very
> useful in the future.
> 
> > Thanks Daniel.
> >
> > If the load of this BPF fails (because its an old kernel), then load
> > your old filter.
> >
> 
> I created a small test application after I backported the filter code
> to the 3.4 kernel.
> I instrumented the kernel with a printk at the moment the
> vlan_tx_tag_get call is done to see the actual value of the vlan tag
> since it did not work initially.
> 
> These are the packets displayed by tcpdump:
> 
> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> listening on eth-ntb, link-type EN10MB (Ethernet), capture size 65535 bytes
> 00:18:49.233283 06:00:00:00:00:80 > f7:00:00:00:ff:ff, ethertype
> 802.1Q (0x8100), length 64:
>         0x0000:  f700 0000 ffff 0600 0000 0080 8100 affe
>         0x0010:  08ab 0014 0000 0000 0f00 0001 0096 6000
>         0x0020:  0096 0000 0001 0000 000d 0000 0000 0000
>         0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
> 
> So the Vlan is 0xffe and the priority/CFI field is 0xA.
> Apparently the value I need to  use in the filter is 0xaffe to make it
> work. Is this normal or is this a bug in the kernel?
> 
> This is the filter I used:
> { 0x20, 0, 0, 0xfffff02c }
> { 0x15, 0, 1, 0x0000affe }
> { 0x06, 0, 0, 0x00000800 }
> { 0x06, 0, 0, 0x00000000 }
> 
> And this is the trace of the kernel and my application:
> 
> [12529.357172] BPF_S_ANC_VLAN_TAG: affe
> packets received:          1
> [12533.020743] BPF_S_ANC_VLAN_TAG: affe
> packets received:          2
> [12536.667159] BPF_S_ANC_VLAN_TAG: affe
> packets received:          3
> [12540.343857] BPF_S_ANC_VLAN_TAG: affe
> packets received:          4

Right, vlan_tx_tag_get() returns the whole 16-bit tag (TCI), so you want
to mask A with 0xfff before the compare, to strip the priority and CFI
bits:

ld vlant
and #4095
jneq #4094, drop
ret #-1
drop: ret #0

or something like that.


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-07-31 20:47               ` Eric Dumazet
@ 2013-08-01  9:24                 ` Ronny Meeus
  2013-08-02  8:15                   ` Daniel Borkmann
  0 siblings, 1 reply; 13+ messages in thread
From: Ronny Meeus @ 2013-08-01  9:24 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Daniel Borkmann, netdev

On Wed, Jul 31, 2013 at 10:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Wed, 2013-07-31 at 22:01 +0200, Ronny Meeus wrote:
>> On Wed, Jul 31, 2013 at 5:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> > On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:
>> >
>> >> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
>> >> an extensive man page. That should probably do it:
>> >>
>> >> $ cat foo
>> >> ld vlant
>> >> jneq #4094, drop
>> >> ret #-1
>> >> drop: ret #0
>> >>
>> >> $ bpfc foo
>> >> { 0x20, 0, 0, 0xfffff02c },
>> >> { 0x15, 0, 1, 0x00000ffe },
>> >> { 0x6, 0, 0, 0xffffffff },
>> >> { 0x6, 0, 0, 0x00000000 },
>> >
>>
>> Thanks Daniel, this is very useful information.
>> I have cloned the repo and compiled the tool myself. It will be very
>> useful in the future.
>>
>> > Thanks Daniel.
>> >
>> > If the load of this BPF fails (because its an old kernel), then load
>> > your old filter.
>> >
>>
>> I created a small test application after I backported the filter code
>> to the 3.4 kernel.
>> I instrumented the kernel with a printk at the moment the
>> vlan_tx_tag_get call is done to see the actual value of the vlan tag
>> since it did not work initially.
>>
>> These are the packets displayed by tcpdump:
>>
>> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>> listening on eth-ntb, link-type EN10MB (Ethernet), capture size 65535 bytes
>> 00:18:49.233283 06:00:00:00:00:80 > f7:00:00:00:ff:ff, ethertype
>> 802.1Q (0x8100), length 64:
>>         0x0000:  f700 0000 ffff 0600 0000 0080 8100 affe
>>         0x0010:  08ab 0014 0000 0000 0f00 0001 0096 6000
>>         0x0020:  0096 0000 0001 0000 000d 0000 0000 0000
>>         0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
>>
>> So the Vlan is 0xffe and the priority/CFI field is 0xA.
>> Apparently the value I need to  use in the filter is 0xaffe to make it
>> work. Is this normal or is this a bug in the kernel?
>>
>> This is the filter I used:
>> { 0x20, 0, 0, 0xfffff02c }
>> { 0x15, 0, 1, 0x0000affe }
>> { 0x06, 0, 0, 0x00000800 }
>> { 0x06, 0, 0, 0x00000000 }
>>
>> And this is the trace of the kernel and my application:
>>
>> [12529.357172] BPF_S_ANC_VLAN_TAG: affe
>> packets received:          1
>> [12533.020743] BPF_S_ANC_VLAN_TAG: affe
>> packets received:          2
>> [12536.667159] BPF_S_ANC_VLAN_TAG: affe
>> packets received:          3
>> [12540.343857] BPF_S_ANC_VLAN_TAG: affe
>> packets received:          4
>
> Right, vlan_tx_tag_get() gets the whole tag, so you want to mask A with
> 0xfff before the compare (to strip the prio)
>
> ld vlant
> and #4095
> jneq #4094, drop
> ret #-1
> drop: ret #0
>
> or something like that.
>
>
>

OK, the receiving side is clear now. Thanks.

Now the sending side.
I created an application that sends packets using libpcap. These
packets are full Ethernet frames, including VLAN tags.
If I connect a PC running Wireshark to the Ethernet port I'm sending
on, I receive the packets without issues.

If I also start the receive application I created earlier on the device
that is sending the packets, it does not receive anything.
This is because the filter this application attaches in the kernel
checks the VLAN tag in the metadata of the buffer, which in this case
is not filled in.
If I do not attach a filter to the receiving application, all packets I
send are also received by it, which is what I expect, since all packets
sent on a raw socket are delivered to all other sockets listening on
the same interface.

I have the feeling that there is something wrong with the current
implementation.
In my opinion, the same VLAN processing that is done for packets
received from the network (strip the VLAN tag and put it in the
metadata) should be done on packets sent by an application, just before
they are passed to other sockets listening on the same interface.
Right?
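
To make the sending case concrete, here is a minimal sketch of how such
a frame header could be assembled in user space. The tag sits in the
frame bytes themselves (offsets 12 to 15), not in any socket metadata.
The function name build_tagged_header and its parameters are
hypothetical; the values used in the note below are taken from the
tcpdump trace earlier in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TAGGED_HDR_LEN 18 /* 6 dst + 6 src + 2 TPID + 2 TCI + 2 type */

/* Write an 802.1Q-tagged Ethernet header into buf.
 * Returns the header length, or 0 if buf is too small. */
static size_t build_tagged_header(uint8_t *buf, size_t cap,
                                  const uint8_t dst[6],
                                  const uint8_t src[6],
                                  uint16_t tci, uint16_t inner_type)
{
    assert(buf != NULL);
    if (cap < TAGGED_HDR_LEN)
        return 0;

    memcpy(buf, dst, 6);                     /* destination MAC */
    memcpy(buf + 6, src, 6);                 /* source MAC */
    buf[12] = 0x81;                          /* TPID 0x8100 (802.1Q) */
    buf[13] = 0x00;
    buf[14] = (uint8_t)(tci >> 8);           /* priority/CFI + VID high */
    buf[15] = (uint8_t)(tci & 0xff);         /* VID low byte */
    buf[16] = (uint8_t)(inner_type >> 8);    /* encapsulated EtherType */
    buf[17] = (uint8_t)(inner_type & 0xff);
    return TAGGED_HDR_LEN;
}
```

Called with TCI 0xaffe and inner type 0x08ab, this reproduces bytes 12
to 17 of the traced frame (81 00 af fe 08 ab). The payload would follow
the header, and the finished buffer could then be handed to libpcap,
for example with pcap_inject().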


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-08-01  9:24                 ` Ronny Meeus
@ 2013-08-02  8:15                   ` Daniel Borkmann
  2013-08-02  9:03                     ` Ronny Meeus
  0 siblings, 1 reply; 13+ messages in thread
From: Daniel Borkmann @ 2013-08-02  8:15 UTC (permalink / raw)
  To: Ronny Meeus; +Cc: Eric Dumazet, netdev

On 08/01/2013 11:24 AM, Ronny Meeus wrote:
> On Wed, Jul 31, 2013 at 10:47 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> On Wed, 2013-07-31 at 22:01 +0200, Ronny Meeus wrote:
>>> On Wed, Jul 31, 2013 at 5:09 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>> On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:
>>>>
>>>>> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also has
>>>>> an extensive man page. That should probably do it:
>>>>>
>>>>> $ cat foo
>>>>> ld vlant
>>>>> jneq #4094, drop
>>>>> ret #-1
>>>>> drop: ret #0
>>>>>
>>>>> $ bpfc foo
>>>>> { 0x20, 0, 0, 0xfffff02c },
>>>>> { 0x15, 0, 1, 0x00000ffe },
>>>>> { 0x6, 0, 0, 0xffffffff },
>>>>> { 0x6, 0, 0, 0x00000000 },
>>>>
>>>
>>> Thanks Daniel, this is very useful information.
>>> I have cloned the repo and compiled the tool myself. It will be very
>>> useful in the future.
>>>
>>>> Thanks Daniel.
>>>>
>>>> If the load of this BPF fails (because its an old kernel), then load
>>>> your old filter.
>>>>
>>>
>>> I created a small test application after I backported the filter code
>>> to the 3.4 kernel.
>>> I instrumented the kernel with a printk at the moment the
>>> vlan_tx_tag_get call is done to see the actual value of the vlan tag
>>> since it did not work initially.
>>>
>>> These are the packets displayed by tcpdump:
>>>
>>> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
>>> tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
>>> listening on eth-ntb, link-type EN10MB (Ethernet), capture size 65535 bytes
>>> 00:18:49.233283 06:00:00:00:00:80 > f7:00:00:00:ff:ff, ethertype
>>> 802.1Q (0x8100), length 64:
>>>          0x0000:  f700 0000 ffff 0600 0000 0080 8100 affe
>>>          0x0010:  08ab 0014 0000 0000 0f00 0001 0096 6000
>>>          0x0020:  0096 0000 0001 0000 000d 0000 0000 0000
>>>          0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
>>>
>>> So the Vlan is 0xffe and the priority/CFI field is 0xA.
>>> Apparently the value I need to  use in the filter is 0xaffe to make it
>>> work. Is this normal or is this a bug in the kernel?
>>>
>>> This is the filter I used:
>>> { 0x20, 0, 0, 0xfffff02c }
>>> { 0x15, 0, 1, 0x0000affe }
>>> { 0x06, 0, 0, 0x00000800 }
>>> { 0x06, 0, 0, 0x00000000 }
>>>
>>> And this is the trace of the kernel and my application:
>>>
>>> [12529.357172] BPF_S_ANC_VLAN_TAG: affe
>>> packets received:          1
>>> [12533.020743] BPF_S_ANC_VLAN_TAG: affe
>>> packets received:          2
>>> [12536.667159] BPF_S_ANC_VLAN_TAG: affe
>>> packets received:          3
>>> [12540.343857] BPF_S_ANC_VLAN_TAG: affe
>>> packets received:          4
>>
>> Right, vlan_tx_tag_get() gets the whole tag, so you want to mask A with
>> 0xfff before the compare (to strip the prio)
>>
>> ld vlant
>> and #4095
>> jneq #4094, drop
>> ret #-1
>> drop: ret #0
>>
>> or something like that.
>
> OK the receiving side is clear now. Thanks.
>
> Now the sending side.
> I created an application that sends packets using libpcap. These
> packets are full Ethernet packets, including VLAN tags etc.
> If I connect a PC running Wireshark to the Ethernet port I'm sending
> on I receive the packets, no issues.
>
> If a start on the device that is sending the packets also the receive
> application I created before I do not receive anything.
> This is because the filter attached to the kernel by this application
> is checking the VLAN tag in metadata of the buffer, which is in this
> case not filled in.
> If I do not attach a filter to the receiving application all packets I
> send are also received by the receiving application, which is what I
> expect since all packets sent on a raw socket are received by all
> other sockets listening on the same interface.
>
> I have the feeling that there is something wrong with the current
> implementation.
> In my opinion, the same VLAN processing as done for packets received
> from the network (strip vlan and put it in the meta data) should be
> done on packets that are sent by an application just before passing
> them to other sockets listening on the same interface.
> Right?

Nope, we already had this discussion in the past [1]. ;-)

The VLAN tag is in skb->vlan_tci when VLAN acceleration is on, or in the
packet itself when VLAN acceleration is off. Thus, you can also
distinguish accelerated traffic from non-accelerated traffic.

If you want to filter on it, you need to extend your BPF filter with an
EtherType/VLAN ID check on the packet itself, used as a fallback when
the value loaded by the vlant instruction does not equal the ID you're
looking for. libpcap is actually supposed to do the same in its filter
compiler.

  [1] http://article.gmane.org/gmane.linux.network/254454
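
The fallback described above could look roughly like the following
classic BPF program. It is a sketch under assumptions, not what libpcap
actually emits: it assumes headers that define SKF_AD_VLAN_TAG, a
single 802.1Q tag with TPID 0x8100 at offset 12, and the names
FALLBACK_VID, vlan_any_path and check_layout are invented for the
example.

```c
#include <assert.h>
#include <stdint.h>
#include <linux/filter.h>

#define FALLBACK_VID 0x0ffe /* VLAN ID 4094, as used in the thread */

/* First try the tag in skb metadata, then the tag in the packet. */
static struct sock_filter vlan_any_path[] = {
    /* 0: A = ancillary VLAN tag (ld vlant) */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS,
             (uint32_t)(SKF_AD_OFF + SKF_AD_VLAN_TAG)),
    /* 1: keep the 12-bit VLAN ID */
    BPF_STMT(BPF_ALU | BPF_AND | BPF_K, 0x0fff),
    /* 2: match in metadata -> accept (8), else try the packet */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FALLBACK_VID, 5, 0),
    /* 3: A = EtherType field at offset 12 */
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 12),
    /* 4: not 802.1Q -> drop (9) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0x8100, 0, 4),
    /* 5: A = in-packet TCI at offset 14 */
    BPF_STMT(BPF_LD | BPF_H | BPF_ABS, 14),
    /* 6: keep the 12-bit VLAN ID */
    BPF_STMT(BPF_ALU | BPF_AND | BPF_K, 0x0fff),
    /* 7: match -> accept (8), else drop (9) */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FALLBACK_VID, 0, 1),
    /* 8: accept the whole packet */
    BPF_STMT(BPF_RET | BPF_K, 0xffffffff),
    /* 9: drop */
    BPF_STMT(BPF_RET | BPF_K, 0),
};

/* Jump offsets are relative to the instruction after the jump. */
static unsigned jump_target(unsigned pc, unsigned off)
{
    return pc + 1 + off;
}

/* Sanity-check that every branch lands on the intended return. */
static void check_layout(void)
{
    assert(jump_target(2, vlan_any_path[2].jt) == 8);
    assert(jump_target(4, vlan_any_path[4].jf) == 9);
    assert(jump_target(7, vlan_any_path[7].jt) == 8);
    assert(jump_target(7, vlan_any_path[7].jf) == 9);
}
```

The extra in-packet check only runs when the metadata compare fails, so
frames that arrive with the tag already stripped into the skb take the
short path through instructions 0 to 2.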


* Re: How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel?
  2013-08-02  8:15                   ` Daniel Borkmann
@ 2013-08-02  9:03                     ` Ronny Meeus
  0 siblings, 0 replies; 13+ messages in thread
From: Ronny Meeus @ 2013-08-02  9:03 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Eric Dumazet, netdev

On Fri, Aug 2, 2013 at 10:15 AM, Daniel Borkmann <dborkman@redhat.com> wrote:
> On 08/01/2013 11:24 AM, Ronny Meeus wrote:
>>
>> On Wed, Jul 31, 2013 at 10:47 PM, Eric Dumazet <eric.dumazet@gmail.com>
>> wrote:
>>>
>>> On Wed, 2013-07-31 at 22:01 +0200, Ronny Meeus wrote:
>>>>
>>>> On Wed, Jul 31, 2013 at 5:09 PM, Eric Dumazet <eric.dumazet@gmail.com>
>>>> wrote:
>>>>>
>>>>> On Wed, 2013-07-31 at 16:42 +0200, Daniel Borkmann wrote:
>>>>>
>>>>>> You can use bpfc (git://github.com/borkmann/netsniff-ng.git), it also
>>>>>> has
>>>>>> an extensive man page. That should probably do it:
>>>>>>
>>>>>> $ cat foo
>>>>>> ld vlant
>>>>>> jneq #4094, drop
>>>>>> ret #-1
>>>>>> drop: ret #0
>>>>>>
>>>>>> $ bpfc foo
>>>>>> { 0x20, 0, 0, 0xfffff02c },
>>>>>> { 0x15, 0, 1, 0x00000ffe },
>>>>>> { 0x6, 0, 0, 0xffffffff },
>>>>>> { 0x6, 0, 0, 0x00000000 },
>>>>>
>>>>>
>>>>
>>>> Thanks Daniel, this is very useful information.
>>>> I have cloned the repo and compiled the tool myself. It will be very
>>>> useful in the future.
>>>>
>>>>> Thanks Daniel.
>>>>>
>>>>> If the load of this BPF fails (because its an old kernel), then load
>>>>> your old filter.
>>>>>
>>>>
>>>> I created a small test application after I backported the filter code
>>>> to the 3.4 kernel.
>>>> I instrumented the kernel with a printk at the moment the
>>>> vlan_tx_tag_get call is done to see the actual value of the vlan tag
>>>> since it did not work initially.
>>>>
>>>> These are the packets displayed by tcpdump:
>>>>
>>>> tcpdump: WARNING: eth-ntb: no IPv4 address assigned
>>>> tcpdump: verbose output suppressed, use -v or -vv for full protocol
>>>> decode
>>>> listening on eth-ntb, link-type EN10MB (Ethernet), capture size 65535
>>>> bytes
>>>> 00:18:49.233283 06:00:00:00:00:80 > f7:00:00:00:ff:ff, ethertype
>>>> 802.1Q (0x8100), length 64:
>>>>          0x0000:  f700 0000 ffff 0600 0000 0080 8100 affe
>>>>          0x0010:  08ab 0014 0000 0000 0f00 0001 0096 6000
>>>>          0x0020:  0096 0000 0001 0000 000d 0000 0000 0000
>>>>          0x0030:  0000 0000 0000 0000 0000 0000 0000 0000
>>>>
>>>> So the Vlan is 0xffe and the priority/CFI field is 0xA.
>>>> Apparently the value I need to  use in the filter is 0xaffe to make it
>>>> work. Is this normal or is this a bug in the kernel?
>>>>
>>>> This is the filter I used:
>>>> { 0x20, 0, 0, 0xfffff02c }
>>>> { 0x15, 0, 1, 0x0000affe }
>>>> { 0x06, 0, 0, 0x00000800 }
>>>> { 0x06, 0, 0, 0x00000000 }
>>>>
>>>> And this is the trace of the kernel and my application:
>>>>
>>>> [12529.357172] BPF_S_ANC_VLAN_TAG: affe
>>>> packets received:          1
>>>> [12533.020743] BPF_S_ANC_VLAN_TAG: affe
>>>> packets received:          2
>>>> [12536.667159] BPF_S_ANC_VLAN_TAG: affe
>>>> packets received:          3
>>>> [12540.343857] BPF_S_ANC_VLAN_TAG: affe
>>>> packets received:          4
>>>
>>>
>>> Right, vlan_tx_tag_get() gets the whole tag, so you want to mask A with
>>> 0xfff before the compare (to strip the prio)
>>>
>>> ld vlant
>>> and #4095
>>> jneq #4094, drop
>>> ret #-1
>>> drop: ret #0
>>>
>>> or something like that.
>>
>>
>> OK the receiving side is clear now. Thanks.
>>
>> Now the sending side.
>> I created an application that sends packets using libpcap. These
>> packets are full Ethernet packets, including VLAN tags etc.
>> If I connect a PC running Wireshark to the Ethernet port I'm sending
>> on I receive the packets, no issues.
>>
>> If a start on the device that is sending the packets also the receive
>> application I created before I do not receive anything.
>> This is because the filter attached to the kernel by this application
>> is checking the VLAN tag in metadata of the buffer, which is in this
>> case not filled in.
>> If I do not attach a filter to the receiving application all packets I
>> send are also received by the receiving application, which is what I
>> expect since all packets sent on a raw socket are received by all
>> other sockets listening on the same interface.
>>
>> I have the feeling that there is something wrong with the current
>> implementation.
>> In my opinion, the same VLAN processing as done for packets received
>> from the network (strip vlan and put it in the meta data) should be
>> done on packets that are sent by an application just before passing
>> them to other sockets listening on the same interface.
>> Right?
>
>
> Nope, we already had this discussion in the past [1]. ;-)
>
> The vlan id is in skb->vlan_tci when vlan accel is on, or in the packet
> itself when vlan accel off. Thus, you can distinguish accelerated traffic
> from non-accelerated one as well.
>

Not completely correct, in my opinion.
The VLAN tag is never present in the packet anymore. It is always put in
skb->vlan_tci, even in non-accelerated environments.

> If you want to filter for it, you need to extend your BPF filter by adding
> an ethernet type/vlan id check in the packet itself in case the loaded
> vlant instruction does not equal the id that you're looking for, thus this
> is being done as a fallback. And actually libpcap is supposed to do the same
> in their filter compiler.

This is an option, but it costs performance and, as I said before,
the VLAN tag is always removed by the kernel.

>  [1] http://article.gmane.org/gmane.linux.network/254454


Thread overview: 13+ messages
2013-07-30 13:07 How do I receive vlan tags on an AF_PACKET socket in 3.4 kernel? Ronny Meeus
2013-07-30 14:09 ` Eric Dumazet
2013-07-31 12:51   ` Ronny Meeus
2013-07-31 12:54     ` Daniel Borkmann
2013-07-31 14:16     ` Eric Dumazet
2013-07-31 14:36       ` Ronny Meeus
2013-07-31 14:42         ` Daniel Borkmann
2013-07-31 15:09           ` Eric Dumazet
2013-07-31 20:01             ` Ronny Meeus
2013-07-31 20:47               ` Eric Dumazet
2013-08-01  9:24                 ` Ronny Meeus
2013-08-02  8:15                   ` Daniel Borkmann
2013-08-02  9:03                     ` Ronny Meeus
