linux-kernel.vger.kernel.org archive mirror
* Re: Linux, tcpdump and vlan
@ 2007-07-19 16:02 andrei radulescu-banu
  2007-07-20 19:58 ` Krzysztof Halasa
  0 siblings, 1 reply; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 16:02 UTC (permalink / raw)
  To: Stephen Hemminger, Krzysztof Halasa
  Cc: Patrick McHardy, linux-kernel, Linux Netdev List

One additional thought: with the changes proposed in my previous message, the driver can be set to hw vlan accelerated mode even if no vlan interfaces are configured. We would no longer have to switch hw vlan accelerated mode on or off when vlan interfaces are created or destroyed.






       

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 16:02 Linux, tcpdump and vlan andrei radulescu-banu
@ 2007-07-20 19:58 ` Krzysztof Halasa
  2007-07-20 20:34   ` Ben Greear
  0 siblings, 1 reply; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-20 19:58 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Stephen Hemminger, Patrick McHardy, linux-kernel, Linux Netdev List

Another idea - perhaps we could make the software VLANs behave
the same as hw ones? I.e., stripping the tag on RX while setting
some magic skb field?

The packets could go via main interface first (normal path, with
eth_type_trans stripping the tag and setting protocol = some 802.1Q),
netif_rx | netif_receive_skb, then through the VLAN device with
finally eth_type_trans setting the IPv4 etc. protocol to pass to
L3 layers.
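
As a rough user-space illustration of the RX idea above (strip one tag early, keep the tag value in some per-packet field), here is a minimal sketch that works on a plain byte buffer rather than a real skb. It is not kernel code: `struct rx_meta` and `vlan_rx_untag()` are made-up names standing in for the "magic skb field" and the stripping step, and the constants are defined locally so the example stands alone.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6       /* bytes in a MAC address */
#define ETH_HLEN      14      /* dst + src + ethertype */
#define VLAN_HLEN     4       /* TPID (2 bytes) + TCI (2 bytes) */
#define ETH_P_8021Q   0x8100  /* 802.1Q TPID */
#define VLAN_VID_MASK 0x0fff  /* low 12 bits of the TCI hold the VLAN ID */

/* Hypothetical stand-in for the per-packet field that would carry the tag. */
struct rx_meta {
    int      tagged;  /* 1 if a tag was removed from the frame */
    uint16_t tci;     /* tag control information: priority bits + VID */
};

/*
 * If the frame carries an 802.1Q tag, remove exactly one tag in place and
 * record it in *meta. Returns the new frame length (unchanged if untagged).
 */
size_t vlan_rx_untag(uint8_t *frame, size_t len, struct rx_meta *meta)
{
    assert(frame != NULL && meta != NULL);

    meta->tagged = 0;
    meta->tci = 0;

    /* Need both MAC addresses, the tag, and the inner ethertype. */
    if (len < ETH_HLEN + VLAN_HLEN)
        return len;

    uint16_t ethertype = (uint16_t)((frame[12] << 8) | frame[13]);
    if (ethertype != ETH_P_8021Q)
        return len;

    meta->tagged = 1;
    meta->tci = (uint16_t)((frame[14] << 8) | frame[15]);

    /* Close the 4-byte gap so the inner ethertype lands at offset 12. */
    memmove(frame + 2 * ETH_ALEN,
            frame + 2 * ETH_ALEN + VLAN_HLEN,
            len - 2 * ETH_ALEN - VLAN_HLEN);
    return len - VLAN_HLEN;
}
```

With a double-tagged (QinQ) frame this removes only the outer tag, so a capture taken after this step would still show the inner one.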

I can see potential problems on TX, the packets would have to be
presented without the tag (but with VLAN ID set somewhere in the skb)
and that probably means all drivers would have to be modified.

Seems a bit of work, I know my message is missing the patch...
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-20 19:58 ` Krzysztof Halasa
@ 2007-07-20 20:34   ` Ben Greear
  2007-07-21 11:32     ` Krzysztof Halasa
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Greear @ 2007-07-20 20:34 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: andrei radulescu-banu, Stephen Hemminger, Patrick McHardy,
	linux-kernel, Linux Netdev List

Krzysztof Halasa wrote:
> Another idea - perhaps we could make the software VLANs behave
> the same as hw ones? I.e., stripping the tag on RX while setting
> some magic skb field?
> 
> The packets could go via main interface first (normal path, with
> eth_type_trans stripping the tag and setting protocol = some 802.1Q),
> netif_rx | netif_receive_skb, then through the VLAN device with
> finally eth_type_trans setting the IPv4 etc. protocol to pass to
> L3 layers.

There is already a flag you can set on vlan devices (reorder-header)
that strips the VLAN tag before presenting it to user-space.

> I can see potential problems on TX, the packets would have to be
> presented without the tag (but with VLAN ID set somewhere in the skb)
> and that probably means all drivers would have to be modified.

On tx, if it shows up on the vlan device, we add that device's VID to
the header if no VID is currently in the SKB.  If it is in the SKB header
we change the VID to be the tx dev's VID (if it was different).  This allows user-space
to send a raw ethernet frame on a vlan device and have it automatically
go out of the box on the correct vlan.  User-space can also send raw VLAN frames
and have those also go out on the correct VLAN.
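
The TX rule described above can be sketched in user space on a raw frame buffer. This is only an illustration under stated assumptions, not the 8021q module's code: `vlan_tx_fixup()` is a made-up name, the caller is assumed to leave 4 spare bytes at the end of the buffer, and keeping the priority bits when rewriting the VID is my assumption rather than something stated in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN      6       /* bytes in a MAC address */
#define VLAN_HLEN     4       /* TPID (2 bytes) + TCI (2 bytes) */
#define ETH_P_8021Q   0x8100  /* 802.1Q TPID */
#define VLAN_VID_MASK 0x0fff  /* low 12 bits of the TCI hold the VLAN ID */

/*
 * Make sure the frame leaves with the VLAN device's VID.
 * buf must have room for len + VLAN_HLEN bytes. Returns the new length.
 */
size_t vlan_tx_fixup(uint8_t *buf, size_t len, uint16_t dev_vid)
{
    assert(buf != NULL && len >= 2 * ETH_ALEN + 2);
    assert(dev_vid <= VLAN_VID_MASK);

    uint16_t ethertype = (uint16_t)((buf[12] << 8) | buf[13]);

    if (ethertype == ETH_P_8021Q) {
        /* Already tagged: keep the priority bits, force the device's VID. */
        assert(len >= 2 * ETH_ALEN + VLAN_HLEN);
        uint16_t tci = (uint16_t)((buf[14] << 8) | buf[15]);
        tci = (uint16_t)((tci & ~VLAN_VID_MASK) | dev_vid);
        buf[14] = (uint8_t)(tci >> 8);
        buf[15] = (uint8_t)(tci & 0xff);
        return len;
    }

    /* Untagged: open a 4-byte gap after the MAC addresses, insert the tag. */
    memmove(buf + 2 * ETH_ALEN + VLAN_HLEN,
            buf + 2 * ETH_ALEN,
            len - 2 * ETH_ALEN);
    buf[12] = (uint8_t)(ETH_P_8021Q >> 8);
    buf[13] = (uint8_t)(ETH_P_8021Q & 0xff);
    buf[14] = (uint8_t)(dev_vid >> 8);
    buf[15] = (uint8_t)(dev_vid & 0xff);
    return len + VLAN_HLEN;
}
```

Note that the "already tagged" branch rewrites the existing tag instead of stacking a second one, which is why this rule alone does not give QinQ.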

> Seems a bit of work, I know my message is missing the patch...

Unless I mis-understand, this has been working since 2.4 days :)

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-20 20:34   ` Ben Greear
@ 2007-07-21 11:32     ` Krzysztof Halasa
  2007-07-21 17:57       ` Ben Greear
  0 siblings, 1 reply; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-21 11:32 UTC (permalink / raw)
  To: Ben Greear
  Cc: andrei radulescu-banu, Stephen Hemminger, Patrick McHardy,
	linux-kernel, Linux Netdev List

Ben Greear <greearb@candelatech.com> writes:

> There is already a flag you can set on vlan devices (reorder-header)
> that strips the VLAN tag before presenting it to user-space.

Sure, but isn't it only valid for VLAN device (not the main ethX)?
I.e., can you have the tag stripped from frames captured on ethX?

> On tx, if it shows up on the vlan device, we add that device's VID to
> the header if no VID is currently in the SKB.  If it is in the SKB header
> we change the VID to be the tx dev's VID (if it was different).  This allows user-space
> to send a raw ethernet frame on a vlan device and have it automatically
> go out of the box on the correct vlan.  User-space can also send raw VLAN frames
> and have those also go out on the correct VLAN.

Well... I think the tag should be added unconditionally (for things like
QinQ) but that's trivial and minor.

IOW: I think all Ethernet interfaces should always be VLAN-aware,
stripping the tag (only one) early on RX and adding it late on TX.
That means tcpdump would see packets with exactly one tag removed
(unless there was no tag), in both RX and TX.

Tcpdump would need other means to get VLAN id...
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-21 11:32     ` Krzysztof Halasa
@ 2007-07-21 17:57       ` Ben Greear
  2007-07-21 21:15         ` Krzysztof Halasa
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Greear @ 2007-07-21 17:57 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: andrei radulescu-banu, Stephen Hemminger, Patrick McHardy,
	linux-kernel, Linux Netdev List

Krzysztof Halasa wrote:
> Ben Greear <greearb@candelatech.com> writes:
>
>   
>> There is already a flag you can set on vlan devices (reorder-header)
>> that strips the VLAN tag before presenting it to user-space.
>>     
>
> Sure, but isn't it only valid for VLAN device (not the main ethX)?
> I.e., can you have the tag stripped from frames captured on ethX?
>   
No.  I don't see a good reason to strip on ethX.  That hardware accel
VLANs strip is an inconvenience in my opinion, no need to force it in
software as well.
>> On tx, if it shows up on the vlan device, we add that device's VID to
>> the header if no VID is currently in the SKB.  If it is in the SKB header
>> we change the VID to be the tx dev's VID (if it was different).  This allows user-space
>> to send a raw ethernet frame on a vlan device and have it automatically
>> go out of the box on the correct vlan.  User-space can also send raw VLAN frames
>> and have those also go out on the correct VLAN.
>>     
>
> Well... I think the tag should be added unconditionally (for things like
> QinQ) but that's trivial and minor.
>   
I think that for Q in Q, we would need some explicit flag on each skb to
know when to add or modify the VID.  I was never able to think of an
automatic solution that worked in all cases (bridging, writing raw
packets from user space, normal receive, normal transmit, ...)

Modifying the bridging code would fix that path, and adding a socket opt
to deal with writing raw packets from user-space should handle the other
tricky case I believe.

> IOW: I think all Ethernet interfaces should always be VLAN-aware,
> stripping the tag (only one) early on RX and adding it late on TX.
> That means tcpdump would see packets with exactly one tag removed
> (unless there was no tag), in both RX and TX.
>
> Tcpdump would need other means to get VLAN id...
>   
What benefit will this add?  It will certainly decrease performance to
copy around the header for every VLAN packet, so there would have to be
a good reason to add this logic...

Ben


-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-21 17:57       ` Ben Greear
@ 2007-07-21 21:15         ` Krzysztof Halasa
  0 siblings, 0 replies; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-21 21:15 UTC (permalink / raw)
  To: Ben Greear
  Cc: andrei radulescu-banu, Stephen Hemminger, Patrick McHardy,
	linux-kernel, Linux Netdev List

Ben Greear <greearb@candelatech.com> writes:

>> IOW: I think all Ethernet interfaces should always be VLAN-aware,
>> stripping the tag (only one) early on RX and adding it late on TX.
>> That means tcpdump would see packets with exactly one tag removed
>> (unless there was no tag), in both RX and TX.
>>
>> Tcpdump would need other means to get VLAN id...
>>
> What benefit will this add?  It will certainly decrease performance to
> copy around the header for every VLAN packet, so there would have to be
> a good reason to add this logic...

I'd have to do some tests... Hopefully in this decade, forget it for
now.

The primary reason - consistency with hw VLAN cards -> simpler
code.

The performance is already decreased (not sure if it's noticeable)
most of the time, i.e., when not transparently bridging VLAN
trunks. Bridging VLAN trunks is, of course, theoretically possible,
but it's rather not a common operation when using .1Q.
That is, with header reordering, of course.

Anyway, -ENOPATCH from me for now.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 23:38 ` Ben Greear
@ 2007-07-20 20:19   ` Krzysztof Halasa
  0 siblings, 0 replies; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-20 20:19 UTC (permalink / raw)
  To: Ben Greear
  Cc: andrei radulescu-banu, Patrick McHardy, Stephen Hemminger,
	linux-kernel, Linux Netdev List

Ben Greear <greearb@candelatech.com> writes:

>> The net result is that dev_queue_xmit_nit() is called twice, once
>> for dev=eth0.2 then for dev=eth0.
>
> Maybe binding to all isn't such a good idea then.

Anyway I would expect the frame on eth0.2 and then on eth0 as well.
Anything different is crazy.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 19:34 andrei radulescu-banu
  2007-07-18 22:57 ` Patrick McHardy
@ 2007-07-20 10:50 ` Florian Lohoff
  1 sibling, 0 replies; 30+ messages in thread
From: Florian Lohoff @ 2007-07-20 10:50 UTC (permalink / raw)
  To: andrei radulescu-banu; +Cc: linux-kernel


On Wed, Jul 18, 2007 at 12:34:33PM -0700, andrei radulescu-banu wrote:
> 
> Dear kernel networking gurus, 
> 
> I am trying to understand why tcpdump does not work properly for vlan packets on linux. Here is the existing behavior, observed with:
> - kernel 2.6.16
> - e1000 driver
> - libpcap 0.9.6
> - tcpdump 3.9.6
>
> The e1000 driver has two modes when handling vlan frames:
> (A) Default mode, when:
> - on rx, the mac includes vlan headers
> - on tx, the mac expects tx frames to include vlan headers.
> (B) Vlan hw accelerated mode, when:
> - on rx, the mac does not include vlan headers, and instead passes vlan tag information in the status field of the ring buffer
> - on tx, the mac expects no vlan headers, and instead expects vlan tag information to be passed in the status field of the ring buffer

I have seen similar behaviour. Once the kernel is compiled with VLAN
support, the e1000 driver drops the vlan tag completely, even when no
vlans are configured on that port. I would consider it a bug
that enabling a kernel option changes behaviour even if the feature is
not in use.

As I was tracing dot1qinq I could actually see that only the outer
vlan tag was being dropped.

Flo
-- 
Florian Lohoff                  flo@rfc822.org             +49-171-2280134
	Those who would give up a little freedom to get a little 
          security shall soon have neither - Benjamin Franklin


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 21:38 andrei radulescu-banu
@ 2007-07-19 23:38 ` Ben Greear
  2007-07-20 20:19   ` Krzysztof Halasa
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Greear @ 2007-07-19 23:38 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Patrick McHardy, Stephen Hemminger, Krzysztof Halasa,
	linux-kernel, Linux Netdev List

andrei radulescu-banu wrote:
> During debugging, I noticed that dev_queue_xmit() is called twice for tx vlan frames. This results in a frame being passed twice to a packet socket bound to 'any' interface. If the packet socket is bound to a specific interface, though, it will get only one copy of the tx frame, which is good.
> 
> In more detail: suppose we're tx'ing a frame, and the route table lookup yields a vlan outgoing device eth0.2. dev_queue_xmit() is called, which calls dev_queue_xmit_nit() for dev = eth0.2 then dev->hard_start_xmit() for dev = eth0.2. 
> 
> The latter call gets into the vlan layer, which attaches the vlan id 2 (accelerated or not... in my e1000 case accelerated) then calls dev_queue_xmit() again. This time around dev_queue_xmit_nit() is called for dev = eth0, and dev->hard_start_xmit() actually calls the ethernet driver.
> 
> The net result is that dev_queue_xmit_nit() is called twice, once for dev=eth0.2 then for dev=eth0.

Maybe binding to all isn't such a good idea then.

Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
@ 2007-07-19 21:38 andrei radulescu-banu
  2007-07-19 23:38 ` Ben Greear
  0 siblings, 1 reply; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 21:38 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Stephen Hemminger, Krzysztof Halasa, linux-kernel, Linux Netdev List

During debugging, I noticed that dev_queue_xmit() is called twice for tx vlan frames. This results in a frame being passed twice to a packet socket bound to 'any' interface. If the packet socket is bound to a specific interface, though, it will get only one copy of the tx frame, which is good.

In more detail: suppose we're tx'ing a frame, and the route table lookup yields a vlan outgoing device eth0.2. dev_queue_xmit() is called, which calls dev_queue_xmit_nit() for dev = eth0.2 then dev->hard_start_xmit() for dev = eth0.2. 

The latter call gets into the vlan layer, which attaches the vlan id 2 (accelerated or not... in my e1000 case accelerated) then calls dev_queue_xmit() again. This time around dev_queue_xmit_nit() is called for dev = eth0, and dev->hard_start_xmit() actually calls the ethernet driver.

The net result is that dev_queue_xmit_nit() is called twice, once for dev=eth0.2 then for dev=eth0.
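
The call pattern described above can be modelled in a few lines of user-space C. This is a toy sketch, not the kernel's code: `toy_dev`, `tap_log`, `toy_xmit_nit()` and `toy_queue_xmit()` are made-up names that only mimic the shape of dev_queue_xmit() calling dev_queue_xmit_nit() and then re-entering for the underlying device. A tap bound to "any" is represented by `bound == NULL`.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_TAP_EVENTS 8

/* Toy device: a VLAN device points at the real device underneath it. */
struct toy_dev {
    const char     *name;
    struct toy_dev *lower;  /* NULL for a real NIC */
};

/* Records which devices a single packet socket saw the frame on. */
struct tap_log {
    const struct toy_dev *seen[MAX_TAP_EVENTS];
    int count;
};

/* Stand-in for dev_queue_xmit_nit(): deliver to the tap if it is bound to
 * "any" (bound == NULL) or bound to exactly this device. */
static void toy_xmit_nit(const struct toy_dev *dev,
                         const struct toy_dev *bound,
                         struct tap_log *log)
{
    if (bound != NULL && bound != dev)
        return;
    assert(log->count < MAX_TAP_EVENTS);
    log->seen[log->count++] = dev;
}

/* Stand-in for dev_queue_xmit(): tap first, then hand the frame down. */
void toy_queue_xmit(const struct toy_dev *dev,
                    const struct toy_dev *bound,
                    struct tap_log *log)
{
    toy_xmit_nit(dev, bound, log);
    if (dev->lower != NULL) {
        /* The VLAN layer attaches its tag and re-enters the transmit path
         * on the real device, so the tap hook runs a second time. */
        toy_queue_xmit(dev->lower, bound, log);
    }
    /* else: a real driver would put the frame on the wire here */
}
```

Sending one frame through a toy eth0.2 stacked on eth0 logs two deliveries for a tap bound to "any" and one delivery for a tap bound to either device alone.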






^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 18:20 andrei radulescu-banu
@ 2007-07-19 19:28 ` Stephen Hemminger
  0 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2007-07-19 19:28 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Ben Greear, Patrick McHardy, Krzysztof Halasa, linux-kernel,
	Linux Netdev List

On Thu, 19 Jul 2007 11:20:43 -0700 (PDT)
andrei radulescu-banu <iubica2@yahoo.com> wrote:

> > [Ben] If tcpdump and/or bridging needs to disable the hw-accel, then it can
> > explicitly do so by some API.  That is better than overloading
> > the promisc flag in my opinion.
> 
> I guess I could be persuaded in the end. But let me still play devil advocate. The semantics of 'promiscuous', in my opinion, mean 'receive everything', including vlan.
> 
> > [Ben] This is especially true since promisc
> > is not easily readable by user-space and things like tcpdump
> > cannot have full control of promisc (if a mac-vlan has the NIC in
> > promisc mode, for instance, then tcpdump can never disable it.)
> 
> I agree with all the above. For example when you run 'ifconfig' during 'tcpdump', the interface does not have the promiscuous flag set!! 

In kernel it is a nice atomic counter, no problem.

> 
> This confused me for a while, until I realized that tcpdump's packet socket was using an obscure packet_dev_mc() API (af_packet.c) to get the interface in promiscuous mode. The reason for this is that packet_mc_add() implements a reference counted mechanism for promiscuous. So that:
> - starting tcpdump instance 1 sets promiscuous mode
> - starting tcpdump instance 2 bumps the ref count in packet_mc_add()
> - killing tcpdump instance 1 bumps down the ref count, the interface stays promiscuous
> - killing tcpdump instance 2 truly clear promiscuous mode.
> 
> The trick here is that when you kill tcpdump, the kernel clears the packet socket, and in process bumps down the ref count. Had tcpdump manually set/cleared the promisc flag, the interface would have stayed promisc after tcpdump was killed.
> 
> (The mac-vlan driver must have this corner problem as well. If a mac-vlan interface is disabled while tcpdump runs, it may yank promiscuousness from under tcpdump.)

The kernel has no such problem

> So if you want to create an ethtool API to set vlan-promiscuous mode, one problem to grapple is that we need a similar mechanism to the above, so you can run two concurrent tcpdump's (or tcpdump while bridging vlans) and the vlan-promiscuous mode gets set correctly each time.  For tcpdump at least, the new ethtool API needs to be called from packet_mc_add().

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
@ 2007-07-19 18:20 andrei radulescu-banu
  2007-07-19 19:28 ` Stephen Hemminger
  0 siblings, 1 reply; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 18:20 UTC (permalink / raw)
  To: Ben Greear
  Cc: Patrick McHardy, Stephen Hemminger, Krzysztof Halasa,
	linux-kernel, Linux Netdev List

> [Ben] If tcpdump and/or bridging needs to disable the hw-accel, then it can
> explicitly do so by some API.  That is better than overloading
> the promisc flag in my opinion.

I guess I could be persuaded in the end. But let me still play devil's advocate. The semantics of 'promiscuous', in my opinion, mean 'receive everything', including vlan.

> [Ben] This is especially true since promisc
> is not easily readable by user-space and things like tcpdump
> cannot have full control of promisc (if a mac-vlan has the NIC in
> promisc mode, for instance, then tcpdump can never disable it.)

I agree with all the above. For example when you run 'ifconfig' during 'tcpdump', the interface does not have the promiscuous flag set!! 

This confused me for a while, until I realized that tcpdump's packet socket was using an obscure packet_dev_mc() API (af_packet.c) to put the interface in promiscuous mode. The reason for this is that packet_mc_add() implements a reference-counted mechanism for promiscuous mode, so that:
- starting tcpdump instance 1 sets promiscuous mode
- starting tcpdump instance 2 bumps the ref count in packet_mc_add()
- killing tcpdump instance 1 bumps down the ref count, the interface stays promiscuous
- killing tcpdump instance 2 truly clears promiscuous mode.

The trick here is that when you kill tcpdump, the kernel clears the packet socket, and in process bumps down the ref count. Had tcpdump manually set/cleared the promisc flag, the interface would have stayed promisc after tcpdump was killed.
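
The reference-counting behaviour walked through above can be written down as a toy model. This is a sketch of the idea only, not the af_packet.c code: `toy_nic`, `toy_promisc_get()` and `toy_promisc_put()` are made-up names, and the real kernel bookkeeping is more involved.

```c
#include <assert.h>

/* Toy NIC: the hardware filter follows a reference count, not a flag that
 * any one user can set or clear directly. */
struct toy_nic {
    int promisc_refs;  /* how many users currently want promiscuous mode */
    int hw_promisc;    /* what the hardware filter is actually set to */
};

/* Called when a packet socket asks for promiscuous mode. */
void toy_promisc_get(struct toy_nic *nic)
{
    if (nic->promisc_refs++ == 0)
        nic->hw_promisc = 1;   /* first user switches the hardware on */
}

/* Called when a packet socket is closed, including when its process is
 * killed and the kernel tears the socket down on its behalf. */
void toy_promisc_put(struct toy_nic *nic)
{
    assert(nic->promisc_refs > 0);
    if (--nic->promisc_refs == 0)
        nic->hw_promisc = 0;   /* last user switches the hardware off */
}
```

A vlan-promiscuous setting would need the same get/put pairing so that two concurrent users do not undo each other's request.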

(The mac-vlan driver must have this corner problem as well. If a mac-vlan interface is disabled while tcpdump runs, it may yank promiscuousness from under tcpdump.)

So if you want to create an ethtool API to set vlan-promiscuous mode, one problem to grapple is that we need a similar mechanism to the above, so you can run two concurrent tcpdump's (or tcpdump while bridging vlans) and the vlan-promiscuous mode gets set correctly each time.  For tcpdump at least, the new ethtool API needs to be called from packet_mc_add().








       

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
@ 2007-07-19 17:46 andrei radulescu-banu
  0 siblings, 0 replies; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 17:46 UTC (permalink / raw)
  To: Patrick McHardy
  Cc: Stephen Hemminger, Krzysztof Halasa, linux-kernel, Linux Netdev List


>> [Andrei] VLAN_TX_SKB_CB() is perfect for that.
> [Patrick, Stephen] No, it's not. It's only legal to use while something has ownership
> of the skb. Between VLAN devices and real devices qdiscs are
> free to use it.

All right, using VLAN_TX_SKB_CB() is a bad idea. In that case, we need to amend the skb struct, I don't see another way.





       

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 15:47 andrei radulescu-banu
  2007-07-19 16:21 ` Stephen Hemminger
  2007-07-19 16:33 ` Patrick McHardy
@ 2007-07-19 16:47 ` Ben Greear
  2 siblings, 0 replies; 30+ messages in thread
From: Ben Greear @ 2007-07-19 16:47 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Patrick McHardy, Stephen Hemminger, Krzysztof Halasa,
	linux-kernel, Linux Netdev List

andrei radulescu-banu wrote:
>> [Ben] I think a better method would be to allow disabling VLAN HW accel for a NIC with ethtool.
>>     
>
> This requires changes to ethtool and e1000 driver, +other drivers. It is a handy thing to have. I don't view it as a solution to tcpdump - or to the vlan bridging problem. One concern: if we're switching hw accel mode on the fly, we need to carefully protect tx frames that are just about going out and have already been set up for the opposite mode.
>   
I think it would be valid to let a few packets slip through on the old
behaviour during changeover, or perhaps to drop them entirely if that
is required.

Turning off vlan hw-accel when the nic goes promisc is also going to
require driver changes, I believe, so either way you have to do that work.

If tcpdump and/or bridging needs to disable the hw-accel, then it can
explicitly do so by some API.  That is better than overloading the
promisc flag in my opinion.  This is especially true since promisc is
not easily readable by user-space and things like tcpdump cannot have
full control of promisc (if a mac-vlan has the NIC in promisc mode, for
instance, then tcpdump can never disable it.)

> Any comments on what is the expected behavior of 'tcpdump -i eth0.2' vs. 'tcpdump -i eth0'?
>   
I would expect that you see tags with -i eth0, but not with -i eth0.2

That is the way it currently works with non-hw-accel VLANs (or it was
the last I checked).

Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 15:47 andrei radulescu-banu
  2007-07-19 16:21 ` Stephen Hemminger
@ 2007-07-19 16:33 ` Patrick McHardy
  2007-07-19 16:47 ` Ben Greear
  2 siblings, 0 replies; 30+ messages in thread
From: Patrick McHardy @ 2007-07-19 16:33 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Stephen Hemminger, Krzysztof Halasa, linux-kernel, Linux Netdev List

andrei radulescu-banu wrote:
> The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.


No, it's not. It's only legal to use while something has ownership
of the skb. Between VLAN devices and real devices qdiscs are
free to use it.

>> [Patrick] On the TX path, it could simply use the CB, but this is actually
>> also wrong (for both macvlan and real devices) since qdiscs have
>> ownership of the skb in between, and at least netem *does* modify
>> the CB, breaking VLAN.
> 
> Thanks for pointing that out... It appears to me that qdisc/netem already breaks the vlan implementation, in the path 
> 
> vlan_dev_hwaccel_hard_start_xmit(): sets accelerated vlan tag in skb->cb, calls
> dev_queue_xmit(): may pass skb to qdisc/netem, which may mangle skb->cb before calling
> dev->hard_start_xmit(), resulting in a tx frame without its vlan tag.
> 
> So netem needs to look for hw accelerated vlan metadata and insert it in the skb... Don't see any other way around this. 


No, we might want to put other data in the cb in the future.
VLAN should follow the rules instead.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 15:47 andrei radulescu-banu
@ 2007-07-19 16:21 ` Stephen Hemminger
  2007-07-19 16:33 ` Patrick McHardy
  2007-07-19 16:47 ` Ben Greear
  2 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2007-07-19 16:21 UTC (permalink / raw)
  To: andrei radulescu-banu
  Cc: Patrick McHardy, Krzysztof Halasa, linux-kernel, Linux Netdev List

On Thu, 19 Jul 2007 08:47:01 -0700 (PDT)
andrei radulescu-banu <iubica2@yahoo.com> wrote:

> The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.
> 
> > [Patrick] On the TX path, it could simply use the CB, but this is actually
> > also wrong (for both macvlan and real devices) since qdiscs have
> > ownership of the skb in between, and at least netem *does* modify
> > the CB, breaking VLAN.

No, VLAN is wrong to expect the CB to survive through layers. The CB is
a private scribble area that can be used by whichever piece of code currently
"owns" the skb.  If data needs to be passed from layer to layer, it needs to
be done as separate fields in the skb itself. If A passes an skb to B, then
the CB can be changed by B (or things it calls) before it arrives at C.
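
To make the A -> B -> C point concrete, here is a toy user-space model. It is not the kernel's sk_buff: `toy_skb`, its 48-byte `cb` size, the dedicated `vlan_tci` field and all three functions are assumptions made up for illustration. Layer A stores the tag both in the scratch area and in a dedicated field, layer B reuses the scratch area, and layer C finds that only the dedicated field still holds the tag.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define TOY_CB_SIZE 48  /* assumed scratch-area size for this toy model */

struct toy_skb {
    uint8_t  cb[TOY_CB_SIZE];  /* scratch area, owned by whoever holds the skb */
    uint16_t vlan_tci;         /* hypothetical dedicated field */
    int      vlan_tci_valid;   /* 1 once vlan_tci has been set */
};

/* Layer A (VLAN device): records the tag both ways, for comparison. */
void toy_vlan_xmit(struct toy_skb *skb, uint16_t tci)
{
    memcpy(skb->cb, &tci, sizeof tci);  /* the fragile way */
    skb->vlan_tci = tci;                /* the dedicated-field way */
    skb->vlan_tci_valid = 1;
}

/* Layer B (a qdisc such as netem): it owns the skb now, so it is free to
 * reuse cb for its own bookkeeping, here a send timestamp. */
void toy_qdisc_enqueue(struct toy_skb *skb, uint64_t time_to_send)
{
    memcpy(skb->cb, &time_to_send, sizeof time_to_send);
}

/* Layer C (driver): what it would read if it trusted cb for the tag. */
uint16_t toy_driver_tci_from_cb(const struct toy_skb *skb)
{
    uint16_t tci;
    memcpy(&tci, skb->cb, sizeof tci);
    return tci;
}
```

After layer B has run, the value read back from `cb` is whatever B wrote there, while `vlan_tci` is untouched because no other layer has a reason to write to it.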




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
@ 2007-07-19 15:47 andrei radulescu-banu
  2007-07-19 16:21 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-19 15:47 UTC (permalink / raw)
  To: Patrick McHardy, Stephen Hemminger
  Cc: Krzysztof Halasa, linux-kernel, Linux Netdev List

The consensus seems to be that skb's need to carry vlan accelerated tags in their cb's, on rx as well as tx. VLAN_TX_SKB_CB() is perfect for that.

> [Patrick] On the TX path, it could simply use the CB, but this is actually
> also wrong (for both macvlan and real devices) since qdiscs have
> ownership of the skb in between, and at least netem *does* modify
> the CB, breaking VLAN.

Thanks for pointing that out... It appears to me that qdisc/netem already breaks the vlan implementation, in the path 

vlan_dev_hwaccel_hard_start_xmit(): sets accelerated vlan tag in skb->cb, calls
dev_queue_xmit(): may pass skb to qdisc/netem, which may mangle skb->cb before calling
dev->hard_start_xmit(), resulting in a tx frame without its vlan tag.

So netem needs to look for hw accelerated vlan metadata and insert it in the skb... Don't see any other way around this. 

> [Patrick] Your suggestion of disabling VLAN acceleration in promiscuous
> mode sounds like a reasonable solution until then ..

I was rather thinking of keeping hw vlan acceleration in promiscuous mode. Upon becoming promisc, the driver will be changed to disable vlan filters - it will reenable them when leaving promisc mode.

My 2 cents on vlan hw acceleration: it does not save much in computing cycles, if software is written carefully. It is vlan filtering that saves computing time.

> [Ben] I think a better method would be to allow disabling VLAN HW accel for a NIC with ethtool.

This requires changes to ethtool and e1000 driver, +other drivers. It is a handy thing to have. I don't view it as a solution to tcpdump - or to the vlan bridging problem. One concern: if we're switching hw accel mode on the fly, we need to carefully protect tx frames that are just about going out and have already been set up for the opposite mode.

Any comments on what is the expected behavior of 'tcpdump -i eth0.2' vs. 'tcpdump -i eth0'?

Andrei Radulescu-Banu
Brix Networks





       

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 15:00         ` Stephen Hemminger
@ 2007-07-19 15:45           ` Krzysztof Halasa
  0 siblings, 0 replies; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-19 15:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick McHardy, andrei radulescu-banu, linux-kernel, Linux Netdev List

Stephen Hemminger <shemminger@linux-foundation.org> writes:

> Not at runtime, acceleration is always on if you compile kernel with vlan
> support.  That is a design mistake as far as I can tell.

I think so.

>> However seeing unknown tags on master device (with tcpdump etc)
>> would certainly be useful.
>
> Only in promiscuous mode. In some sense tag is part of the mac address.

Well, in "some sense" maybe, though the MAC address is rather
strictly defined to be a 6-octet value. I can live with
promiscuous anyway, it's really a minor issue.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 14:23       ` Krzysztof Halasa
  2007-07-19 15:00         ` Stephen Hemminger
@ 2007-07-19 15:20         ` Stephen Hemminger
  1 sibling, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2007-07-19 15:20 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Patrick McHardy, andrei radulescu-banu, linux-kernel, Linux Netdev List

On Thu, 19 Jul 2007 16:23:48 +0200
Krzysztof Halasa <khc@pm.waw.pl> wrote:

> Stephen Hemminger <shemminger@linux-foundation.org> writes:
> 
> > 1) non-accelerated device
> >     * all frames show in promiscuous mode
> >     * tag is part of the frame that shows up
> >        in tcpdump, and then gets stripped by the 8021q module.
> 
> Sure. It's IMHO good and working, modulo the tag being removed
> on the master device (optional cloning or something, IIRC).
> 
> > 2) rx tag stripping device
> >      * all frames show in promiscuous mode
> >      * tag is in skb but NOT passed to tcpdump
> > 3) rx vlan acceleration
> >      * only frames for vlans that are registered show up
> >         in promiscuous mode
> >      * tag is in skb but NOT passed to tcpdump
> 
> I wasn't aware of devices doing 3. Aren't we able to tell them
> to receive all packets anyway (even unknown VLANs#)?

See NETIF_F_HW_VLAN_FILTER (e1000, etc).

> > Unfortunately, the tag is lost as part of the VLAN acceleration process
> > so it is not a simple matter of changing code in AF_PACKET receive
> > to restore the tag.
> 
> I'm not sure if we really want it. If needed we can disable
> acceleration, can't we? While accelerated we can see the packets
> (without tags) on logical devices.

Not at runtime, acceleration is always on if you compile kernel with vlan
support.  That is a design mistake as far as I can tell.

> However seeing unknown tags on master device (with tcpdump etc)
> would certainly be useful.

Only in promiscuous mode. In some sense the tag is part of the MAC address.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 14:23       ` Krzysztof Halasa
@ 2007-07-19 15:00         ` Stephen Hemminger
  2007-07-19 15:45           ` Krzysztof Halasa
  2007-07-19 15:20         ` Stephen Hemminger
  1 sibling, 1 reply; 30+ messages in thread
From: Stephen Hemminger @ 2007-07-19 15:00 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Patrick McHardy, andrei radulescu-banu, linux-kernel, Linux Netdev List

On Thu, 19 Jul 2007 16:23:48 +0200
Krzysztof Halasa <khc@pm.waw.pl> wrote:

> Stephen Hemminger <shemminger@linux-foundation.org> writes:
> 
> > 1) non-accelerated device 
> >     * all frames show in promiscuous mode
> >     * tag is part of the frame that shows up
> >        in tcpdump, and then gets stripped by the 8021q module.
> 
> Sure. It's IMHO good and working, modulo the tag being removed
> on the master device (optional cloning or something, IIRC).
> 
> > 2) rx tag stripping device
> >      * all frames show in promiscuous mode
> >      * tag is in skb but NOT passed to tcpdump
> > 3) rx vlan acceleration
> >      * only frames for VLANs that are registered show up
> >         in promiscuous mode
> >      * tag is in skb but NOT passed to tcpdump
> 
> I wasn't aware of devices doing 3. Aren't we able to tell them
> to receive all packets anyway (even unknown VLANs#)?

See NETIF_F_HW_VLAN_FILTER (e1000, etc).

> > Unfortunately, the tag is lost as part of the VLAN acceleration process
> > so it is not a simple matter of changing code in AF_PACKET receive
> > to restore the tag.
> 
> I'm not sure if we really want it. If needed we can disable
> acceleration, can't we? While accelerated we can see the packets
> (without tags) on logical devices.

Not at runtime; acceleration is always on if you compile the kernel with
VLAN support.  That is a design mistake as far as I can tell.

> However seeing unknown tags on master device (with tcpdump etc)
> would certainly be useful.

Only in promiscuous mode. In some sense the tag is part of the MAC address.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 13:41     ` Stephen Hemminger
  2007-07-19 14:00       ` Patrick McHardy
@ 2007-07-19 14:23       ` Krzysztof Halasa
  2007-07-19 15:00         ` Stephen Hemminger
  2007-07-19 15:20         ` Stephen Hemminger
  1 sibling, 2 replies; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-19 14:23 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Patrick McHardy, andrei radulescu-banu, linux-kernel, Linux Netdev List

Stephen Hemminger <shemminger@linux-foundation.org> writes:

> 1) non-accelerated device 
>     * all frames show in promiscuous mode
>     * tag is part of the frame that shows up
>        in tcpdump, and then gets stripped by the 8021q module.

Sure. It's IMHO good and working, modulo the tag being removed
on the master device (optional cloning or something, IIRC).

> 2) rx tag stripping device
>      * all frames show in promiscuous mode
>      * tag is in skb but NOT passed to tcpdump
> 3) rx vlan acceleration
>      * only frames for VLANs that are registered show up
>         in promiscuous mode
>      * tag is in skb but NOT passed to tcpdump

I wasn't aware of devices doing 3. Aren't we able to tell them
to receive all packets anyway (even unknown VLANs#)?

> Unfortunately, the tag is lost as part of the VLAN acceleration process
> so it is not a simple matter of changing code in AF_PACKET receive
> to restore the tag.

I'm not sure if we really want it. If needed we can disable
acceleration, can't we? While accelerated we can see the packets
(without tags) on logical devices.

However seeing unknown tags on master device (with tcpdump etc)
would certainly be useful.
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 13:41     ` Stephen Hemminger
@ 2007-07-19 14:00       ` Patrick McHardy
  2007-07-19 14:23       ` Krzysztof Halasa
  1 sibling, 0 replies; 30+ messages in thread
From: Patrick McHardy @ 2007-07-19 14:00 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Krzysztof Halasa, andrei radulescu-banu, linux-kernel, Linux Netdev List

Stephen Hemminger wrote:
> On Thu, 19 Jul 2007 15:28:46 +0200
> Krzysztof Halasa <khc@pm.waw.pl> wrote:
> 
>>>Your suggestion of disabling VLAN acceleration in promiscuous
>>>mode sounds like a reasonable solution until then ..
>>
>>From a user perspective:
>>
>>I'm not sure promiscuous mode is related to the problem.
>>Tcpdump without promiscuous mode makes perfect sense.


Good point.

>>I don't know very well VLAN code internals, but I think
>>the VLAN # is used for looking up the interface, so
>>presenting the "original" packet on the trunk device
>>would IMHO involve some skb cloning, and perhaps some
>>ethtool option could probably control that.
>>
>>Not sure about untagged frames vs. tagged frames with
>>the default VLAN id - can the hardware at all differentiate
>>between them?
>>
>>
>>Or, perhaps it should be left (almost) as is - with "software"
>>VLANs the traffic always goes through the master interface,
>>but with "accelerated" mode it only goes through logical
>>interfaces and doesn't show up on master? Probably with
>>exception of invalid VLANs, which could be injected back to
>>master (because no logical device exists)?


The last case is the problematic one, the tag might be gone.

> I don't claim to be a VLAN expert but there are really three cases
> for handling tagged frames
> 
> 1) non-accelerated device 
>     * all frames show in promiscuous mode
>     * tag is part of the frame that shows up
>        in tcpdump, and then gets stripped by the 8021q module.
> 2) rx tag stripping device
>      * all frames show in promiscuous mode
>      * tag is in skb but NOT passed to tcpdump
> 3) rx vlan acceleration
>      * only frames for VLANs that are registered show up
>         in promiscuous mode
>      * tag is in skb but NOT passed to tcpdump
> 
> Unfortunately, the tag is lost as part of the VLAN acceleration process
> so it is not a simple matter of changing code in AF_PACKET receive
> to restore the tag.


I think case 2) is not correct, the tag is stripped and is not in the
skb. Check out sky2 for example :)

        if (sky2->vlgrp && (status & GMR_FS_VLAN)) {
                vlan_hwaccel_receive_skb(skb,
                                         sky2->vlgrp,
                                         be16_to_cpu(sky2->rx_tag));


The tag it uses for the lookup comes from the descriptor. I don't
know of any examples for case 3), but I would expect that the header
is also removed.

Anyway, I think what we should do is store the VLAN tag in the skb
metadata. That would not only allow tcpdump to reconstruct it, it
would also fix the invalid use of skb->cb on the TX path. It would
also fix the bridge eating VLAN headers case (bridge on eth0 + eth1,
additionally eth0.1 on eth0 using VLAN RX acceleration with header
stripping) and would allow simply forwarding the VLAN tag to the
outgoing device in case it supports hardware-accelerated VLAN tagging.
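
To illustrate the reconstruction step, here is a rough userspace sketch,
not kernel code. It assumes the stripped frame and its tag control
information (TCI) are both available, with the TCI kept out of band the
way skb metadata would keep it. The function name vlan_reinsert_tag and
the buffer layout are hypothetical, and only a single 802.1Q tag with
TPID 0x8100 is handled.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN    6       /* octets in one MAC address */
#define VLAN_HLEN   4       /* TPID (2 octets) + TCI (2 octets) */
#define ETH_P_8021Q 0x8100  /* 802.1Q tag protocol identifier */

/*
 * Hypothetical helper: rebuild a tagged frame from a stripped frame plus
 * a TCI that was saved out of band. Returns the new frame length, or 0
 * if the input is too short or the output buffer is too small.
 */
size_t vlan_reinsert_tag(const uint8_t *in, size_t len, uint16_t tci,
                         uint8_t *out, size_t out_cap)
{
    const size_t mac_len = 2 * ETH_ALEN;   /* destination + source MAC */

    if (len < mac_len + 2 || out_cap < len + VLAN_HLEN)
        return 0;

    /* Copy both MAC addresses unchanged. */
    memcpy(out, in, mac_len);

    /* Insert the tag in network byte order right after the addresses. */
    out[mac_len]     = (uint8_t)(ETH_P_8021Q >> 8);
    out[mac_len + 1] = (uint8_t)(ETH_P_8021Q & 0xff);
    out[mac_len + 2] = (uint8_t)(tci >> 8);
    out[mac_len + 3] = (uint8_t)(tci & 0xff);

    /* The original EtherType and payload follow the tag. */
    memcpy(out + mac_len + VLAN_HLEN, in + mac_len, len - mac_len);
    return len + VLAN_HLEN;
}
```

The point of the sketch is that the rebuild is trivial once the TCI
survives somewhere; without it there is nothing to put back.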


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19 13:28   ` Krzysztof Halasa
@ 2007-07-19 13:41     ` Stephen Hemminger
  2007-07-19 14:00       ` Patrick McHardy
  2007-07-19 14:23       ` Krzysztof Halasa
  0 siblings, 2 replies; 30+ messages in thread
From: Stephen Hemminger @ 2007-07-19 13:41 UTC (permalink / raw)
  To: Krzysztof Halasa
  Cc: Patrick McHardy, andrei radulescu-banu, linux-kernel, Linux Netdev List

On Thu, 19 Jul 2007 15:28:46 +0200
Krzysztof Halasa <khc@pm.waw.pl> wrote:

> Patrick McHardy <kaber@trash.net> writes:
> 
> > Your suggestion of disabling VLAN acceleration in promiscuous
> > mode sounds like a reasonable solution until then ..
> 
> From a user perspective:
> 
> I'm not sure promiscuous mode is related to the problem.
> Tcpdump without promiscuous mode makes perfect sense.
> 
> I don't know very well VLAN code internals, but I think
> the VLAN # is used for looking up the interface, so
> presenting the "original" packet on the trunk device
> would IMHO involve some skb cloning, and perhaps some
> ethtool option could probably control that.
> 
> Not sure about untagged frames vs. tagged frames with
> the default VLAN id - can the hardware at all differentiate
> between them?
> 
> 
> Or, perhaps it should be left (almost) as is - with "software"
> VLANs the traffic always goes through the master interface,
> but with "accelerated" mode it only goes through logical
> interfaces and doesn't show up on master? Probably with
> exception of invalid VLANs, which could be injected back to
> master (because no logical device exists)?


I don't claim to be a VLAN expert, but there are really three cases
for handling tagged frames:

1) non-accelerated device
    * all frames show in promiscuous mode
    * tag is part of the frame that shows up
       in tcpdump, and then gets stripped by the 8021q module.
2) rx tag stripping device
     * all frames show in promiscuous mode
     * tag is in skb but NOT passed to tcpdump
3) rx vlan acceleration
     * only frames for VLANs that are registered show up
        in promiscuous mode
     * tag is in skb but NOT passed to tcpdump

Unfortunately, the tag is lost as part of the VLAN acceleration process
so it is not a simple matter of changing code in AF_PACKET receive
to restore the tag.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 22:57 ` Patrick McHardy
  2007-07-18 23:22   ` Ben Greear
@ 2007-07-19 13:28   ` Krzysztof Halasa
  2007-07-19 13:41     ` Stephen Hemminger
  1 sibling, 1 reply; 30+ messages in thread
From: Krzysztof Halasa @ 2007-07-19 13:28 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: andrei radulescu-banu, linux-kernel, Linux Netdev List

Patrick McHardy <kaber@trash.net> writes:

> Your suggestion of disabling VLAN acceleration in promiscuous
> mode sounds like a reasonable solution until then ..

From a user perspective:

I'm not sure promiscuous mode is related to the problem.
Tcpdump without promiscuous mode makes perfect sense.

I don't know the VLAN code internals very well, but I think
the VLAN # is used for looking up the interface, so
presenting the "original" packet on the trunk device
would IMHO involve some skb cloning, and perhaps some
ethtool option could control that.

Not sure about untagged frames vs. tagged frames with
the default VLAN id - can the hardware at all differentiate
between them?


Or, perhaps it should be left (almost) as is - with "software"
VLANs the traffic always goes through the master interface,
but with "accelerated" mode it only goes through logical
interfaces and doesn't show up on master? Probably with
exception of invalid VLANs, which could be injected back to
master (because no logical device exists)?
-- 
Krzysztof Halasa

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-19  0:01       ` Ben Greear
@ 2007-07-19  0:19         ` Patrick McHardy
  0 siblings, 0 replies; 30+ messages in thread
From: Patrick McHardy @ 2007-07-19  0:19 UTC (permalink / raw)
  To: Ben Greear; +Cc: andrei radulescu-banu, linux-kernel, Linux Netdev List

Ben Greear wrote:
> Patrick McHardy wrote:
> 
>> It's actually more a problem on the RX path. VLAN acceleration
>> works (at least with some drivers) by enabling HW header stripping
>> and using the VLAN ID for an immediate lookup in the VLAN devices
>> configured on that device. So if the VLAN is not configured on the
>> real device but something like macvlan, it will get the packet
>> without a header and without any indication that this was a VLAN
>> packet. This is also what causes the tcpdump problem.
>>   
> 
> This reminded me of something:
> 
> If we are using VLAN HW-Accel, then the skb hits the mac-vlan check with
> the skb->dev == vlan-device.
> So, in this case, we can put mac-vlans on top of 802.1Q VLANs.
> 
> But, if we are not using VLAN hw-accel, the skb hits the mac-vlan check
> with skb->dev == ethernet-device.
> In this case, we could NOT have the mac-vlan on top of the 802.1Q VLAN,
> but we can have a MAC-VLAN
> on the raw ethernet and we could add 802.1Q vlans on top of the
> mac-vlan.  This is because the
> .1Q vlan will only be found once we go into the protocol handler logic,
> which is necessarily after the
> MAC-VLAN check logic.
> 
> Unless I am confused in my conjecture above, this is likely to confuse
> others who try to mix and
> match MAC-VLANs and 802.1Q VLANs.


The current code doesn't use hardware acceleration and works fine
in all combinations where only vlan *or* macvlan devices are used
on the underlying device.

If you mix them, macvlan won't get to see vlan headers anymore,
same as for tcpdump, bridge devices, or anything else that
might care. A bridge eating VLAN headers should be a clearer
indication of a bug than an inaccurate tcpdump ..

The real problem is that the device removes the header for all
vlans, not only for those that are configured on the device.
This is a result of how the hardware works. But since we don't
have the data available later, we can't even fix it up in
software.


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 23:34     ` Patrick McHardy
@ 2007-07-19  0:01       ` Ben Greear
  2007-07-19  0:19         ` Patrick McHardy
  0 siblings, 1 reply; 30+ messages in thread
From: Ben Greear @ 2007-07-19  0:01 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: andrei radulescu-banu, linux-kernel, Linux Netdev List

Patrick McHardy wrote:
> Ben Greear wrote:
>   
>> Patrick McHardy wrote:
>>
>>     
>>> Put another way, once you enable VLAN header stripping, you
>>> won't see the headers for *any* VLAN, not only for those you're
>>> actually running locally. This is also a problem for devices
>>> like macvlan, where it would be desirable to make use of
>>> hardware VLAN acceleration. I was thinking about storing the
>>> information somewhere in the packet's metadata on both RX and
>>> TX paths, that would also allow tcpdump to properly display
>>> packets.
>>>
>> MAC-VLAN could gather this information based on its parent
>> device (i.e., if parent-dev has VID 7, then add VID 7 to the
>> metadata).  There would be no need for any driver changes I think.
>>     
>
>
> It's actually more a problem on the RX path. VLAN acceleration
> works (at least with some drivers) by enabling HW header stripping
> and using the VLAN ID for an immediate lookup in the VLAN devices
> configured on that device. So if the VLAN is not configured on the
> real device but something like macvlan, it will get the packet
> without a header and without any indication that this was a VLAN
> packet. This is also what causes the tcpdump problem.
>   
This reminded me of something:

If we are using VLAN HW-Accel, then the skb hits the mac-vlan check with
the skb->dev == vlan-device.
So, in this case, we can put mac-vlans on top of 802.1Q VLANs.

But, if we are not using VLAN hw-accel, the skb hits the mac-vlan check
with skb->dev == ethernet-device.
In this case, we could NOT have the mac-vlan on top of the 802.1Q VLAN,
but we can have a MAC-VLAN on the raw ethernet and we could add 802.1Q
vlans on top of the mac-vlan.  This is because the .1Q vlan will only be
found once we go into the protocol handler logic, which is necessarily
after the MAC-VLAN check logic.

Unless I am confused in my conjecture above, this is likely to confuse
others who try to mix and match MAC-VLANs and 802.1Q VLANs.

>>> I have planned to look into this when I find some time.
>>> Your suggestion of disabling VLAN acceleration in promiscuous
>>> mode sounds like a reasonable solution until then ..
>>>
>> I think a better method would be to allow disabling VLAN HW accel for a
>> NIC with ethtool.  Then, the packets will be received by the software
>> stack with the vlan header intact.  Something sniffing on the physical
>> dev will automatically get the VLAN header.
>>
>
>
> That would also be fine. But considering that the TX path is
> problematic too, a clean solution for all of this would be
> to store the VLAN id in the skb. And we do have some holes
> to plug currently :)
>   

With VLAN HW accel disabled, the skb will have the VLAN header in it by
the time it hits the ethX interface, so sniffing there should still show
the header.  It won't show when sniffing the VLAN device, but I think
that is OK.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 23:22   ` Ben Greear
@ 2007-07-18 23:34     ` Patrick McHardy
  2007-07-19  0:01       ` Ben Greear
  0 siblings, 1 reply; 30+ messages in thread
From: Patrick McHardy @ 2007-07-18 23:34 UTC (permalink / raw)
  To: Ben Greear; +Cc: andrei radulescu-banu, linux-kernel, Linux Netdev List

Ben Greear wrote:
> Patrick McHardy wrote:
> 
>> Put another way, once you enable VLAN header stripping, you
>> won't see the headers for *any* VLAN, not only for those you're
>> actually running locally. This is also a problem for devices
>> like macvlan, where it would be desirable to make use of
>> hardware VLAN acceleration. I was thinking about storing the
>> information somewhere in the packet's metadata on both RX and
>> TX paths, that would also allow tcpdump to properly display
>> packets.
>>
>
> MAC-VLAN could gather this information based on its parent
> device (i.e., if parent-dev has VID 7, then add VID 7 to the
> metadata).  There would be no need for any driver changes I think.


It's actually more of a problem on the RX path. VLAN acceleration
works (at least with some drivers) by enabling HW header stripping
and using the VLAN ID for an immediate lookup in the VLAN devices
configured on that device. So if the VLAN is not configured on the
real device but something like macvlan, it will get the packet
without a header and without any indication that this was a VLAN
packet. This is also what causes the tcpdump problem.

On the TX path, it could simply use the CB, but this is actually
also wrong (for both macvlan and real devices) since qdiscs have
ownership of the skb in between, and at least netem *does* modify
the CB, breaking VLAN.

> Other than TCP-dump, or some other raw protocol that wants to see
> the VLAN header in user-space, I can't think of what use this would
> be, however.  And, if you just disable VLAN accel in the NIC (see below),
> that would make this mac-vlan hackery not needed at all?


Optimizations for macvlan are not too important, I agree. But for
tcpdump I consider it a bug.

>> I have planned to look into this when I find some time.
>> Your suggestion of disabling VLAN acceleration in promiscuous
>> mode sounds like a reasonable solution until then ..
>>
>
> I think a better method would be to allow disabling VLAN HW accel for a
> NIC with ethtool.  Then, the packets will be received by the software
> stack with the vlan header intact.  Something sniffing on the physical
> dev will automatically get the VLAN header.


That would also be fine. But considering that the TX path is
problematic too, a clean solution for all of this would be
to store the VLAN id in the skb. And we do have some holes
to plug currently :)


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 22:57 ` Patrick McHardy
@ 2007-07-18 23:22   ` Ben Greear
  2007-07-18 23:34     ` Patrick McHardy
  2007-07-19 13:28   ` Krzysztof Halasa
  1 sibling, 1 reply; 30+ messages in thread
From: Ben Greear @ 2007-07-18 23:22 UTC (permalink / raw)
  To: Patrick McHardy; +Cc: andrei radulescu-banu, linux-kernel, Linux Netdev List

Patrick McHardy wrote:
> andrei radulescu-banu wrote:
>   
>> [...]
>>
>> In conclusion, here is the buglist:  
>> 1). If set promiscuous, the e1000 should disable any vlan rx filtering, so that it can receive vlan frames of other vlan id's. Other ethernet drivers probably need fixed as well.
>>   2). The packet layer should change the rx skb device from the vlan 'fake' device (eth0.2) to the corresponding physical device (eth0), so when we run tcpdump on eth0 we see all vlan-tagged and non-vlan-tagged frames
>>   3). The packet socket layer should insert the vlan tag header before passing frames to the upper layer, so tcpdump can display them.
>>     
> Put another way, once you enable VLAN header stripping, you
> won't see the headers for *any* VLAN, not only for those you're
> actually running locally. This is also a problem for devices
> like macvlan, where it would be desirable to make use of
> hardware VLAN acceleration. I was thinking about storing the
> information somewhere in the packet's metadata on both RX and
> TX paths, that would also allow tcpdump to properly display
> packets.
>
MAC-VLAN could gather this information based on its parent
device (i.e., if parent-dev has VID 7, then add VID 7 to the
metadata).  There would be no need for any driver changes I think.

Other than TCP-dump, or some other raw protocol that wants to see
the VLAN header in user-space, I can't think of what use this would
be, however.  And, if you just disable VLAN accel in the NIC (see below),
that would make this mac-vlan hackery not needed at all?
> I have planned to look into this when I find some time.
> Your suggestion of disabling VLAN acceleration in promiscuous
> mode sounds like a reasonable solution until then ..
>
I think a better method would be to allow disabling VLAN HW accel for a
NIC with ethtool.  Then, the packets will be received by the software
stack with the vlan header intact.  Something sniffing on the physical
dev will automatically get the VLAN header.

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com> 
Candela Technologies Inc  http://www.candelatech.com



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: Linux, tcpdump and vlan
  2007-07-18 19:34 andrei radulescu-banu
@ 2007-07-18 22:57 ` Patrick McHardy
  2007-07-18 23:22   ` Ben Greear
  2007-07-19 13:28   ` Krzysztof Halasa
  2007-07-20 10:50 ` Florian Lohoff
  1 sibling, 2 replies; 30+ messages in thread
From: Patrick McHardy @ 2007-07-18 22:57 UTC (permalink / raw)
  To: andrei radulescu-banu; +Cc: linux-kernel, Linux Netdev List

andrei radulescu-banu wrote:
> [...]
> 
> In conclusion, here is the buglist:  
> 1). If set promiscuous, the e1000 should disable any vlan rx filtering, so that it can receive vlan frames of other vlan id's. Other ethernet drivers probably need fixed as well.
>   2). The packet layer should change the rx skb device from the vlan 'fake' device (eth0.2) to the corresponding physical device (eth0), so when we run tcpdump on eth0 we see all vlan-tagged and non-vlan-tagged frames
>   3). The packet socket layer should insert the vlan tag header before passing frames to the upper layer, so tcpdump can display them.


Put another way, once you enable VLAN header stripping, you
won't see the headers for *any* VLAN, not only for those you're
actually running locally. This is also a problem for devices
like macvlan, where it would be desirable to make use of
hardware VLAN acceleration. I was thinking about storing the
information somewhere in the packet's metadata on both RX and
TX paths, that would also allow tcpdump to properly display
packets.

I have planned to look into this when I find some time.
Your suggestion of disabling VLAN acceleration in promiscuous
mode sounds like a reasonable solution until then ..


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Linux, tcpdump and vlan
@ 2007-07-18 19:34 andrei radulescu-banu
  2007-07-18 22:57 ` Patrick McHardy
  2007-07-20 10:50 ` Florian Lohoff
  0 siblings, 2 replies; 30+ messages in thread
From: andrei radulescu-banu @ 2007-07-18 19:34 UTC (permalink / raw)
  To: linux-kernel

Dear kernel networking gurus, 
  

I am trying to understand why tcpdump does not work properly for vlan packets on linux. Here is the existing behavior, observed with:
  - kernel 2.6.16
  - e1000 driver
  - libpcap 0.9.6
  - tcpdump 3.9.6
  

The e1000 driver has two modes when handling vlan frames:
(A) Default mode, when:
  - on rx, the mac includes vlan headers
  - on tx, the mac expects tx frames to include vlan headers.
(B) Vlan hw accelerated mode, when:
  - on rx, the mac does not include vlan headers, and instead passes vlan tag information in the status field of the ring buffer
  - on tx, the mac expects no vlan headers, and instead expects vlan tag information to be passed in the status field of the ring buffer
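
For illustration only (this is not e1000 or libpcap code): in mode (A) the 802.1Q tag is still in the frame, right after the two MAC addresses, so a capture tool can read the vlan id straight out of the buffer. A minimal userspace sketch, assuming a plain Ethernet frame with at most one tag; the function name frame_vlan_id is hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_HLEN      14      /* dst MAC + src MAC + EtherType */
#define VLAN_HLEN     4       /* TPID (2 octets) + TCI (2 octets) */
#define ETH_P_8021Q   0x8100  /* 802.1Q tag protocol identifier */
#define VLAN_VID_MASK 0x0fff  /* the vlan id is the low 12 bits of the TCI */

/*
 * Hypothetical helper: return the vlan id (0..4095) carried in the
 * frame's 802.1Q tag, or -1 if the frame is too short or untagged.
 */
int frame_vlan_id(const uint8_t *frame, size_t len)
{
    if (len < ETH_HLEN + VLAN_HLEN)
        return -1;

    /* Octets 12-13 hold the TPID when a tag is present. */
    uint16_t tpid = (uint16_t)((frame[12] << 8) | frame[13]);
    if (tpid != ETH_P_8021Q)
        return -1;

    /* Octets 14-15 hold the TCI: priority, DEI/CFI bit, vlan id. */
    uint16_t tci = (uint16_t)((frame[14] << 8) | frame[15]);
    return tci & VLAN_VID_MASK;
}
```

In mode (B) the same call returns -1 for every frame, because the mac has already removed the tag and the id only exists in the ring buffer status field.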
  

If no vlan interfaces are used, the e1000 driver configures the mac in default mode (A). The system will only receive vlan traffic, and not transmit any. Tcpdump then gets the entire rx vlan buffers, and displays them correctly.
  

Suppose now that at least one vlan interface is used - say eth0 is the main physical interface, and eth0.2 is an interface created on vlan id 2. The e1000 driver then switches to vlan hw accelerated mode (B). Furthermore, even if set promiscuous, the e1000 will filter out any rx vlan frames of id other than 2, which breaks tcpdump (bug 1).
  

Suppose in our scenario with eth0 and eth0.2 we're running tcpdump on eth0 - which uses a packet socket. The rx vlan frames with id 2 are then assigned by the driver to eth0.2, and are therefore not passed to the packet socket and to tcpdump (bug 2). The tx vlan frames on eth0.2 are passed to the packet socket without any vlan information, and tcpdump does not display the vlan header (bug 3)
  

In conclusion, here is the buglist:
  1). If set promiscuous, the e1000 should disable any vlan rx filtering, so that it can receive vlan frames of other vlan id's. Other ethernet drivers probably need fixing as well.
  2). The packet layer should change the rx skb device from the vlan 'fake' device (eth0.2) to the corresponding physical device (eth0), so when we run tcpdump on eth0 we see all vlan-tagged and non-vlan-tagged frames.
  3). The packet socket layer should insert the vlan tag header before passing frames to the upper layer, so tcpdump can display them.
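
As a rough picture of item 1, the rx filter can be modeled as one bit per vlan id, with promiscuous mode bypassing the lookup. This is a userspace model only, not e1000 code; struct vlan_filter and both helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

#define VLAN_N_VID 4096  /* 12-bit vlan id space */

/* Hypothetical model of a per-device vlan rx filter. */
struct vlan_filter {
    uint32_t bits[VLAN_N_VID / 32]; /* one bit per registered vlan id */
    bool promisc;                   /* true when the device is promiscuous */
};

/* Register a vlan id, e.g. when eth0.2 is created on eth0. */
void vlan_filter_add(struct vlan_filter *f, uint16_t vid)
{
    vid &= VLAN_N_VID - 1;
    f->bits[vid / 32] |= UINT32_C(1) << (vid % 32);
}

/* Decide whether a tagged rx frame with this vlan id is accepted. */
bool vlan_filter_accept(const struct vlan_filter *f, uint16_t vid)
{
    if (f->promisc)          /* proposed behavior: no filtering at all */
        return true;
    vid &= VLAN_N_VID - 1;
    return (f->bits[vid / 32] >> (vid % 32)) & 1u;
}
```

With only vlan id 2 registered, a frame on id 3 is dropped unless promisc is set, which is the behavior item 1 asks for.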
  

Open issue:
  4). What is the expected behavior when running 'tcpdump -i eth0.2'? Perhaps the packet socket should silently display all frames on eth0, so running 'tcpdump -i eth0' is equivalent to 'tcpdump -i eth0.2'.

  

Thoughts? Comments? Please cc iubica2@yahoo.com, I am not subscribed. 
  

Andrei Radulescu-Banu  
Brix Networks 






^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2007-07-21 21:15 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-07-19 16:02 Linux, tcpdump and vlan andrei radulescu-banu
2007-07-20 19:58 ` Krzysztof Halasa
2007-07-20 20:34   ` Ben Greear
2007-07-21 11:32     ` Krzysztof Halasa
2007-07-21 17:57       ` Ben Greear
2007-07-21 21:15         ` Krzysztof Halasa
  -- strict thread matches above, loose matches on Subject: below --
2007-07-19 21:38 andrei radulescu-banu
2007-07-19 23:38 ` Ben Greear
2007-07-20 20:19   ` Krzysztof Halasa
2007-07-19 18:20 andrei radulescu-banu
2007-07-19 19:28 ` Stephen Hemminger
2007-07-19 17:46 andrei radulescu-banu
2007-07-19 15:47 andrei radulescu-banu
2007-07-19 16:21 ` Stephen Hemminger
2007-07-19 16:33 ` Patrick McHardy
2007-07-19 16:47 ` Ben Greear
2007-07-18 19:34 andrei radulescu-banu
2007-07-18 22:57 ` Patrick McHardy
2007-07-18 23:22   ` Ben Greear
2007-07-18 23:34     ` Patrick McHardy
2007-07-19  0:01       ` Ben Greear
2007-07-19  0:19         ` Patrick McHardy
2007-07-19 13:28   ` Krzysztof Halasa
2007-07-19 13:41     ` Stephen Hemminger
2007-07-19 14:00       ` Patrick McHardy
2007-07-19 14:23       ` Krzysztof Halasa
2007-07-19 15:00         ` Stephen Hemminger
2007-07-19 15:45           ` Krzysztof Halasa
2007-07-19 15:20         ` Stephen Hemminger
2007-07-20 10:50 ` Florian Lohoff
