* macvlan devices and vlan interaction
@ 2018-01-29 23:01 Keller, Jacob E
  2018-01-30  1:53 ` Yuan, Linyu (NSB - CN/Shanghai)
  2018-01-30 20:29 ` Shannon Nelson
  0 siblings, 2 replies; 9+ messages in thread
From: Keller, Jacob E @ 2018-01-29 23:01 UTC (permalink / raw)
  To: netdev; +Cc: Duyck, Alexander H

Hi,

I'm currently investigating how macvlan devices behave with regard to VLAN support, and I have found some interesting behavior that I am not sure how best to correct, or what the right path forward is.

If I create a macvlan device:

ip link add link ens0 name macvlan0 type macvlan

and then add a VLAN to it:

ip link add link macvlan0 name vlan10 type vlan id 10

This works to pass VLAN 10 traffic over the macvlan device. This seems like expected behavior.

However, if I then also add vlan 10 to the lowerdev:

ip link add link ens0 name lowervlan10 type vlan id 10

Then traffic stops flowing to the VLAN on the macvlan device.

As far as I can tell, this happens because VLAN traffic is filtered first and then forwarded to the VLAN device, which doesn't know that the macvlan device exists.

Essentially, it seems that a VLAN stacked on top of a macvlan shouldn't work: the VLAN code basically expects each VLAN to apply to every MAC address, while the macvlan device works by putting its MAC address into the unicast address list, so there's no way for a device driver to know when or how to apply the VLAN.
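The receive ordering described above can be sketched as a small user-space model. This is not kernel code: the types, the enum values, and classify_vlan_first() are hypothetical, and the model assumes the NIC has already stripped the 802.1Q tag into metadata (vid 0 meaning untagged) and that the lowerdev carries a single macvlan. It captures only the precedence: a VLAN device on the lowerdev claims a tagged frame before the macvlan's MAC-based demux gets to look at it.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_N_VID 4096              /* 12-bit 802.1Q VLAN ID space */

/* Where a received frame ends up in this simplified model. */
enum rx_target {
	RX_TO_LOWERDEV,               /* ens0 itself */
	RX_TO_LOWER_VLAN,             /* e.g. lowervlan10@ens0 */
	RX_TO_MACVLAN,                /* macvlan0@ens0, no matching VLAN dev */
	RX_TO_MACVLAN_VLAN,           /* e.g. vlan10@macvlan0 */
};

struct model_dev {
	uint8_t mac[6];
	bool has_vlan_dev[MODEL_N_VID];   /* VLAN device stacked on this dev? */
};

struct rx_frame {
	uint8_t dst[6];
	uint16_t vid;                     /* 0 = untagged (tag already stripped) */
};

/* Hypothetical model of the current precedence: a VLAN device on the
 * lowerdev claims a tagged frame before the macvlan rx_handler ever
 * compares the destination MAC. */
enum rx_target classify_vlan_first(const struct model_dev *lower,
				   const struct model_dev *macvlan,
				   const struct rx_frame *f)
{
	uint16_t vid = f->vid & 0xFFF;

	if (vid != 0 && lower->has_vlan_dev[vid])
		return RX_TO_LOWER_VLAN;  /* macvlan never sees the frame */

	if (memcmp(f->dst, macvlan->mac, sizeof(f->dst)) == 0)
		return (vid != 0 && macvlan->has_vlan_dev[vid]) ?
			RX_TO_MACVLAN_VLAN : RX_TO_MACVLAN;

	return RX_TO_LOWERDEV;
}
```

In this model, with vlan10 on macvlan0 only, a VID 10 frame addressed to macvlan0's MAC reaches vlan10; once a VLAN 10 device is also present on the lowerdev, the same frame is classified as RX_TO_LOWER_VLAN instead.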

This gets a bit more confusing when we add in the l2 fwd hardware offload.

Currently, at least for the Intel network parts, this isn't supported, because of a bug in which the device drivers don't apply the VLANs to the macvlan accelerated addresses. If we fix this, at least for fm10k, the behavior is slightly better, because the hardware filters on MAC address first, so we direct the traffic to the proper device regardless of VLAN.

In addition to this peculiarity of VLANs on both the macvlan and the lowerdev, when a macvlan device adds a VLAN, the lowerdev is told to add the VLAN via its .ndo_vlan_rx_add_vid(), which doesn't distinguish which addresses the VLAN applies to. Depending on the hardware design, the driver thus simply enables the VLAN for all of its unicast and multicast addresses. Some hardware could theoretically support MAC+VLAN pairs, where it could restrict a VLAN to a subset of addresses. Other hardware might not be so lucky.
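To illustrate why a VID-only callback cannot be applied per address, here is a hypothetical model of a NIC whose receive filter is a unicast MAC list plus one port-wide VLAN ID table, checked independently. The struct and function names are invented for this sketch and real hardware differs; the point is only that the driver receives a VID with no MAC to pair it with, so enabling VID 10 on behalf of macvlan0 also admits VID 10 frames sent to ens0's own MAC.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MODEL_N_VID  4096   /* 12-bit 802.1Q VLAN ID space */
#define MODEL_MAX_UC 8      /* arbitrary size of the unicast MAC table */

/* Hypothetical NIC filter: a unicast MAC list and a single port-wide
 * VLAN table that are checked independently of each other. */
struct nic_filter {
	uint8_t uc[MODEL_MAX_UC][6];
	size_t n_uc;
	bool vid_on[MODEL_N_VID];
};

/* Roughly what happens when a macvlan puts its MAC in the unicast list. */
bool filter_add_uc(struct nic_filter *nf, const uint8_t mac[6])
{
	if (nf->n_uc == MODEL_MAX_UC)
		return false;
	memcpy(nf->uc[nf->n_uc++], mac, 6);
	return true;
}

/* All a VID-only add callback can express: the VID, with no MAC attached. */
void filter_add_vid(struct nic_filter *nf, uint16_t vid)
{
	nf->vid_on[vid & 0xFFF] = true;
}

bool filter_accepts(const struct nic_filter *nf, const uint8_t dst[6],
		    uint16_t vid)
{
	bool mac_ok = false;

	for (size_t i = 0; i < nf->n_uc; i++) {
		if (memcmp(nf->uc[i], dst, 6) == 0) {
			mac_ok = true;
			break;
		}
	}
	/* Untagged frames pass on MAC alone; tagged ones also need the VID. */
	return mac_ok && (vid == 0 || nf->vid_on[vid & 0xFFF]);
}
```

With both ens0's and macvlan0's MACs in the list, a single filter_add_vid(&nf, 10) makes filter_accepts() return true for VID 10 frames to either address.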

Unfortunately, this has the weird consequence that if we have the following stack of devices:

vlan10@macvlan0
macvlan0@ens0
ens0

Then ens0 will receive VLAN10 traffic on every address. So VLAN 10 traffic destined to the MAC of the lowerdev will be received, instead of dropped.

If we add VLAN 10 to the lowerdev so we have both the above stack and also

lowervlan10@ens0
ens0 (mac gg:hh:ii:jj:kk)

then all VLAN 10 traffic will be received on the lowerdev's VLAN 10 device, and none of it will be forwarded to the VLAN 10 device attached to the macvlan.

However, if we add two macvlans and each adds VLAN 10, so that we have the following:

avlan10@macvlan0
macvlan0@ens0
ens0

bvlan10@macvlan1
macvlan1@ens0
ens0

In this case, traffic does appear to be sorted correctly; it seems that things only break when the lowerdev itself gets the VLAN. However, if I remove bvlan10 from macvlan1, VLAN 10 traffic is still received by macvlan1, even though in principle it should no longer be.
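One plausible explanation for the bvlan10 observation, offered as an assumption rather than a verified diagnosis, is reference counting on the lowerdev. In the kernel, macvlan passes its VID add/remove requests down to the lowerdev through vlan_vid_add()/vlan_vid_del(), which keep a per-VID count and only program the hardware on the first add and the last delete. The sketch below models just that counting; struct vid_table, vid_get(), and vid_put() are hypothetical names. Because the filter is port-wide, avlan10's remaining reference keeps VID 10 enabled for macvlan1's address as well.

```c
#include <stdbool.h>
#include <stdint.h>

#define MODEL_N_VID 4096

/* Hypothetical per-lowerdev VID bookkeeping: count users of each VID and
 * only reprogram the (port-wide) hardware filter on 0 <-> 1 transitions. */
struct vid_table {
	unsigned int refcount[MODEL_N_VID];
	bool hw_filter[MODEL_N_VID];
};

/* Called when any upper device (e.g. avlan10 via macvlan0) needs the VID. */
void vid_get(struct vid_table *t, uint16_t vid)
{
	vid &= 0xFFF;
	if (t->refcount[vid]++ == 0)
		t->hw_filter[vid] = true;    /* first user: enable in hardware */
}

/* Called when an upper device stops using the VID. */
void vid_put(struct vid_table *t, uint16_t vid)
{
	vid &= 0xFFF;
	if (t->refcount[vid] == 0)
		return;                      /* unbalanced put: ignored here */
	if (--t->refcount[vid] == 0)
		t->hw_filter[vid] = false;   /* last user gone: disable */
}
```

Adding avlan10 and bvlan10 and then removing bvlan10 leaves refcount[10] at 1, so hw_filter[10] stays set; only removing avlan10 as well clears it.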

What is the correct behavior here? Should this just be "administrators should know better"? I don't think that's a great argument, and either way we're still essentially leaking VLANs across the macvlan interfaces, which I don't think is ideal.

I see two possible solutions:

1) Modify the macvlan driver so that it is marked as VLAN challenged (NETIF_F_VLAN_CHALLENGED), indicating that it cannot handle VLAN traffic on top of it.
  a. To get the VLANs associated, the administrator could instead add the VLAN first and then add the macvlan on top. I think this is a better configuration.
  b. That doesn't work in the offload case, unless/until we fix the VLAN interface to forward ndo_dfwd_add_station() along with a VID.
  c. This could appear as a loss of functionality, since in some cases VLANs on top of macvlan work today (with the interesting caveats listed above).

2) Modify how VLANs interact with MAC addresses, so that the lowerdev can be explicitly aware of which VLANs are tied to which address groups, allowing explicit configuration of which MAC+VLAN pairs are actually allowed.
  a. This is a much more invasive change to the driver interface, and more difficult to get right.
  b. Other configurations of stacked devices might have a similar problem, so we could possibly solve more here? Or create more problems. I'm not really certain.
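Option 2 can be sketched as a filter keyed on (MAC, VID) pairs instead of independent MAC and VID tables. Everything here is hypothetical: no such driver interface exists in the kernel, and struct pair_filter, pair_add(), pair_add_station(), and pair_accepts() are invented for illustration. VID 0 stands for untagged traffic, so an upper device with n VLANs needs n + 1 entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_PAIRS 32   /* arbitrary table size for the sketch */

struct mac_vid_pair {
	uint8_t mac[6];
	uint16_t vid;      /* 0 = untagged */
};

struct pair_filter {
	struct mac_vid_pair rules[MAX_PAIRS];
	size_t n;
};

bool pair_add(struct pair_filter *pf, const uint8_t mac[6], uint16_t vid)
{
	if (pf->n == MAX_PAIRS)
		return false;
	memcpy(pf->rules[pf->n].mac, mac, 6);
	pf->rules[pf->n].vid = vid;
	pf->n++;
	return true;
}

/* One untagged rule plus one rule per VLAN for an upper device.
 * On failure, rules already added are left in place (no rollback here). */
bool pair_add_station(struct pair_filter *pf, const uint8_t mac[6],
		      const uint16_t *vids, size_t n_vids)
{
	if (!pair_add(pf, mac, 0))
		return false;
	for (size_t i = 0; i < n_vids; i++)
		if (!pair_add(pf, mac, vids[i]))
			return false;
	return true;
}

/* A frame is accepted only if its exact (MAC, VID) pair was configured. */
bool pair_accepts(const struct pair_filter *pf, const uint8_t dst[6],
		  uint16_t vid)
{
	for (size_t i = 0; i < pf->n; i++)
		if (pf->rules[i].vid == vid &&
		    memcmp(pf->rules[i].mac, dst, 6) == 0)
			return true;
	return false;
}
```

With ens0 configured untagged only and macvlan0 configured with VLAN 10, VID 10 frames are accepted for macvlan0's MAC but rejected for ens0's MAC, which is the per-address behavior the VID-only callback cannot express.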


I think the correct solution is (1), but I wasn't sure what others thought, or whether anyone else has encountered the problems outlined above. I cc'd Alex, with whom I discussed this offline when I first heard of it and began investigating, in case he has anything further to add.

Regards,
Jake


* RE: macvlan devices and vlan interaction
  2018-01-29 23:01 macvlan devices and vlan interaction Keller, Jacob E
@ 2018-01-30  1:53 ` Yuan, Linyu (NSB - CN/Shanghai)
  2018-01-30 22:23   ` Keller, Jacob E
  2018-01-30 20:29 ` Shannon Nelson
  1 sibling, 1 reply; 9+ messages in thread
From: Yuan, Linyu (NSB - CN/Shanghai) @ 2018-01-30  1:53 UTC (permalink / raw)
  To: Keller, Jacob E, netdev; +Cc: Duyck, Alexander H

https://www.spinics.net/lists/netdev/msg476083.html

I also have a macvlan device question, but got no answer.

My original thought is that __netif_receive_skb_core() should check the packet's destination MAC address:
if it matches a macvlan device, treat the packet as received on the macvlan device rather than the lower device, and then pass it to the upper layer.

But I don't know how to handle the broadcast MAC address. Can a macvlan device receive broadcast packets?



* Re: macvlan devices and vlan interaction
  2018-01-29 23:01 macvlan devices and vlan interaction Keller, Jacob E
  2018-01-30  1:53 ` Yuan, Linyu (NSB - CN/Shanghai)
@ 2018-01-30 20:29 ` Shannon Nelson
  2018-01-30 20:49   ` Alexander Duyck
  2018-01-30 22:14   ` Keller, Jacob E
  1 sibling, 2 replies; 9+ messages in thread
From: Shannon Nelson @ 2018-01-30 20:29 UTC (permalink / raw)
  To: Keller, Jacob E, netdev; +Cc: Duyck, Alexander H


Hi Jake,

The current behavior seems logical to me, but I suppose Alex might argue 
differently.  The macvlan was put onto the default lowerdev assuming the 
lowerdev will hand it all the default traffic, and then the macvlan 
splits out its own vlan traffic.  As soon as the lowerdev assumption 
changes, it is going to change what gets pushed up to the macvlan dev. 
If the lowerdev is separating the vlan traffic out of the "default" flow 
headed to the macvlan, then the initial assumption has changed and the 
vlan traffic has been vectored off before it can be delivered up the 
stack to the macvlan.

There's an argument that the lowerdev shouldn't know anything about the 
upperdev's routing: just deliver to the upperdev and let the upperdev 
worry about it.  But perhaps this becomes a question of precedence: 
does the lowerdev split traffic first by MAC address or by VLAN tag?

I don't like your option 1: as you point out, it breaks current 
functionality, likely depended upon in some containers that are using 
macvlans to manage their traffic.  We don't know what's going on inside 
that container and I don't think we want to break its ability to split 
its own vlans.

Like I said, I think the current behavior is mostly correct, but a 
version of option 2 might be good to help support offload of the 
mac+vlan pair into a macvlan channel.

sln


* Re: macvlan devices and vlan interaction
  2018-01-30 20:29 ` Shannon Nelson
@ 2018-01-30 20:49   ` Alexander Duyck
  2018-01-30 22:20     ` Keller, Jacob E
  2018-01-30 22:14   ` Keller, Jacob E
  1 sibling, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2018-01-30 20:49 UTC (permalink / raw)
  To: Shannon Nelson; +Cc: Keller, Jacob E, netdev, Duyck, Alexander H

On Tue, Jan 30, 2018 at 12:29 PM, Shannon Nelson
<shannon.nelson@oracle.com> wrote:
> Hi Jake,
>
> The current behavior seems logical to me, but I suppose Alex might argue
> differently.  The macvlan was put onto the default lowerdev assuming the
> lowerdev will hand it all the default traffic, and then the macvlan splits
> out its own vlan traffic.  As soon as the lowerdev assumption changes, it is
> going to change what gets pushed up to the macvlan dev. If the lowerdev is
> separating the vlan traffic out of the "default" flow headed to the macvlan,
> then the initial assumption has changed and the vlan traffic has been
> vectored off before it can be delivered up the stack to the macvlan.

It depends on what your goal is. In my mind, making macvlan VLAN
challenged is the easier solution, since you just have to add some
pass-thru ops to the VLAN drivers and you can guarantee that you are
passing a MAC-VLAN pair for each address on the interface for the call.
The alternative gets a bit more complex, since it requires multiple
rules: one for untagged traffic and one per VLAN for tagged traffic.

> There's an argument that the lowerdev shouldn't know anything about the
> upperdev's routing, just deliver to the upperdev and let the upperdev worry
> about it.  But perhaps this becomes a question of precedence: does the
> lowerdev split traffic first by mac address or by vlan tag.

That is where things get messy. We found it splits by VLAN tag if the
VLAN is present on the lowerdev, or by MAC if it is not. That is why,
as Jake pointed out, adding the VLAN to the lowerdev causes issues.

> I don't like your option 1: as you point out, it breaks current
> functionality, likely depended upon in some containers that are using
> macvlans to manage their traffic.  We don't know what's going on inside that
> container and I don't think we want to break its ability to split its own
> vlans.

Maybe we should look at an option 1.5: mark the lowerdev as VLAN
challenged if any macvlan is operating with any VLANs enabled on it,
since we can only really allow VLAN filtering to occur reliably at one
level. Either that, or maybe we look at making VLANs and rx_handler
setups mutually exclusive.
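Option 1.5 amounts to a mutual-exclusion check between the two levels of VLAN filtering. A minimal sketch, assuming a simple pair of per-lowerdev counters; struct lower_state and both functions are hypothetical. In the kernel, the closest existing mechanism is the NETIF_F_VLAN_CHALLENGED feature flag, which the 802.1Q code checks before creating a VLAN device on top of a device, returning -EOPNOTSUPP; this model only borrows that error code. The converse check in add_vlan_on_macvlan() is an assumption of the sketch, added so that filtering never ends up at both levels.

```c
#include <errno.h>   /* EOPNOTSUPP (POSIX) */

/* Hypothetical per-lowerdev counters for VLAN devices at each level. */
struct lower_state {
	unsigned int lower_vlans;     /* VLAN devs directly on the lowerdev */
	unsigned int macvlan_vlans;   /* VLAN devs on any of its macvlans */
};

/* Refuse a VLAN on the lowerdev while any macvlan above it filters VLANs,
 * i.e. treat the lowerdev as VLAN challenged in that state. */
int add_vlan_on_lowerdev(struct lower_state *s)
{
	if (s->macvlan_vlans > 0)
		return -EOPNOTSUPP;
	s->lower_vlans++;
	return 0;
}

/* Converse check (assumption): no VLANs on macvlans once the lowerdev has any. */
int add_vlan_on_macvlan(struct lower_state *s)
{
	if (s->lower_vlans > 0)
		return -EOPNOTSUPP;
	s->macvlan_vlans++;
	return 0;
}
```

In the container scenario discussed in this thread, the host-side attempt to add VLAN 10 to the lowerdev would fail with -EOPNOTSUPP instead of silently diverting the container's VLAN 10 traffic.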

> Like I said, I think the current behavior is mostly correct, but a version
> of option 2 might be good to help support offload of the mac+vlan pair into
> a macvlan channel.

The only issue is that I am not completely sure how option 2 solves
the original issue. Yes, it makes the filtering more explicit, but the
network stack is still filtering VLANs before we get to the rx_handler
calls. Or is this a fix that works only for the offloaded approach and
doesn't address the issues in the non-offloaded case? It's also
possible I might have missed something.

- Alex


* RE: macvlan devices and vlan interaction
  2018-01-30 20:29 ` Shannon Nelson
  2018-01-30 20:49   ` Alexander Duyck
@ 2018-01-30 22:14   ` Keller, Jacob E
  1 sibling, 0 replies; 9+ messages in thread
From: Keller, Jacob E @ 2018-01-30 22:14 UTC (permalink / raw)
  To: Shannon Nelson, netdev; +Cc: Duyck, Alexander H

> -----Original Message-----
> From: Shannon Nelson [mailto:shannon.nelson@oracle.com]
> Sent: Tuesday, January 30, 2018 12:30 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Cc: Duyck, Alexander H <alexander.h.duyck@intel.com>
> Subject: Re: macvlan devices and vlan interaction
> 
> Hi Jake,
> 
> The current behavior seems logical to me, but I suppose Alex might argue
> differently.  The macvlan was put onto the default lowerdev assuming the
> lowerdev will hand it all the default traffic, and then the macvlan
> splits out its own vlan traffic.  As soon as the lowerdev assumption
> changes, it is going to change what gets pushed up to the macvlan dev.
> If the lowerdev is separating the vlan traffic out of the "default" flow
> headed to the macvlan, then the initial assumption has changed and the
> vlan traffic has been vectored off before it can be delivered up the
> stack to the macvlan.
> 
> There's an argument that the lowerdev shouldn't know anything about the
> upperdev's routing, just deliver to the upperdev and let the upperdev
> worry about it.  But perhaps this becomes a question of precedence:
> does the lowerdev split traffic first by mac address or by vlan tag.
> 

There are a few issues at play here. (1) The device driver has no idea which VLANs apply to which devices. When a VLAN is added to the upperdev, the lowerdev just gets a notification saying "please add VLAN N", with no clue which device it applies to.

The second issue (2) is that, when deciding where traffic goes, the stack prioritises VLANs over macvlan upperdevs, so we end up routing traffic that should have gone to a macvlan into a VLAN attached to the lowerdev instead.

> I don't like your option 1: as you point out, it breaks current
> functionality, likely depended upon in some containers that are using
> macvlans to manage their traffic.  We don't know what's going on inside
> that container and I don't think we want to break its ability to split
> its own vlans.
> 

I don't really want to break that ability either, but look at this scenario:

An upperdev macvlan is created on some lowerdev and put into a container.
The upperdev creates VLAN 10 and starts receiving VLAN 10 traffic.

Now, VLAN 10 is created directly on the same lowerdev, possibly by someone unaware of what the container did.

Suddenly, the upperdev macvlan no longer receives any VLAN 10 traffic.

Worse, the behavior is *different* depending on whether the macvlan is offloaded or not.

For an offloaded macvlan, at least from what I can tell, VLANs have not worked with any open source driver in the upstream kernel to date, so in the original case, where the upperdev creates VLAN 10, it will simply not receive traffic. This is a separate issue, which I have a patch to resolve, but it still has problems with the leaked VLAN issue (where VLANs are added to the lowerdev directly).

You can argue that this is administrator error, but I'd rather fix it so that it's not possible one way or another. Unfortunately, I don't have any good way to prevent this: the driver has no indication of which VLANs apply to which devices.

> Like I said, I think the current behavior is mostly correct, but a
> version of option 2 might be good to help support offload of the
> mac+vlan pair into a macvlan channel.
> 
> sln
> 

I don't really like either option, so suggestions are welcome.

Thanks,
Jake


* RE: macvlan devices and vlan interaction
  2018-01-30 20:49   ` Alexander Duyck
@ 2018-01-30 22:20     ` Keller, Jacob E
  2018-01-30 22:38       ` Alexander Duyck
  0 siblings, 1 reply; 9+ messages in thread
From: Keller, Jacob E @ 2018-01-30 22:20 UTC (permalink / raw)
  To: Alexander Duyck, Shannon Nelson; +Cc: netdev, Duyck, Alexander H

> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Tuesday, January 30, 2018 12:49 PM
> To: Shannon Nelson <shannon.nelson@oracle.com>
> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org; Duyck,
> Alexander H <alexander.h.duyck@intel.com>
> Subject: Re: macvlan devices and vlan interaction
> 
> > Hi Jake,
> >
> > The current behavior seems logical to me, but I suppose Alex might argue
> > differently.  The macvlan was put onto the default lowerdev assuming the
> > lowerdev will hand it all the default traffic, and then the macvlan splits
> > out its own vlan traffic.  As soon as the lowerdev assumption changes, it is
> > going to change what gets pushed up to the macvlan dev. If the lowerdev is
> > separating the vlan traffic out of the "default" flow headed to the macvlan,
> > then the initial assumption has changed and the vlan traffic has been
> > vectored off before it can be delivered up the stack to the macvlan.
> 
> It depends on what your goal is. In my mind making macvlan VLAN
> challenged is the easier solution since you just have to add some
> pass-thru ops to the VLAN drivers and you can guarantee that you are
> passing MAC-VLAN pair for each address on the interface for the call.
> The alternative gets to be a bit more complex since it requires
> multiple rules, one for non-tagged and one per VLAN for tagged
> traffic.
> 
> > There's an argument that the lowerdev shouldn't know anything about the
> > upperdev's routing, just deliver to the upperdev and let the upperdev worry
> about it.  But perhaps this becomes a question of precedence: does the
> lowerdev split traffic first by mac address or by vlan tag?
> 
> That is where things get messy. We found it splits by VLAN tag if the
> VLAN is present on the lowerdev, or it splits by MAC if it is not.
> That is why as Jake pointed out adding the VLAN to the lower dev
> causes issues.
>

Yes, right now the problem is that it splits differently depending on whether or not a VLAN is present on the lower dev.

> > I don't like your option 1: as you point out, it breaks current
> > functionality, likely depended upon in some containers that are using
> > macvlans to manage their traffic.  We don't know what's going on inside that
> > container and I don't think we want to break its ability to split its own
> > vlans.
> 
> Maybe we should look at an option 1.5. Mark the lowerdev as VLAN
> challenged if any macvlan is operating with any VLANs enabled on it
> since we can only really allow VLAN filtering to occur at one level
> reliably. Either that or maybe we look at making VLANs and rx_handler
> setups mutually exclusive.
> 

Actually.. what if we changed the order of splitting, so that we always check macvlan MAC address first, before checking VLANs?

This should work in both cases of macvlan -> VLAN -> lowerdev, or VLAN -> macvlan -> lowerdev.

In the first case, the macvlan isn't directly attached to the lowerdev, so we'd do VLAN filtering first, and then the VLAN would check MAC address.

In the second case, even if lowerdev also had the VLAN, we'd do macvlan filtering first, and things would work.

Both the lowerdev VLAN and upperdev macvlan should receive traffic correctly in this case.

I think this resolves the problem of which device goes to which VLAN.
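A toy model may make the ordering question concrete. This is a minimal Python sketch, not kernel code: `Dev`, `rx_vlan_first` and `rx_mac_first` are hypothetical names, a frame is reduced to a destination MAC plus an optional VLAN ID, and the real receive path in `__netif_receive_skb_core()` (rx_handlers, bonding, bridging, packet-type handling) is far more involved. `rx_vlan_first` assumes the current behaviour described in this thread, where a VLAN device on the lowerdev claims the tag before the macvlan rx_handler runs; `rx_mac_first` models the proposed MAC-first ordering for both stacking cases.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Dev:
    """Hypothetical stand-in for a net device, not struct net_device."""
    name: str
    vlans: Dict[int, "Dev"] = field(default_factory=dict)     # VLAN uppers keyed by VID
    macvlans: Dict[str, "Dev"] = field(default_factory=dict)  # macvlan uppers keyed by MAC


def rx_vlan_first(dev: Dev, dst: str, vid: Optional[int]) -> Optional[str]:
    """Current ordering (as described in the thread): VLAN lookup, then macvlan."""
    if vid is not None and vid in dev.vlans:
        return rx_vlan_first(dev.vlans[vid], dst, None)  # tag stripped, re-received
    if dst in dev.macvlans:
        return rx_vlan_first(dev.macvlans[dst], dst, vid)
    return dev.name if vid is None else None  # unmatched tag: dropped in this model


def rx_mac_first(dev: Dev, dst: str, vid: Optional[int]) -> Optional[str]:
    """Proposed ordering: macvlan MAC match first, then VLAN lookup."""
    if dst in dev.macvlans:
        return rx_mac_first(dev.macvlans[dst], dst, vid)
    if vid is not None and vid in dev.vlans:
        return rx_mac_first(dev.vlans[vid], dst, None)
    return dev.name if vid is None else None


MV_MAC = "02:00:00:00:00:01"  # hypothetical macvlan0 address

# VLAN 10 on macvlan0, plus VLAN 10 directly on the lowerdev (lowervlan10)
ens0 = Dev("ens0", vlans={10: Dev("lowervlan10")})
ens0.macvlans[MV_MAC] = Dev("macvlan0", vlans={10: Dev("vlan10")})

print(rx_vlan_first(ens0, MV_MAC, 10))  # lowervlan10: never reaches vlan10
print(rx_mac_first(ens0, MV_MAC, 10))   # vlan10
print(rx_mac_first(ens0, "02:00:00:00:00:aa", 10))  # lowervlan10

# macvlan on top of a VLAN: the tag is handled first, then the MAC matches
ens1 = Dev("ens1", vlans={10: Dev("ens1.10", macvlans={MV_MAC: Dev("macvlan1")})})
print(rx_mac_first(ens1, MV_MAC, 10))   # macvlan1
```

In this model the MAC-first ordering lets both lowervlan10 and the macvlan's vlan10 receive their own traffic, while the macvlan-over-VLAN case resolves the same way under either ordering, because the lowerdev has no macvlan matching that address.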

I don't know if it resolves the issues with leaked VLANs, where a VLAN added to the macvlan device causes traffic for that VLAN to be received by all the MAC addresses of the lowerdev...

I suppose this might not be considered a problem? The traffic could be received either way if you're in promiscuous mode. It's not like we have a sense of "trusted" configuration either.
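The "leaked VLAN" case follows from the driver seeing only a VID in `.ndo_vlan_rx_add_vid()`, with no MAC attached. The sketch below contrasts a VID-only filter with MAC+VLAN pair filtering of the kind mentioned earlier in the thread (one rule for untagged traffic plus one per VLAN for each address). It is a hypothetical Python model: the function names are illustrative, and real hardware filter tables differ from device to device.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[str, Optional[int]]  # (destination MAC, VID or None for untagged)


def accepts_vid_only(uc_macs: Set[str], vids: Set[int], dst: str, vid: Optional[int]) -> bool:
    """VID-only filtering: any listed unicast MAC is accepted on any enabled VID."""
    return dst in uc_macs and (vid is None or vid in vids)


def build_pair_table(vlans_per_mac: Dict[str, Set[int]]) -> Set[Pair]:
    """One untagged rule per address, plus one rule per VLAN configured on it."""
    table: Set[Pair] = set()
    for mac, vids in vlans_per_mac.items():
        table.add((mac, None))
        table.update((mac, v) for v in vids)
    return table


def accepts_pair(table: Set[Pair], dst: str, vid: Optional[int]) -> bool:
    return (dst, vid) in table


LOWER = "02:00:00:00:00:00"    # hypothetical lowerdev address, no VLANs of its own
MACVLAN = "02:00:00:00:00:01"  # hypothetical macvlan address with VLAN 10 on top

# VID-only: adding VLAN 10 for the macvlan enables it for every address
print(accepts_vid_only({LOWER, MACVLAN}, {10}, LOWER, 10))  # True (leaked)

pairs = build_pair_table({LOWER: set(), MACVLAN: {10}})
print(accepts_pair(pairs, LOWER, 10))    # False
print(accepts_pair(pairs, MACVLAN, 10))  # True
```

The pair table costs more filter entries (three here instead of two MACs plus one VID), which is the trade-off raised earlier about needing multiple rules per address.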

I think some separate work for the case of macvlan on top of VLAN on top of lower dev can be done as well, to enable offloading in this case. I'll have some more thoughts on that soon.

Thanks,
Jake

> > Like I said, I think the current behavior is mostly correct, but a version
> > of option 2 might be good to help support offload of the mac+vlan pair into
> > a macvlan channel.
> 
> The only issue is I am not completely sure how option 2 solves the
> original issue. Yes it makes the filtering more explicit, but the
> network stack is still filtering VLANs before we get to the rx_handler
> calls, or is this a fix that works for the offloaded approach only and
> doesn't address the issues in the non-offloaded case? It's also
> possible I might have missed something.
> 
> - Alex

* RE: macvlan devices and vlan interaction
  2018-01-30  1:53 ` Yuan, Linyu (NSB - CN/Shanghai)
@ 2018-01-30 22:23   ` Keller, Jacob E
  0 siblings, 0 replies; 9+ messages in thread
From: Keller, Jacob E @ 2018-01-30 22:23 UTC
  To: Yuan, Linyu (NSB - CN/Shanghai), netdev; +Cc: Duyck, Alexander H

> -----Original Message-----
> From: Yuan, Linyu (NSB - CN/Shanghai) [mailto:linyu.yuan@nokia-sbell.com]
> Sent: Monday, January 29, 2018 5:53 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org
> Cc: Duyck, Alexander H <alexander.h.duyck@intel.com>
> Subject: RE: macvlan devices and vlan interaction
> 
> https://www.spinics.net/lists/netdev/msg476083.html
> 
> I also have a macvlan device question, but got no answer.
> 
> But my original thought is that __netif_receive_skb_core() should check the
> packet's destination MAC address; if it matches a macvlan device, the packet
> should be treated as received on the macvlan device rather than the lower
> device, and then passed to the upper layer.
> 
> But I don't know how to handle the broadcast MAC address. Can a macvlan device
> receive broadcast packets?
> 

I don't know how macvlans behave in regards to broadcast addresses.

I do think that we should make sure macvlan filtering occurs earlier than VLAN filtering to ensure that we get the correct behavior (see the other emails on this thread).

I can't comment on how that impacts AF_PACKET, because I think AF_PACKET sockets bypass a lot of the stack, don't they?

Thanks,
Jake

* Re: macvlan devices and vlan interaction
  2018-01-30 22:20     ` Keller, Jacob E
@ 2018-01-30 22:38       ` Alexander Duyck
  2018-01-30 23:07         ` Keller, Jacob E
  0 siblings, 1 reply; 9+ messages in thread
From: Alexander Duyck @ 2018-01-30 22:38 UTC
  To: Keller, Jacob E; +Cc: Shannon Nelson, netdev, Duyck, Alexander H

On Tue, Jan 30, 2018 at 2:20 PM, Keller, Jacob E
<jacob.e.keller@intel.com> wrote:
>> -----Original Message-----
>> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
>> Sent: Tuesday, January 30, 2018 12:49 PM
>> To: Shannon Nelson <shannon.nelson@oracle.com>
>> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org; Duyck,
>> Alexander H <alexander.h.duyck@intel.com>
>> Subject: Re: macvlan devices and vlan interaction
>>
>> > Hi Jake,
>> >
>> > The current behavior seems logical to me, but I suppose Alex might argue
>> > differently.  The macvlan was put onto the default lowerdev assuming the
>> > lowerdev will hand it all the default traffic, and then the macvlan splits
>> > out its own vlan traffic.  As soon as the lowerdev assumption changes, it is
>> > going to change what gets pushed up to the macvlan dev. If the lowerdev is
>> > separating the vlan traffic out of the "default" flow headed to the macvlan,
>> > then the initial assumption has changed and the vlan traffic has been
>> > vectored off before it can be delivered up the stack to the macvlan.
>>
>> It depends on what your goal is. In my mind making macvlan VLAN
>> challenged is the easier solution since you just have to add some
>> pass-thru ops to the VLAN drivers and you can guarantee that you are
>> passing MAC-VLAN pair for each address on the interface for the call.
>> The alternative gets to be a bit more complex since it requires
>> multiple rules, one for non-tagged and one per VLAN for tagged
>> traffic.
>>
>> > There's an argument that the lowerdev shouldn't know anything about the
>> > upperdev's routing, just deliver to the upperdev and let the upperdev worry
>> > about it.  But perhaps this becomes a question of precedence: does the
>> > lowerdev split traffic first by mac address or by vlan tag?
>>
>> That is where things get messy. We found it splits by VLAN tag if the
>> VLAN is present on the lowerdev, or it splits by MAC if it is not.
>> That is why as Jake pointed out adding the VLAN to the lower dev
>> causes issues.
>>
>
> Yes, right now the problem is that it splits differently depending on whether or not a VLAN is present on the lower dev.
>
>> > I don't like your option 1: as you point out, it breaks current
>> > functionality, likely depended upon in some containers that are using
>> > macvlans to manage their traffic.  We don't know what's going on inside that
>> > container and I don't think we want to break its ability to split its own
>> > vlans.
>>
>> Maybe we should look at an option 1.5. Mark the lowerdev as VLAN
>> challenged if any macvlan is operating with any VLANs enabled on it
>> since we can only really allow VLAN filtering to occur at one level
>> reliably. Either that or maybe we look at making VLANs and rx_handler
>> setups mutually exclusive.
>>
>
> Actually.. what if we changed the order of splitting, so that we always check macvlan MAC address first, before checking VLANs?
>
> This should work in both cases of macvlan -> VLAN -> lowerdev, or VLAN -> macvlan -> lowerdev.

The thing you have to then watch out for is how something like this
would impact bonding or bridging since both of those use the Rx
handler as well from my understanding. I suppose it would make sense
though to do Rx handler first and then VLAN since the Rx handler
should be placed on the VLAN itself if you are
bridging/bonding/macvlan over a VLAN versus the reverse.

> In the first case, the macvlan isn't directly attached to the lowerdev, so we'd do VLAN filtering first, and then the VLAN would check MAC address.

Right, that bit works without any issues.

> In the second case, even if lowerdev also had the VLAN, we'd do macvlan filtering first, and things would work.

That piece makes sense, at least for macvlan.

> Both the lowerdev VLAN and upperdev macvlan should receive traffic correctly in this case.
>
> I think this resolves the problem of which device goes to which VLAN.

Right, this works for macvlan. The only concern I have then is bond
and bridging. It is probably fine but it wouldn't hurt to check.

> I don't know if it resolves the issues with leaked VLANs, where a VLAN added to the macvlan device causes traffic for that VLAN to be received by all the MAC addresses of the lowerdev...
>
> I suppose this might not be considered a problem? The traffic could be received either way if you're in promiscuous mode. It's not like we have a sense of "trusted" configuration either.
>
> I think some separate work for the case of macvlan on top of VLAN on top of lower dev can be done as well, to enable offloading in this case. I'll have some more thoughts on that soon.
>
> Thanks,
> Jake

So the issue with all of this appears to be:

commit 2425717b27eb92b175335ca4ff0bb218cbe0cb64
Author: John Fastabend <john.r.fastabend@intel.com>
Date:   Mon Oct 10 09:16:41 2011 +0000

    net: allow vlan traffic to be received under bond

The problem is that patch made it so that you could put a bond on two
interfaces and still peel out traffic via VLANs. Prior to this patch,
the code behaved the way we were already discussing.

>
>> > Like I said, I think the current behavior is mostly correct, but a version
>> > of option 2 might be good to help support offload of the mac+vlan pair into
>> > a macvlan channel.
>>
>> The only issue is I am not completely sure how option 2 solves the
>> original issue. Yes it makes the filtering more explicit, but the
>> network stack is still filtering VLANs before we get to the rx_handler
>> calls, or is this a fix that works for the offloaded approach only and
>> doesn't address the issues in the non-offloaded case? It's also
>> possible I might have missed something.
>>
>> - Alex

* RE: macvlan devices and vlan interaction
  2018-01-30 22:38       ` Alexander Duyck
@ 2018-01-30 23:07         ` Keller, Jacob E
  0 siblings, 0 replies; 9+ messages in thread
From: Keller, Jacob E @ 2018-01-30 23:07 UTC
  To: Alexander Duyck; +Cc: Shannon Nelson, netdev, Duyck, Alexander H

> -----Original Message-----
> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> Sent: Tuesday, January 30, 2018 2:39 PM
> To: Keller, Jacob E <jacob.e.keller@intel.com>
> Cc: Shannon Nelson <shannon.nelson@oracle.com>; netdev@vger.kernel.org;
> Duyck, Alexander H <alexander.h.duyck@intel.com>
> Subject: Re: macvlan devices and vlan interaction
> 
> On Tue, Jan 30, 2018 at 2:20 PM, Keller, Jacob E
> <jacob.e.keller@intel.com> wrote:
> >> -----Original Message-----
> >> From: Alexander Duyck [mailto:alexander.duyck@gmail.com]
> >> Sent: Tuesday, January 30, 2018 12:49 PM
> >> To: Shannon Nelson <shannon.nelson@oracle.com>
> >> Cc: Keller, Jacob E <jacob.e.keller@intel.com>; netdev@vger.kernel.org; Duyck,
> >> Alexander H <alexander.h.duyck@intel.com>
> >> Subject: Re: macvlan devices and vlan interaction
> >>
> >> > Hi Jake,
> >> >
> >> > The current behavior seems logical to me, but I suppose Alex might argue
> >> > differently.  The macvlan was put onto the default lowerdev assuming the
> >> > lowerdev will hand it all the default traffic, and then the macvlan splits
> >> > out its own vlan traffic.  As soon as the lowerdev assumption changes, it is
> >> > going to change what gets pushed up to the macvlan dev. If the lowerdev is
> >> > separating the vlan traffic out of the "default" flow headed to the macvlan,
> >> > then the initial assumption has changed and the vlan traffic has been
> >> > vectored off before it can be delivered up the stack to the macvlan.
> >>
> >> It depends on what your goal is. In my mind making macvlan VLAN
> >> challenged is the easier solution since you just have to add some
> >> pass-thru ops to the VLAN drivers and you can guarantee that you are
> >> passing MAC-VLAN pair for each address on the interface for the call.
> >> The alternative gets to be a bit more complex since it requires
> >> multiple rules, one for non-tagged and one per VLAN for tagged
> >> traffic.
> >>
> >> > There's an argument that the lowerdev shouldn't know anything about the
> >> > upperdev's routing, just deliver to the upperdev and let the upperdev worry
> >> > about it.  But perhaps this becomes a question of precedence: does the
> >> > lowerdev split traffic first by mac address or by vlan tag?
> >>
> >> That is where things get messy. We found it splits by VLAN tag if the
> >> VLAN is present on the lowerdev, or it splits by MAC if it is not.
> >> That is why as Jake pointed out adding the VLAN to the lower dev
> >> causes issues.
> >>
> >
> > Yes, right now the problem is that it splits differently depending on whether or not a VLAN is present on the lower dev.
> >
> >> > I don't like your option 1: as you point out, it breaks current
> >> > functionality, likely depended upon in some containers that are using
> >> > macvlans to manage their traffic.  We don't know what's going on inside that
> >> > container and I don't think we want to break its ability to split its own
> >> > vlans.
> >>
> >> Maybe we should look at an option 1.5. Mark the lowerdev as VLAN
> >> challenged if any macvlan is operating with any VLANs enabled on it
> >> since we can only really allow VLAN filtering to occur at one level
> >> reliably. Either that or maybe we look at making VLANs and rx_handler
> >> setups mutually exclusive.
> >>
> >
> > Actually.. what if we changed the order of splitting, so that we always check macvlan MAC address first, before checking VLANs?
> >
> > This should work in both cases of macvlan -> VLAN -> lowerdev, or VLAN -> macvlan -> lowerdev.
> 
> The thing you have to then watch out for is how something like this
> would impact bonding or bridging since both of those use the Rx
> handler as well from my understanding. I suppose it would make sense
> though to do Rx handler first and then VLAN since the Rx handler
> should be placed on the VLAN itself if you are
> bridging/bonding/macvlan over a VLAN versus the reverse.
> 
> > In the first case, the macvlan isn't directly attached to the lowerdev, so we'd do VLAN filtering first, and then the VLAN would check MAC address.
> 
> Right, that bit works without any issues.
> 
> > In the second case, even if lowerdev also had the VLAN, we'd do macvlan filtering first, and things would work.
> 
> That piece makes sense, at least for macvlan.
> 
> > Both the lowerdev VLAN and upperdev macvlan should receive traffic correctly in this case.
> >
> > I think this resolves the problem of which device goes to which VLAN.
> 
> Right, this works for macvlan. The only concern I have then is bond
> and bridging. It is probably fine but it wouldn't hurt to check.
> 

Yeah, it's quite possible bonding and bridging won't work quite right...

> > I don't know if it resolves the issues with leaked VLANs, where a VLAN added to the macvlan device causes traffic for that VLAN to be received by all the MAC addresses of the lowerdev...
> >
> > I suppose this might not be considered a problem? The traffic could be received either way if you're in promiscuous mode. It's not like we have a sense of "trusted" configuration either.
> >
> > I think some separate work for the case of macvlan on top of VLAN on top of lower dev can be done as well, to enable offloading in this case. I'll have some more thoughts on that soon.
> >
> > Thanks,
> > Jake
> 
> So the issue with all of this appears to be:
> 
> commit 2425717b27eb92b175335ca4ff0bb218cbe0cb64
> Author: John Fastabend <john.r.fastabend@intel.com>
> Date:   Mon Oct 10 09:16:41 2011 +0000
> 
>     net: allow vlan traffic to be received under bond
> 
> The problem is that patch made it so that you could put a bond on two
> interfaces and still peel out traffic via VLANs. Prior to this patch,
> the code behaved the way we were already discussing.
> 

Interesting. I am not sure how to resolve this without breaking one or the other case. The commit message itself calls out that:

    Putting a VLAN.228 on both the bond0 and eth2 device will
    result in eth2.228 receiving the skb. I don't think this is
    completely unexpected and was the result prior to the rx_handler
    result.

So.. to me this *was* unexpected, but I guess it matches how things used to work?

Thanks,
Jake 

Thread overview: 9+ messages
2018-01-29 23:01 macvlan devices and vlan interaction Keller, Jacob E
2018-01-30  1:53 ` Yuan, Linyu (NSB - CN/Shanghai)
2018-01-30 22:23   ` Keller, Jacob E
2018-01-30 20:29 ` Shannon Nelson
2018-01-30 20:49   ` Alexander Duyck
2018-01-30 22:20     ` Keller, Jacob E
2018-01-30 22:38       ` Alexander Duyck
2018-01-30 23:07         ` Keller, Jacob E
2018-01-30 22:14   ` Keller, Jacob E
