All of lore.kernel.org
 help / color / mirror / Atom feed
* switchdev and VLAN ranges
@ 2015-10-09 23:30 Vivien Didelot
  2015-10-10  4:22 ` Scott Feldman
  2015-10-10  7:49 ` Elad Raz
  0 siblings, 2 replies; 34+ messages in thread
From: Vivien Didelot @ 2015-10-09 23:30 UTC (permalink / raw)
  To: Jiri Pirko, Scott Feldman; +Cc: netdev, stephen, Florian Fainelli, Andrew Lunn

Hi All,

I understand that specifying a VLAN range on the command line is nice
for the user, and it makes no big deal for software implementation.

However, AFAICT a VLAN range does not make sense at all for hardware
such as Ethernet switch chips. Am I wrong?

I would suggest to make switchdev directly answer to a bridge request
that the operation is not supported when the user asks for a VLAN range.

That way, we can simply use a single "vid" member in struct
switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
avoid making drivers heavier with iteration loops on such range.

I have two concerns in mind:

a) if we imagine that drivers like Rocker allocate memory in the prepare
phase for each VID, preparing a range like 100-4000 would definitely not
be recommended.

b) imagine that you have two Linux bridges on a switch, one using the
hardware VLAN 100. If you request the VLAN range 99-101 for the other
bridge members, it is not possible for the driver to say "I can
accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
the whole range.

That's why I think that avoiding VLAN range at the switchdev level would
be a good idea.

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-09 23:30 switchdev and VLAN ranges Vivien Didelot
@ 2015-10-10  4:22 ` Scott Feldman
  2015-10-10 16:33   ` Vivien Didelot
  2015-10-10  7:49 ` Elad Raz
  1 sibling, 1 reply; 34+ messages in thread
From: Scott Feldman @ 2015-10-10  4:22 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Jiri Pirko, netdev, stephen, Florian Fainelli, Andrew Lunn, Roopa Prabhu

On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelot
<vivien.didelot@savoirfairelinux.com> wrote:
> Hi All,
>
> I understand that specifying a VLAN range on the command line is nice
> for the user, and it makes no big deal for software implementation.

[Adding Roopa, since she did the original vlan range support in the
kernel/iproute2]

> However, AFAICT a VLAN range does not make sense at all for hardware
> such as Ethernet switch chips. Am I wrong?
>
> I would suggest to make switchdev directly answer to a bridge request
> that the operation is not supported when the user asks for a VLAN range.
>
> That way, we can simply use a single "vid" member in struct
> switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
> avoid making drivers heavier with iteration loops on such range.
>
> I have two concerns in mind:
>
> a) if we imagine that drivers like Rocker allocate memory in the prepare
> phase for each VID, preparing a range like 100-4000 would definitely not
> be recommended.

This call should be in process context so it doesn't seem to terrible
for the driver to take its time to reserve/allocate resources in
prepare phase, even for a vlan range.  I think I'm missing your point.

> b) imagine that you have two Linux bridges on a switch, one using the
> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> bridge members, it is not possible for the driver to say "I can
> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> the whole range.

Well, it probably should return -ERANGE to indicate the range can't be
added, but that's an aside.

The reason why vlan ranges need to work down to the switchdev driver
is, from the user's perspective, it's an all-or-nothing request from
the user to add the vlan range to the device.  So we need to ask the
driver in the prepare phase, "can you support this range,
completely?", and if yes, then commit it as a whole.  The netlink
response back to the user isn't equipped to describe what subset of
the range was added, and what subset was not.

> That's why I think that avoiding VLAN range at the switchdev level would
> be a good idea.

As a general rule with switchdev, we've tried to keep the user's
experience the same when using {Linux} as a soft switch/router vs.
using {Linux + offload device} as a hard switch/router.  So if native
Linux supports some operation, for example vlan ranges, then we should
try to extend that to the offload model.  In other words, we don't
want to re-train the user when moving from soft switch to hard switch!
 But there are physical limitations when dealing with an offload
device....

Anyway, with your vlan range example, we've got a case where each soft
bridge has an independent vlan set, and the vlan sets between soft
bridges can overlap.  For the (typical) hard switch, there is one vlan
set for the whole switch, and trying to overlay the soft bridges'
(overlapping) vlan sets on the hard switch fails.  That failure is
reported to the user.  We tried, but due to offload device
limitations, we can't support that operation.  Of course, if the vlan
sets didn't overlap, then we don't have a problem.

This will not be the only case where something we can do on a soft
Linux switch/router can't be offloaded to some physical offload
device.  But I think the philosophy has been to try offload what we
can, up to the point of failure.  In some cases, we can mask that
failure from the user by falling back to soft-switch only, but in
other cases the failure will pop up right in the user's face, like in
your example.

One idea to help mitigate the user's confusion would be to limit the
number of bridges overlaid on the device to just one.  Our drivers
know when ports are enslaved to bridges, so is there something we can
do there to fail the enslave on a second bridge?  Exercise left to the
reader.  If we had that, now vlan ranges work 1:1 with soft Linux
because both soft bridge and device have single vlan set.

Sorry for the long-winded response.

-scott

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-09 23:30 switchdev and VLAN ranges Vivien Didelot
  2015-10-10  4:22 ` Scott Feldman
@ 2015-10-10  7:49 ` Elad Raz
  2015-10-10 10:36   ` Nikolay Aleksandrov
  1 sibling, 1 reply; 34+ messages in thread
From: Elad Raz @ 2015-10-10  7:49 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Jiri Pirko, Scott Feldman, netdev, stephen, Florian Fainelli,
	Andrew Lunn, Ido Schimmel, Or Gerlitz


> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
> 
> I have two concerns in mind:
> 
> a) if we imagine that drivers like Rocker allocate memory in the prepare
> phase for each VID, preparing a range like 100-4000 would definitely not
> be recommended.
> 
> b) imagine that you have two Linux bridges on a switch, one using the
> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> bridge members, it is not possible for the driver to say "I can
> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> the whole range.

Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
There is no sense having more than one VLAN as a PVID.
This leave the HW vendor the choice which VLAN id they will use as the PVID.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-10  7:49 ` Elad Raz
@ 2015-10-10 10:36   ` Nikolay Aleksandrov
  2015-10-11  7:12     ` Jiri Pirko
  0 siblings, 1 reply; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-10 10:36 UTC (permalink / raw)
  To: Elad Raz, Vivien Didelot
  Cc: Jiri Pirko, Scott Feldman, netdev, stephen, Florian Fainelli,
	Andrew Lunn, Ido Schimmel, Or Gerlitz

On 10/10/2015 09:49 AM, Elad Raz wrote:
> 
>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
>>
>> I have two concerns in mind:
>>
>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>> phase for each VID, preparing a range like 100-4000 would definitely not
>> be recommended.
>>
>> b) imagine that you have two Linux bridges on a switch, one using the
>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>> bridge members, it is not possible for the driver to say "I can
>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>> the whole range.
> 
> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
> There is no sense having more than one VLAN as a PVID.
> This leave the HW vendor the choice which VLAN id they will use as the PVID.
> 

iproute2 doesn't allow to do it but I can see that someone can actually make it
so the flags for the range have it and it doesn't look correct. Perhaps we need
something like the patch below to enforce this from kernel-side.


diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index d78b4429505a..02b17b53e9a6 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
 			if (vinfo_start)
 				return -EINVAL;
 			vinfo_start = vinfo;
+			/* don't allow range of pvids */
+			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
+				return -EINVAL;
 			continue;
 		}
 

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-10  4:22 ` Scott Feldman
@ 2015-10-10 16:33   ` Vivien Didelot
  2015-10-10 18:10     ` Florian Fainelli
  0 siblings, 1 reply; 34+ messages in thread
From: Vivien Didelot @ 2015-10-10 16:33 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Jiri Pirko, netdev, stephen, Florian Fainelli, Andrew Lunn, Roopa Prabhu

On Oct. Friday 09 (41) 09:22 PM, Scott Feldman wrote:
> On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelot
> <vivien.didelot@savoirfairelinux.com> wrote:
> > Hi All,
> >
> > I understand that specifying a VLAN range on the command line is nice
> > for the user, and it makes no big deal for software implementation.
> 
> [Adding Roopa, since she did the original vlan range support in the
> kernel/iproute2]
> 
> > However, AFAICT a VLAN range does not make sense at all for hardware
> > such as Ethernet switch chips. Am I wrong?
> >
> > I would suggest to make switchdev directly answer to a bridge request
> > that the operation is not supported when the user asks for a VLAN range.
> >
> > That way, we can simply use a single "vid" member in struct
> > switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
> > avoid making drivers heavier with iteration loops on such range.
> >
> > I have two concerns in mind:
> >
> > a) if we imagine that drivers like Rocker allocate memory in the prepare
> > phase for each VID, preparing a range like 100-4000 would definitely not
> > be recommended.
> 
> This call should be in process context so it doesn't seem to terrible
> for the driver to take its time to reserve/allocate resources in
> prepare phase, even for a vlan range.  I think I'm missing your point.

I'll try to give a concrete example. If I want to use the prepare phase
in the mv88e6xxx driver, I will allocate a struct mv88e6xxx_vtu_entry
(basically a row in the hardware VLAN table) in the prepare phase before
programming it in the commit phase.

With this VLAN range, the driver will allocate 3900 struct. This seems
pointless for drivers and error-prone.

> > b) imagine that you have two Linux bridges on a switch, one using the
> > hardware VLAN 100. If you request the VLAN range 99-101 for the other
> > bridge members, it is not possible for the driver to say "I can
> > accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> > the whole range.
> 
> Well, it probably should return -ERANGE to indicate the range can't be
> added, but that's an aside.
> 
> The reason why vlan ranges need to work down to the switchdev driver
> is, from the user's perspective, it's an all-or-nothing request from
> the user to add the vlan range to the device.  So we need to ask the
> driver in the prepare phase, "can you support this range,
> completely?", and if yes, then commit it as a whole.  The netlink
> response back to the user isn't equipped to describe what subset of
> the range was added, and what subset was not.

I understand the all-or-nothing request, and I don't want to re-train
the user. I am basically suggesting that switchdev returns -ERANGE in
its code, if the device is a switch port and a VLAN range is requested.

I mean, I like the per-port paradigm of Linux, but an infrastructure
such as switchdev should not just forward user request to the device,
but also protect it and push consistent calls to the hardware.

> > That's why I think that avoiding VLAN range at the switchdev level would
> > be a good idea.
> 
> As a general rule with switchdev, we've tried to keep the user's
> experience the same when using {Linux} as a soft switch/router vs.
> using {Linux + offload device} as a hard switch/router.  So if native
> Linux supports some operation, for example vlan ranges, then we should
> try to extend that to the offload model.  In other words, we don't
> want to re-train the user when moving from soft switch to hard switch!
>  But there are physical limitations when dealing with an offload
> device....
> 
> Anyway, with your vlan range example, we've got a case where each soft
> bridge has an independent vlan set, and the vlan sets between soft
> bridges can overlap.  For the (typical) hard switch, there is one vlan
> set for the whole switch, and trying to overlay the soft bridges'
> (overlapping) vlan sets on the hard switch fails.  That failure is
> reported to the user.  We tried, but due to offload device
> limitations, we can't support that operation.  Of course, if the vlan
> sets didn't overlap, then we don't have a problem.
> 
> This will not be the only case where something we can do on a soft
> Linux switch/router can't be offloaded to some physical offload
> device.  But I think the philosophy has been to try offload what we
> can, up to the point of failure.  In some cases, we can mask that
> failure from the user by falling back to soft-switch only, but in
> other cases the failure will pop up right in the user's face, like in
> your example.
> 
> One idea to help mitigate the user's confusion would be to limit the
> number of bridges overlaid on the device to just one.  Our drivers
> know when ports are enslaved to bridges, so is there something we can
> do there to fail the enslave on a second bridge?  Exercise left to the
> reader.  If we had that, now vlan ranges work 1:1 with soft Linux
> because both soft bridge and device have single vlan set.
> 
> Sorry for the long-winded response.

I'm fine with the multiple software bridges on top of hardware switches.
IIRC, Andrew is actually using this feature. The user should be able to
request mixing of any ports like (s)he wants.

A use case I can imagine is a bridge on a router between a WAN interface
and a few switch ports, and an isolated group of ports (e.g. sandbox).

What I have done in net/dsa/slave.c:dsa_bridge_check_vlan_range() should
ideally be moved up to switchdev (this code prevents overlapping of VLAN
between bridges in DSA switches).

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-10 16:33   ` Vivien Didelot
@ 2015-10-10 18:10     ` Florian Fainelli
  2015-10-10 19:47       ` Vivien Didelot
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Fainelli @ 2015-10-10 18:10 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Scott Feldman, Jiri Pirko, netdev, stephen, Andrew Lunn, Roopa Prabhu

2015-10-10 9:33 GMT-07:00 Vivien Didelot <vivien.didelot@savoirfairelinux.com>:
> On Oct. Friday 09 (41) 09:22 PM, Scott Feldman wrote:
>> On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelot
>> <vivien.didelot@savoirfairelinux.com> wrote:
>> > Hi All,
>> >
>> > I understand that specifying a VLAN range on the command line is nice
>> > for the user, and it makes no big deal for software implementation.
>>
>> [Adding Roopa, since she did the original vlan range support in the
>> kernel/iproute2]
>>
>> > However, AFAICT a VLAN range does not make sense at all for hardware
>> > such as Ethernet switch chips. Am I wrong?
>> >
>> > I would suggest to make switchdev directly answer to a bridge request
>> > that the operation is not supported when the user asks for a VLAN range.
>> >
>> > That way, we can simply use a single "vid" member in struct
>> > switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
>> > avoid making drivers heavier with iteration loops on such range.
>> >
>> > I have two concerns in mind:
>> >
>> > a) if we imagine that drivers like Rocker allocate memory in the prepare
>> > phase for each VID, preparing a range like 100-4000 would definitely not
>> > be recommended.
>>
>> This call should be in process context so it doesn't seem to terrible
>> for the driver to take its time to reserve/allocate resources in
>> prepare phase, even for a vlan range.  I think I'm missing your point.
>
> I'll try to give a concrete example. If I want to use the prepare phase
> in the mv88e6xxx driver, I will allocate a struct mv88e6xxx_vtu_entry
> (basically a row in the hardware VLAN table) in the prepare phase before
> programming it in the commit phase.
>
> With this VLAN range, the driver will allocate 3900 struct. This seems
> pointless for drivers and error-prone.

We could amortize this allocation cost by having a way to tell the
bridge VLAN filtering code, once associated with a switchdev driver,
that this driver needs X amount of bytes for its private VLAN entry,
but that sounds overkill. If the switch driver needs to maintain
bookeeping information about VLANs, we would have to allocate this
data somewhere, so we might as well have the switch driver do it?

>
>> > b) imagine that you have two Linux bridges on a switch, one using the
>> > hardware VLAN 100. If you request the VLAN range 99-101 for the other
>> > bridge members, it is not possible for the driver to say "I can
>> > accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>> > the whole range.
>>
>> Well, it probably should return -ERANGE to indicate the range can't be
>> added, but that's an aside.
>>
>> The reason why vlan ranges need to work down to the switchdev driver
>> is, from the user's perspective, it's an all-or-nothing request from
>> the user to add the vlan range to the device.  So we need to ask the
>> driver in the prepare phase, "can you support this range,
>> completely?", and if yes, then commit it as a whole.  The netlink
>> response back to the user isn't equipped to describe what subset of
>> the range was added, and what subset was not.
>
> I understand the all-or-nothing request, and I don't want to re-train
> the user. I am basically suggesting that switchdev returns -ERANGE in
> its code, if the device is a switch port and a VLAN range is requested.

Denying VLAN ranges in switchdev would at best, be something that
should be negotiated with the underlying switch driver. There could be
cases where it is desirable to push VLAN ranges to batch expensive HW
operations for instance, or just because you can do it and this maps
nicely to your HW is nice to have.  Do you have something else in
mind?

>
> I mean, I like the per-port paradigm of Linux, but an infrastructure
> such as switchdev should not just forward user request to the device,
> but also protect it and push consistent calls to the hardware.
>
>> > That's why I think that avoiding VLAN range at the switchdev level would
>> > be a good idea.
>>
>> As a general rule with switchdev, we've tried to keep the user's
>> experience the same when using {Linux} as a soft switch/router vs.
>> using {Linux + offload device} as a hard switch/router.  So if native
>> Linux supports some operation, for example vlan ranges, then we should
>> try to extend that to the offload model.  In other words, we don't
>> want to re-train the user when moving from soft switch to hard switch!
>>  But there are physical limitations when dealing with an offload
>> device....
>>
>> Anyway, with your vlan range example, we've got a case where each soft
>> bridge has an independent vlan set, and the vlan sets between soft
>> bridges can overlap.  For the (typical) hard switch, there is one vlan
>> set for the whole switch, and trying to overlay the soft bridges'
>> (overlapping) vlan sets on the hard switch fails.  That failure is
>> reported to the user.  We tried, but due to offload device
>> limitations, we can't support that operation.  Of course, if the vlan
>> sets didn't overlap, then we don't have a problem.
>>
>> This will not be the only case where something we can do on a soft
>> Linux switch/router can't be offloaded to some physical offload
>> device.  But I think the philosophy has been to try offload what we
>> can, up to the point of failure.  In some cases, we can mask that
>> failure from the user by falling back to soft-switch only, but in
>> other cases the failure will pop up right in the user's face, like in
>> your example.
>>
>> One idea to help mitigate the user's confusion would be to limit the
>> number of bridges overlaid on the device to just one.  Our drivers
>> know when ports are enslaved to bridges, so is there something we can
>> do there to fail the enslave on a second bridge?  Exercise left to the
>> reader.  If we had that, now vlan ranges work 1:1 with soft Linux
>> because both soft bridge and device have single vlan set.
>>
>> Sorry for the long-winded response.
>
> I'm fine with the multiple software bridges on top of hardware switches.
> IIRC, Andrew is actually using this feature. The user should be able to
> request mixing of any ports like (s)he wants.
>
> A use case I can imagine is a bridge on a router between a WAN interface
> and a few switch ports, and an isolated group of ports (e.g. sandbox).

That use case can be solved by having a single bridge entity spanning
an entire switch device, and configuring different PVIDs on e.g: ports
2-3-4 (WAN) and ports 0-1 (sandbox).

>
> What I have done in net/dsa/slave.c:dsa_bridge_check_vlan_range() should
> ideally be moved up to switchdev (this code prevents overlapping of VLAN
> between bridges in DSA switches).

As we discussed before over IRC, checking for overlapping VLANs is
something that may have to be switch driver specific at some point.

For instance, some switches do a double tag VLAN tag normalization
whenever a packet is ingressed, which means that as long as your are
not doing double VLAN tagging (that would be fun to support, but even
there), you could dedicate an outer VLAN tag per bridge instance, and
still allow the same inner VLAN tags to be configured by an user. The
switch would still guarantee proper isolation/switching/individual or
shared FDBs for these logical domains.
-- 
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-10 18:10     ` Florian Fainelli
@ 2015-10-10 19:47       ` Vivien Didelot
  0 siblings, 0 replies; 34+ messages in thread
From: Vivien Didelot @ 2015-10-10 19:47 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Scott Feldman, Jiri Pirko, netdev, stephen, Andrew Lunn, Roopa Prabhu

On Oct. Saturday 10 (41) 11:10 AM, Florian Fainelli wrote:
> 2015-10-10 9:33 GMT-07:00 Vivien Didelot <vivien.didelot@savoirfairelinux.com>:
> > On Oct. Friday 09 (41) 09:22 PM, Scott Feldman wrote:
> >> On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelot
> >> <vivien.didelot@savoirfairelinux.com> wrote:
> >> > Hi All,
> >> >
> >> > I understand that specifying a VLAN range on the command line is nice
> >> > for the user, and it makes no big deal for software implementation.
> >>
> >> [Adding Roopa, since she did the original vlan range support in the
> >> kernel/iproute2]
> >>
> >> > However, AFAICT a VLAN range does not make sense at all for hardware
> >> > such as Ethernet switch chips. Am I wrong?
> >> >
> >> > I would suggest to make switchdev directly answer to a bridge request
> >> > that the operation is not supported when the user asks for a VLAN range.
> >> >
> >> > That way, we can simply use a single "vid" member in struct
> >> > switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
> >> > avoid making drivers heavier with iteration loops on such range.
> >> >
> >> > I have two concerns in mind:
> >> >
> >> > a) if we imagine that drivers like Rocker allocate memory in the prepare
> >> > phase for each VID, preparing a range like 100-4000 would definitely not
> >> > be recommended.
> >>
> >> This call should be in process context so it doesn't seem to terrible
> >> for the driver to take its time to reserve/allocate resources in
> >> prepare phase, even for a vlan range.  I think I'm missing your point.
> >
> > I'll try to give a concrete example. If I want to use the prepare phase
> > in the mv88e6xxx driver, I will allocate a struct mv88e6xxx_vtu_entry
> > (basically a row in the hardware VLAN table) in the prepare phase before
> > programming it in the commit phase.
> >
> > With this VLAN range, the driver will allocate 3900 struct. This seems
> > pointless for drivers and error-prone.
> 
> We could amortize this allocation cost by having a way to tell the
> bridge VLAN filtering code, once associated with a switchdev driver,
> that this driver needs X amount of bytes for its private VLAN entry,
> but that sounds overkill. If the switch driver needs to maintain
> bookeeping information about VLANs, we would have to allocate this
> data somewhere, so we might as well have the switch driver do it?

You are right, the allocation would have to be done anyway. I am
considering caching the VLAN table in mv88e6xxx sometimes.

> >
> >> > b) imagine that you have two Linux bridges on a switch, one using the
> >> > hardware VLAN 100. If you request the VLAN range 99-101 for the other
> >> > bridge members, it is not possible for the driver to say "I can
> >> > accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> >> > the whole range.
> >>
> >> Well, it probably should return -ERANGE to indicate the range can't be
> >> added, but that's an aside.
> >>
> >> The reason why vlan ranges need to work down to the switchdev driver
> >> is, from the user's perspective, it's an all-or-nothing request from
> >> the user to add the vlan range to the device.  So we need to ask the
> >> driver in the prepare phase, "can you support this range,
> >> completely?", and if yes, then commit it as a whole.  The netlink
> >> response back to the user isn't equipped to describe what subset of
> >> the range was added, and what subset was not.
> >
> > I understand the all-or-nothing request, and I don't want to re-train
> > the user. I am basically suggesting that switchdev returns -ERANGE in
> > its code, if the device is a switch port and a VLAN range is requested.
> 
> Denying VLAN ranges in switchdev would at best, be something that
> should be negotiated with the underlying switch driver. There could be
> cases where it is desirable to push VLAN ranges to batch expensive HW
> operations for instance, or just because you can do it and this maps
> nicely to your HW is nice to have.  Do you have something else in
> mind?

I don't really have something else in mind. I was just worried that we
are pushing features to drivers based on potential (but not existent
AFAICT) hardware features.

I am curious to find an hardware that might program a batch of VLAN in a
single shot, especially on a per-port basis as Linux does. If someone
could point me an example of such hardware, I would gladly stop
questioning the need for VLAN ranges in drivers.

> >
> > I mean, I like the per-port paradigm of Linux, but an infrastructure
> > such as switchdev should not just forward user request to the device,
> > but also protect it and push consistent calls to the hardware.
> >
> >> > That's why I think that avoiding VLAN range at the switchdev level would
> >> > be a good idea.
> >>
> >> As a general rule with switchdev, we've tried to keep the user's
> >> experience the same when using {Linux} as a soft switch/router vs.
> >> using {Linux + offload device} as a hard switch/router.  So if native
> >> Linux supports some operation, for example vlan ranges, then we should
> >> try to extend that to the offload model.  In other words, we don't
> >> want to re-train the user when moving from soft switch to hard switch!
> >>  But there are physical limitations when dealing with an offload
> >> device....
> >>
> >> Anyway, with your vlan range example, we've got a case where each soft
> >> bridge has an independent vlan set, and the vlan sets between soft
> >> bridges can overlap.  For the (typical) hard switch, there is one vlan
> >> set for the whole switch, and trying to overlay the soft bridges'
> >> (overlapping) vlan sets on the hard switch fails.  That failure is
> >> reported to the user.  We tried, but due to offload device
> >> limitations, we can't support that operation.  Of course, if the vlan
> >> sets didn't overlap, then we don't have a problem.
> >>
> >> This will not be the only case where something we can do on a soft
> >> Linux switch/router can't be offloaded to some physical offload
> >> device.  But I think the philosophy has been to try offload what we
> >> can, up to the point of failure.  In some cases, we can mask that
> >> failure from the user by falling back to soft-switch only, but in
> >> other cases the failure will pop up right in the user's face, like in
> >> your example.
> >>
> >> One idea to help mitigate the user's confusion would be to limit the
> >> number of bridges overlaid on the device to just one.  Our drivers
> >> know when ports are enslaved to bridges, so is there something we can
> >> do there to fail the enslave on a second bridge?  Exercise left to the
> >> reader.  If we had that, now vlan ranges work 1:1 with soft Linux
> >> because both soft bridge and device have single vlan set.
> >>
> >> Sorry for the long-winded response.
> >
> > I'm fine with the multiple software bridges on top of hardware switches.
> > IIRC, Andrew is actually using this feature. The user should be able to
> > request mixing of any ports like (s)he wants.
> >
> > A use case I can imagine is a bridge on a router between a WAN interface
> > and a few switch ports, and an isolated group of ports (e.g. sandbox).
> 
> That use case can be solved by having a single bridge entity spanning
> an entire switch device, and configuring different PVIDs on e.g: ports
> 2-3-4 (WAN) and ports 0-1 (sandbox).

That's true. The number of software bridge groups on top of an hardware
switch is not really the issue here.

If Linux and the bridge code has an explicit support for hardware
switch, then mapping a single software bridge group on a switch chip
would be really consistent. Otherwise, it doesn't add much value.

> 
> >
> > What I have done in net/dsa/slave.c:dsa_bridge_check_vlan_range() should
> > ideally be moved up to switchdev (this code prevents overlapping of VLAN
> > between bridges in DSA switches).
> 
> As we discussed before over IRC, checking for overlapping VLANs is
> something that may have to be switch driver specific at some point.
> 
> For instance, some switches do a double tag VLAN tag normalization
> whenever a packet is ingressed, which means that as long as your are
> not doing double VLAN tagging (that would be fun to support, but even
> there), you could dedicate an outer VLAN tag per bridge instance, and
> still allow the same inner VLAN tags to be configured by an user. The
> switch would still guarantee proper isolation/switching/individual or
> shared FDBs for these logical domains.

Indeed, you can do internal hacks in the driver to support other
features. But do we really want to do that? Wouldn't it be better to
keep simple and explicit support? Like, VLAN is VLAN, double tagging is
double tagging, etc. So we would avoid confusion in implementations.

DSA and its "hardware bridging" code -- which is a hack using port-based
VLAN maps until full 802.1Q support was added -- is good an example of
confusion that such flexibility might bring.

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-10 10:36   ` Nikolay Aleksandrov
@ 2015-10-11  7:12     ` Jiri Pirko
  2015-10-11 10:49         ` [Bridge] " Nikolay Aleksandrov
  2015-10-11 22:41       ` switchdev and VLAN ranges Vivien Didelot
  0 siblings, 2 replies; 34+ messages in thread
From: Jiri Pirko @ 2015-10-11  7:12 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Elad Raz, Vivien Didelot, Scott Feldman, netdev, stephen,
	Florian Fainelli, Andrew Lunn, Ido Schimmel, Or Gerlitz

Sat, Oct 10, 2015 at 12:36:26PM CEST, nikolay@cumulusnetworks.com wrote:
>On 10/10/2015 09:49 AM, Elad Raz wrote:
>> 
>>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
>>>
>>> I have two concerns in mind:
>>>
>>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>>> phase for each VID, preparing a range like 100-4000 would definitely not
>>> be recommended.
>>>
>>> b) imagine that you have two Linux bridges on a switch, one using the
>>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>>> bridge members, it is not possible for the driver to say "I can
>>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>>> the whole range.
>> 
>> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
>> There is no sense having more than one VLAN as a PVID.
>> This leave the HW vendor the choice which VLAN id they will use as the PVID.
>> 
>
>iproute2 doesn't allow to do it but I can see that someone can actually make it
>so the flags for the range have it and it doesn't look correct. Perhaps we need
>something like the patch below to enforce this from kernel-side.
>
>
>diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>index d78b4429505a..02b17b53e9a6 100644
>--- a/net/bridge/br_netlink.c
>+++ b/net/bridge/br_netlink.c
>@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
> 			if (vinfo_start)
> 				return -EINVAL;
> 			vinfo_start = vinfo;
>+			/* don't allow range of pvids */
>+			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>+				return -EINVAL;
> 			continue;
> 		}
> 

Looks correct to me. Could you please submit this properly? Thanks!

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges
  2015-10-11  7:12     ` Jiri Pirko
@ 2015-10-11 10:49         ` Nikolay Aleksandrov
  2015-10-11 22:41       ` switchdev and VLAN ranges Vivien Didelot
  1 sibling, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-11 10:49 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Nikolay Aleksandrov, bridge, eladr, davem

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.

Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/bridge/br_netlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index d78b4429505a..02b17b53e9a6 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
 			if (vinfo_start)
 				return -EINVAL;
 			vinfo_start = vinfo;
+			/* don't allow range of pvids */
+			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
+				return -EINVAL;
 			continue;
 		}
 
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [Bridge] [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges
@ 2015-10-11 10:49         ` Nikolay Aleksandrov
  0 siblings, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-11 10:49 UTC (permalink / raw)
  To: netdev; +Cc: jiri, Nikolay Aleksandrov, bridge, eladr, davem

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Currently it's possible for someone to send a vlan range to the kernel
with the pvid flag set which will result in the pvid bouncing from a
vlan to vlan and isn't correct, it also introduces problems for hardware
where it doesn't make sense having more than 1 pvid. iproute2 already
enforces this, so let's enforce it on kernel-side as well.

Reported-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/bridge/br_netlink.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
index d78b4429505a..02b17b53e9a6 100644
--- a/net/bridge/br_netlink.c
+++ b/net/bridge/br_netlink.c
@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
 			if (vinfo_start)
 				return -EINVAL;
 			vinfo_start = vinfo;
+			/* don't allow range of pvids */
+			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
+				return -EINVAL;
 			continue;
 		}
 
-- 
2.4.3


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges
  2015-10-11 10:49         ` [Bridge] " Nikolay Aleksandrov
  (?)
@ 2015-10-11 14:13         ` Jiri Pirko
  -1 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2015-10-11 14:13 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, stephen, bridge, davem, eladr, Nikolay Aleksandrov

Sun, Oct 11, 2015 at 12:49:56PM CEST, razor@blackwall.org wrote:
>From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
>Currently it's possible for someone to send a vlan range to the kernel
>with the pvid flag set which will result in the pvid bouncing from a
>vlan to vlan and isn't correct, it also introduces problems for hardware
>where it doesn't make sense having more than 1 pvid. iproute2 already
>enforces this, so let's enforce it on kernel-side as well.
>
>Reported-by: Elad Raz <eladr@mellanox.com>
>Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-11  7:12     ` Jiri Pirko
  2015-10-11 10:49         ` [Bridge] " Nikolay Aleksandrov
@ 2015-10-11 22:41       ` Vivien Didelot
  2015-10-12  0:13         ` Nikolay Aleksandrov
  1 sibling, 1 reply; 34+ messages in thread
From: Vivien Didelot @ 2015-10-11 22:41 UTC (permalink / raw)
  To: Jiri Pirko
  Cc: Nikolay Aleksandrov, Elad Raz, Scott Feldman, netdev, stephen,
	Florian Fainelli, Andrew Lunn, Ido Schimmel, Or Gerlitz

On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
> Sat, Oct 10, 2015 at 12:36:26PM CEST, nikolay@cumulusnetworks.com wrote:
> >On 10/10/2015 09:49 AM, Elad Raz wrote:
> >> 
> >>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
> >>>
> >>> I have two concerns in mind:
> >>>
> >>> a) if we imagine that drivers like Rocker allocate memory in the prepare
> >>> phase for each VID, preparing a range like 100-4000 would definitely not
> >>> be recommended.
> >>>
> >>> b) imagine that you have two Linux bridges on a switch, one using the
> >>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> >>> bridge members, it is not possible for the driver to say "I can
> >>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> >>> the whole range.
> >> 
> >> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
> >> There is no sense having more than one VLAN as a PVID.
> >> This leave the HW vendor the choice which VLAN id they will use as the PVID.
> >> 
> >
> >iproute2 doesn't allow to do it but I can see that someone can actually make it
> >so the flags for the range have it and it doesn't look correct. Perhaps we need
> >something like the patch below to enforce this from kernel-side.
> >
> >
> >diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
> >index d78b4429505a..02b17b53e9a6 100644
> >--- a/net/bridge/br_netlink.c
> >+++ b/net/bridge/br_netlink.c
> >@@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
> > 			if (vinfo_start)
> > 				return -EINVAL;
> > 			vinfo_start = vinfo;
> >+			/* don't allow range of pvids */
> >+			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
> >+				return -EINVAL;
> > 			continue;
> > 		}
> > 
> 
> Looks correct to me. Could you please submit this properly? Thanks!

The above patch is correct, but we only solve part of the problem, since
the range and bridge flags are exposed by switchdev_obj_port_vlan as is.

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-11 22:41       ` switchdev and VLAN ranges Vivien Didelot
@ 2015-10-12  0:13         ` Nikolay Aleksandrov
  2015-10-12  5:14           ` Scott Feldman
  0 siblings, 1 reply; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-12  0:13 UTC (permalink / raw)
  To: Vivien Didelot, Jiri Pirko
  Cc: Elad Raz, Scott Feldman, netdev, stephen, Florian Fainelli,
	Andrew Lunn, Ido Schimmel, Or Gerlitz

On 10/12/2015 12:41 AM, Vivien Didelot wrote:
> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
>> Sat, Oct 10, 2015 at 12:36:26PM CEST, nikolay@cumulusnetworks.com wrote:
>>> On 10/10/2015 09:49 AM, Elad Raz wrote:
>>>>
>>>>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
>>>>>
>>>>> I have two concerns in mind:
>>>>>
>>>>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>>>>> phase for each VID, preparing a range like 100-4000 would definitely not
>>>>> be recommended.
>>>>>
>>>>> b) imagine that you have two Linux bridges on a switch, one using the
>>>>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>>>>> bridge members, it is not possible for the driver to say "I can
>>>>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>>>>> the whole range.
>>>>
>>>> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
>>>> There is no sense having more than one VLAN as a PVID.
>>>> This leave the HW vendor the choice which VLAN id they will use as the PVID.
>>>>
>>>
>>> iproute2 doesn't allow to do it but I can see that someone can actually make it
>>> so the flags for the range have it and it doesn't look correct. Perhaps we need
>>> something like the patch below to enforce this from kernel-side.
>>>
>>>
>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>> index d78b4429505a..02b17b53e9a6 100644
>>> --- a/net/bridge/br_netlink.c
>>> +++ b/net/bridge/br_netlink.c
>>> @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
>>> 			if (vinfo_start)
>>> 				return -EINVAL;
>>> 			vinfo_start = vinfo;
>>> +			/* don't allow range of pvids */
>>> +			if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>>> +				return -EINVAL;
>>> 			continue;
>>> 		}
>>>
>>
>> Looks correct to me. Could you please submit this properly? Thanks!
> 
> The above patch is correct, but we only solve part of the problem, since
> the range and bridge flags are exposed by switchdev_obj_port_vlan as is.
> 
> Thanks,
> -v
> 

Yes, the above fixes the bridge side. About the switchdev side it seems like it's
up to the switchdev driver to do the right thing in its switchdev_ops. I took a
quick look at DSA and it seems correct, the flag isn't saved and on dump request
the flags are generated so it shouldn't be possible to export multiple pvids.
But switchdev_port_br_afspec() seems problematic, in fact I don't even see a vlan
id check, i.e. ==0 || >= VLAN_N_MASK.
Of course, I might be totally off point as I'm not that familiar with switchdev and
it's very late. :-)
But maybe it needs something like:

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 6e4a4f9ad927..3dd52a53867f 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -16,6 +16,7 @@
 #include <linux/notifier.h>
 #include <linux/netdevice.h>
 #include <linux/if_bridge.h>
+#include <linux/if_vlan.h>
 #include <linux/list.h>
 #include <net/ip_fib.h>
 #include <net/switchdev.h>
@@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device *dev,
 			return -EINVAL;
 		vinfo = nla_data(attr);
 		vlan.flags = vinfo->flags;
+		if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
+                        return -EINVAL;
 		if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
 			if (vlan.vid_begin)
 				return -EINVAL;
 			vlan.vid_begin = vinfo->vid;
+			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
+				return -EINVAL;
 		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
 			if (!vlan.vid_begin)
 				return -EINVAL;

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-12  0:13         ` Nikolay Aleksandrov
@ 2015-10-12  5:14           ` Scott Feldman
  2015-10-12 10:15             ` Nikolay Aleksandrov
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
  0 siblings, 2 replies; 34+ messages in thread
From: Scott Feldman @ 2015-10-12  5:14 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: Vivien Didelot, Jiri Pirko, Elad Raz, netdev, stephen,
	Florian Fainelli, Andrew Lunn, Ido Schimmel, Or Gerlitz

On Sun, Oct 11, 2015 at 5:13 PM, Nikolay Aleksandrov
<nikolay@cumulusnetworks.com> wrote:
> On 10/12/2015 12:41 AM, Vivien Didelot wrote:
>> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
>>> Sat, Oct 10, 2015 at 12:36:26PM CEST, nikolay@cumulusnetworks.com wrote:
>>>> On 10/10/2015 09:49 AM, Elad Raz wrote:
>>>>>
>>>>>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
>>>>>>
>>>>>> I have two concerns in mind:
>>>>>>
>>>>>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>>>>>> phase for each VID, preparing a range like 100-4000 would definitely not
>>>>>> be recommended.
>>>>>>
>>>>>> b) imagine that you have two Linux bridges on a switch, one using the
>>>>>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>>>>>> bridge members, it is not possible for the driver to say "I can
>>>>>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>>>>>> the whole range.
>>>>>
>>>>> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
>>>>> There is no sense having more than one VLAN as a PVID.
>>>>> This leave the HW vendor the choice which VLAN id they will use as the PVID.
>>>>>
>>>>
>>>> iproute2 doesn't allow to do it but I can see that someone can actually make it
>>>> so the flags for the range have it and it doesn't look correct. Perhaps we need
>>>> something like the patch below to enforce this from kernel-side.
>>>>
>>>>
>>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>>> index d78b4429505a..02b17b53e9a6 100644
>>>> --- a/net/bridge/br_netlink.c
>>>> +++ b/net/bridge/br_netlink.c
>>>> @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
>>>>                     if (vinfo_start)
>>>>                             return -EINVAL;
>>>>                     vinfo_start = vinfo;
>>>> +                   /* don't allow range of pvids */
>>>> +                   if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>>>> +                           return -EINVAL;
>>>>                     continue;
>>>>             }
>>>>
>>>
>>> Looks correct to me. Could you please submit this properly? Thanks!
>>
>> The above patch is correct, but we only solve part of the problem, since
>> the range and bridge flags are exposed by switchdev_obj_port_vlan as is.
>>
>> Thanks,
>> -v
>>
>
> Yes, the above fixes the bridge side. About the switchdev side it seems like it's
> up to the switchdev driver to do the right thing in its switchdev_ops. I took a
> quick look at DSA and it seems correct, the flag isn't saved and on dump request
> the flags are generated so it shouldn't be possible to export multiple pvids.
> But switchdev_port_br_afspec() seems problematic, in fact I don't even see a vlan
> id check, i.e. ==0 || >= VLAN_N_MASK.
> Of course, I might be totally off point as I'm not that familiar with switchdev and
> it's very late. :-)
> But maybe it needs something like:
>
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 6e4a4f9ad927..3dd52a53867f 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -16,6 +16,7 @@
>  #include <linux/notifier.h>
>  #include <linux/netdevice.h>
>  #include <linux/if_bridge.h>
> +#include <linux/if_vlan.h>
>  #include <linux/list.h>
>  #include <net/ip_fib.h>
>  #include <net/switchdev.h>
> @@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>                         return -EINVAL;
>                 vinfo = nla_data(attr);
>                 vlan.flags = vinfo->flags;
> +               if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
> +                        return -EINVAL;
>                 if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
>                         if (vlan.vid_begin)
>                                 return -EINVAL;
>                         vlan.vid_begin = vinfo->vid;
> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> +                               return -EINVAL;
>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>                         if (!vlan.vid_begin)
>                                 return -EINVAL;

This (and you other patch) seem right to me, if we're going to block
setting PVID when specifying a vlan range.  Would you mind combining
and resending both patches as one as a proper patch?

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: switchdev and VLAN ranges
  2015-10-12  5:14           ` Scott Feldman
@ 2015-10-12 10:15             ` Nikolay Aleksandrov
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
  1 sibling, 0 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-12 10:15 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Vivien Didelot, Jiri Pirko, Elad Raz, netdev, stephen,
	Florian Fainelli, Andrew Lunn, Ido Schimmel, Or Gerlitz

On 10/12/2015 07:14 AM, Scott Feldman wrote:
> On Sun, Oct 11, 2015 at 5:13 PM, Nikolay Aleksandrov
> <nikolay@cumulusnetworks.com> wrote:
>> On 10/12/2015 12:41 AM, Vivien Didelot wrote:
>>> On Oct. Sunday 11 (41) 09:12 AM, Jiri Pirko wrote:
>>>> Sat, Oct 10, 2015 at 12:36:26PM CEST, nikolay@cumulusnetworks.com wrote:
>>>>> On 10/10/2015 09:49 AM, Elad Raz wrote:
>>>>>>
>>>>>>> On Oct 10, 2015, at 2:30 AM, Vivien Didelot <vivien.didelot@savoirfairelinux.com> wrote:
>>>>>>>
>>>>>>> I have two concerns in mind:
>>>>>>>
>>>>>>> a) if we imagine that drivers like Rocker allocate memory in the prepare
>>>>>>> phase for each VID, preparing a range like 100-4000 would definitely not
>>>>>>> be recommended.
>>>>>>>
>>>>>>> b) imagine that you have two Linux bridges on a switch, one using the
>>>>>>> hardware VLAN 100. If you request the VLAN range 99-101 for the other
>>>>>>> bridge members, it is not possible for the driver to say "I can
>>>>>>> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
>>>>>>> the whole range.
>>>>>>
>>>>>> Another concern I have with vid_being..vid_end range is the “flags”. Where flags can be BRIDGE_VLAN_INFO_PVID.
>>>>>> There is no sense having more than one VLAN as a PVID.
>>>>>> This leave the HW vendor the choice which VLAN id they will use as the PVID.
>>>>>>
>>>>>
>>>>> iproute2 doesn't allow to do it but I can see that someone can actually make it
>>>>> so the flags for the range have it and it doesn't look correct. Perhaps we need
>>>>> something like the patch below to enforce this from kernel-side.
>>>>>
>>>>>
>>>>> diff --git a/net/bridge/br_netlink.c b/net/bridge/br_netlink.c
>>>>> index d78b4429505a..02b17b53e9a6 100644
>>>>> --- a/net/bridge/br_netlink.c
>>>>> +++ b/net/bridge/br_netlink.c
>>>>> @@ -524,6 +524,9 @@ static int br_afspec(struct net_bridge *br,
>>>>>                     if (vinfo_start)
>>>>>                             return -EINVAL;
>>>>>                     vinfo_start = vinfo;
>>>>> +                   /* don't allow range of pvids */
>>>>> +                   if (vinfo_start->flags & BRIDGE_VLAN_INFO_PVID)
>>>>> +                           return -EINVAL;
>>>>>                     continue;
>>>>>             }
>>>>>
>>>>
>>>> Looks correct to me. Could you please submit this properly? Thanks!
>>>
>>> The above patch is correct, but we only solve part of the problem, since
>>> the range and bridge flags are exposed by switchdev_obj_port_vlan as is.
>>>
>>> Thanks,
>>> -v
>>>
>>
>> Yes, the above fixes the bridge side. About the switchdev side it seems like it's
>> up to the switchdev driver to do the right thing in its switchdev_ops. I took a
>> quick look at DSA and it seems correct, the flag isn't saved and on dump request
>> the flags are generated so it shouldn't be possible to export multiple pvids.
>> But switchdev_port_br_afspec() seems problematic, in fact I don't even see a vlan
>> id check, i.e. ==0 || >= VLAN_N_MASK.
>> Of course, I might be totally off point as I'm not that familiar with switchdev and
>> it's very late. :-)
>> But maybe it needs something like:
>>
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index 6e4a4f9ad927..3dd52a53867f 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -16,6 +16,7 @@
>>  #include <linux/notifier.h>
>>  #include <linux/netdevice.h>
>>  #include <linux/if_bridge.h>
>> +#include <linux/if_vlan.h>
>>  #include <linux/list.h>
>>  #include <net/ip_fib.h>
>>  #include <net/switchdev.h>
>> @@ -716,10 +717,14 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>                         return -EINVAL;
>>                 vinfo = nla_data(attr);
>>                 vlan.flags = vinfo->flags;
>> +               if (!vinfo->vid || vinfo->vid >= VLAN_VID_MASK)
>> +                        return -EINVAL;
>>                 if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_BEGIN) {
>>                         if (vlan.vid_begin)
>>                                 return -EINVAL;
>>                         vlan.vid_begin = vinfo->vid;
>> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> +                               return -EINVAL;
>>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>                         if (!vlan.vid_begin)
>>                                 return -EINVAL;
> 
> This (and you other patch) seem right to me, if we're going to block
> setting PVID when specifying a vlan range.  Would you mind combining
> and resending both patches as one as a proper patch?
> 

Thanks for the review, I'll prepare a small set as I'd like to keep these
separate since they touch two different subsystems and will re-post.
I'll target net-next with the pvid range change and -net with the vlan
range check patch. Does this sound okay ?

Thanks,
 Nik

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12  5:14           ` Scott Feldman
  2015-10-12 10:15             ` Nikolay Aleksandrov
@ 2015-10-12 12:01             ` Nikolay Aleksandrov
  2015-10-12 12:11               ` Elad Raz
                                 ` (3 more replies)
  1 sibling, 4 replies; 34+ messages in thread
From: Nikolay Aleksandrov @ 2015-10-12 12:01 UTC (permalink / raw)
  To: netdev; +Cc: sfeldma, jiri, vivien.didelot, davem, Nikolay Aleksandrov

From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
---
 net/switchdev/switchdev.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
index 6e4a4f9ad927..256c596de896 100644
--- a/net/switchdev/switchdev.c
+++ b/net/switchdev/switchdev.c
@@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
 			if (vlan.vid_begin)
 				return -EINVAL;
 			vlan.vid_begin = vinfo->vid;
+			/* don't allow range of pvids */
+			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
+				return -EINVAL;
 		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
 			if (!vlan.vid_begin)
 				return -EINVAL;
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
@ 2015-10-12 12:11               ` Elad Raz
  2015-10-12 12:17               ` Jiri Pirko
                                 ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Elad Raz @ 2015-10-12 12:11 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, Scott Feldman, Jiri Pirko, Vivien Didelot, davem,
	Nikolay Aleksandrov


> On Oct 12, 2015, at 3:01 PM, Nikolay Aleksandrov <razor@blackwall.org> wrote:
> 
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
> net/switchdev/switchdev.c | 3 +++
> 1 file changed, 3 insertions(+)
> 
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 6e4a4f9ad927..256c596de896 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> 			if (vlan.vid_begin)
> 				return -EINVAL;
> 			vlan.vid_begin = vinfo->vid;
> +			/* don't allow range of pvids */
> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> +				return -EINVAL;
> 		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> 			if (!vlan.vid_begin)
> 				return -EINVAL;

Acked-by: Elad Raz <eladr@mellanox.com>

> -- 
> 2.4.3
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
  2015-10-12 12:11               ` Elad Raz
@ 2015-10-12 12:17               ` Jiri Pirko
  2015-10-12 17:36               ` Vivien Didelot
  2015-10-13 11:42               ` David Miller
  3 siblings, 0 replies; 34+ messages in thread
From: Jiri Pirko @ 2015-10-12 12:17 UTC (permalink / raw)
  To: Nikolay Aleksandrov
  Cc: netdev, sfeldma, vivien.didelot, davem, Nikolay Aleksandrov

Mon, Oct 12, 2015 at 02:01:39PM CEST, razor@blackwall.org wrote:
>From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>
>We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>
>Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Acked-by: Jiri Pirko <jiri@mellanox.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
  2015-10-12 12:11               ` Elad Raz
  2015-10-12 12:17               ` Jiri Pirko
@ 2015-10-12 17:36               ` Vivien Didelot
  2015-10-13  6:13                 ` Scott Feldman
  2015-10-13  8:31                 ` Ido Schimmel
  2015-10-13 11:42               ` David Miller
  3 siblings, 2 replies; 34+ messages in thread
From: Vivien Didelot @ 2015-10-12 17:36 UTC (permalink / raw)
  To: Nikolay Aleksandrov; +Cc: netdev, sfeldma, jiri, davem, Nikolay Aleksandrov

Hi guys,

On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> ---
>  net/switchdev/switchdev.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> index 6e4a4f9ad927..256c596de896 100644
> --- a/net/switchdev/switchdev.c
> +++ b/net/switchdev/switchdev.c
> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>  			if (vlan.vid_begin)
>  				return -EINVAL;
>  			vlan.vid_begin = vinfo->vid;
> +			/* don't allow range of pvids */
> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> +				return -EINVAL;
>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>  			if (!vlan.vid_begin)
>  				return -EINVAL;
> -- 
> 2.4.3
> 

Yes the patch looks good, but it is a minor check though. I hope the
subject of this thread is making sense.

VLAN ranges seem to have been included for an UX purpose (so commands
look like Cisco IOS). We don't want to change any existing interface, so
we pushed that down to drivers, with the only valid reason that, maybe
one day, an hardware can be capable of programming a range on a per-port
basis.

So what happens is that we'll add some code to fix and check non-sense
(e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
other spots.

Sorry for being insistent, but this still doesn't look right to me.

It seems like we are bloating bridge, switchdev and drivers for the only
reason to maintain a kernel support for something like:

    # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges
  2015-10-11 10:49         ` [Bridge] " Nikolay Aleksandrov
@ 2015-10-13  2:59           ` David Miller
  -1 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2015-10-13  2:59 UTC (permalink / raw)
  To: razor; +Cc: netdev, jiri, stephen, bridge, eladr, nikolay

From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Sun, 11 Oct 2015 12:49:56 +0200

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> Currently it's possible for someone to send a vlan range to the kernel
> with the pvid flag set which will result in the pvid bouncing from a
> vlan to vlan and isn't correct, it also introduces problems for hardware
> where it doesn't make sense having more than 1 pvid. iproute2 already
> enforces this, so let's enforce it on kernel-side as well.
> 
> Reported-by: Elad Raz <eladr@mellanox.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [Bridge] [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges
@ 2015-10-13  2:59           ` David Miller
  0 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2015-10-13  2:59 UTC (permalink / raw)
  To: razor; +Cc: jiri, nikolay, netdev, bridge, eladr

From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Sun, 11 Oct 2015 12:49:56 +0200

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> Currently it's possible for someone to send a vlan range to the kernel
> with the pvid flag set which will result in the pvid bouncing from a
> vlan to vlan and isn't correct, it also introduces problems for hardware
> where it doesn't make sense having more than 1 pvid. iproute2 already
> enforces this, so let's enforce it on kernel-side as well.
> 
> Reported-by: Elad Raz <eladr@mellanox.com>
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 17:36               ` Vivien Didelot
@ 2015-10-13  6:13                 ` Scott Feldman
  2015-10-13  8:31                 ` Ido Schimmel
  1 sibling, 0 replies; 34+ messages in thread
From: Scott Feldman @ 2015-10-13  6:13 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Nikolay Aleksandrov, Netdev, Jiří Pírko,
	David S. Miller, Nikolay Aleksandrov, Roopa Prabhu

On Mon, Oct 12, 2015 at 10:36 AM, Vivien Didelot
<vivien.didelot@savoirfairelinux.com> wrote:
> Hi guys,
>
> On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>
>> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>>
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>>  net/switchdev/switchdev.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index 6e4a4f9ad927..256c596de896 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>                       if (vlan.vid_begin)
>>                               return -EINVAL;
>>                       vlan.vid_begin = vinfo->vid;
>> +                     /* don't allow range of pvids */
>> +                     if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> +                             return -EINVAL;
>>               } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>                       if (!vlan.vid_begin)
>>                               return -EINVAL;
>> --
>> 2.4.3
>>
>
> Yes the patch looks good, but it is a minor check though. I hope the
> subject of this thread is making sense.
>
> VLAN ranges seem to have been included for an UX purpose (so commands
> look like Cisco IOS). We don't want to change any existing interface, so
> we pushed that down to drivers, with the only valid reason that, maybe
> one day, an hardware can be capable of programming a range on a per-port
> basis.
>
> So what happens is that we'll add some code to fix and check non-sense
> (e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
> other spots.
>
> Sorry for being insistent, but this still doesn't look right to me.
>
> It seems like we are bloating bridge, switchdev and drivers for the only
> reason to maintain a kernel support for something like:
>
>     # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done

[Adding Roopa]

(Roopa, correct me if I'm wrong).

Ref: commit # bdced7ef

IIRC, vlan ranges were introduced to reduce the volume of netlink
traffic when doing a command like your's above.  Your command would
create 3901 netlink msgs with a single IFLA_BRIDGE_VLAN_INFO, sent to
the kernel and then echoed back to user-space.  With vlan ranges, the
cmd "bridge vlan add vid 100-4000 dev swp0" would result in one
netlink msg with one IFLA_BRIDGE_VLAN_INFO.  Seems there were problems
with a getlink dump over-running the netlink msg (not using
NLM_MULTI)?   Maybe Roopa can expand on the problem that was solved.

So the bridge command, and everything below it, has this feature.  We
can't undo it...it's in the wild, so to speak  And, as mentioned
before, it's and all-or-nothing command, and the underlying switchdev
or driver code needs to deal with overlapping vlan maps.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 17:36               ` Vivien Didelot
  2015-10-13  6:13                 ` Scott Feldman
@ 2015-10-13  8:31                 ` Ido Schimmel
  2015-10-13 14:32                   ` Vivien Didelot
  1 sibling, 1 reply; 34+ messages in thread
From: Ido Schimmel @ 2015-10-13  8:31 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Nikolay Aleksandrov, netdev, sfeldma, jiri, davem,
	Nikolay Aleksandrov, eladr

Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>Hi guys,
>
>On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> 
>> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>> 
>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> ---
>>  net/switchdev/switchdev.c | 3 +++
>>  1 file changed, 3 insertions(+)
>> 
>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> index 6e4a4f9ad927..256c596de896 100644
>> --- a/net/switchdev/switchdev.c
>> +++ b/net/switchdev/switchdev.c
>> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>  			if (vlan.vid_begin)
>>  				return -EINVAL;
>>  			vlan.vid_begin = vinfo->vid;
>> +			/* don't allow range of pvids */
>> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> +				return -EINVAL;
>>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>  			if (!vlan.vid_begin)
>>  				return -EINVAL;
>> -- 
>> 2.4.3
>> 
>
>Yes the patch looks good, but it is a minor check though. I hope the
>subject of this thread is making sense.
>
>VLAN ranges seem to have been included for an UX purpose (so commands
>look like Cisco IOS). We don't want to change any existing interface, so
>we pushed that down to drivers, with the only valid reason that, maybe
>one day, an hardware can be capable of programming a range on a per-port
>basis.
Hi,

That's actually what we are doing in mlxsw. We can do up to 256 entries in
one go. We've yet to submit this part.

>
>So what happens is that we'll add some code to fix and check non-sense
>(e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
>other spots.
>
>Sorry for being insistent, but this still doesn't look right to me.
>
>It seems like we are bloating bridge, switchdev and drivers for the only
>reason to maintain a kernel support for something like:
>
>    # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done
>
>Thanks,
>-v
>--
>To unsubscribe from this list: send the line "unsubscribe netdev" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
                                 ` (2 preceding siblings ...)
  2015-10-12 17:36               ` Vivien Didelot
@ 2015-10-13 11:42               ` David Miller
  3 siblings, 0 replies; 34+ messages in thread
From: David Miller @ 2015-10-13 11:42 UTC (permalink / raw)
  To: razor; +Cc: netdev, sfeldma, jiri, vivien.didelot, nikolay

From: Nikolay Aleksandrov <razor@blackwall.org>
Date: Mon, 12 Oct 2015 14:01:39 +0200

> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> 
> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> 
> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

Applied, thanks.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-13  8:31                 ` Ido Schimmel
@ 2015-10-13 14:32                   ` Vivien Didelot
  2015-10-14  6:14                     ` Ido Schimmel
  0 siblings, 1 reply; 34+ messages in thread
From: Vivien Didelot @ 2015-10-13 14:32 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, netdev, sfeldma, jiri, davem,
	Nikolay Aleksandrov, eladr

On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >Hi guys,
> >
> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> 
> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >> 
> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> ---
> >>  net/switchdev/switchdev.c | 3 +++
> >>  1 file changed, 3 insertions(+)
> >> 
> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >> index 6e4a4f9ad927..256c596de896 100644
> >> --- a/net/switchdev/switchdev.c
> >> +++ b/net/switchdev/switchdev.c
> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >>  			if (vlan.vid_begin)
> >>  				return -EINVAL;
> >>  			vlan.vid_begin = vinfo->vid;
> >> +			/* don't allow range of pvids */
> >> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >> +				return -EINVAL;
> >>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >>  			if (!vlan.vid_begin)
> >>  				return -EINVAL;
> >> -- 
> >> 2.4.3
> >> 
> >
> >Yes the patch looks good, but it is a minor check though. I hope the
> >subject of this thread is making sense.
> >
> >VLAN ranges seem to have been included for an UX purpose (so commands
> >look like Cisco IOS). We don't want to change any existing interface, so
> >we pushed that down to drivers, with the only valid reason that, maybe
> >one day, an hardware can be capable of programming a range on a per-port
> >basis.
> Hi,
> 
> That's actually what we are doing in mlxsw. We can do up to 256 entries in
> one go. We've yet to submit this part.

Perfect Ido, thanks for pointing this out! I'm OK with the range then. 

So there is now a very last question in my head for this, which is more
a matter of kernel design. Should the user be aware of such underlying
support? In other words, would it make sense to do this in a driver:

    foo_port_vlan_add(struct net_device *dev,
                      struct switchdev_obj_port_vlan *vlan)
    {
        if (vlan->vid_begin != vlan->vid_end)
            return -ENOTSUPP; /* or something more relevant for user */

        return foo_port_single_vlan_add(dev, vlan->vid_begin);
    }

So drivers keep being simple, and we can easily propagate the fact that
one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
implemented and must be done in software.

I'm not sure how transparent the hardware must be to the user.

(the problem of one VLAN unsupported in a range is still something we
need to address).

> 
> >
> >So what happens is that we'll add some code to fix and check non-sense
> >(e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
> >other spots.
> >
> >Sorry for being insistent, but this still doesn't look right to me.
> >
> >It seems like we are bloating bridge, switchdev and drivers for the only
> >reason to maintain a kernel support for something like:
> >
> >    # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-13 14:32                   ` Vivien Didelot
@ 2015-10-14  6:14                     ` Ido Schimmel
  2015-10-14 15:25                       ` Vivien Didelot
  0 siblings, 1 reply; 34+ messages in thread
From: Ido Schimmel @ 2015-10-14  6:14 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Nikolay Aleksandrov, netdev, sfeldma, jiri, davem,
	Nikolay Aleksandrov, eladr

Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>> >Hi guys,
>> >
>> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> >> 
>> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>> >> 
>> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> >> ---
>> >>  net/switchdev/switchdev.c | 3 +++
>> >>  1 file changed, 3 insertions(+)
>> >> 
>> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> >> index 6e4a4f9ad927..256c596de896 100644
>> >> --- a/net/switchdev/switchdev.c
>> >> +++ b/net/switchdev/switchdev.c
>> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>> >>  			if (vlan.vid_begin)
>> >>  				return -EINVAL;
>> >>  			vlan.vid_begin = vinfo->vid;
>> >> +			/* don't allow range of pvids */
>> >> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> >> +				return -EINVAL;
>> >>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>> >>  			if (!vlan.vid_begin)
>> >>  				return -EINVAL;
>> >> -- 
>> >> 2.4.3
>> >> 
>> >
>> >Yes the patch looks good, but it is a minor check though. I hope the
>> >subject of this thread is making sense.
>> >
>> >VLAN ranges seem to have been included for an UX purpose (so commands
>> >look like Cisco IOS). We don't want to change any existing interface, so
>> >we pushed that down to drivers, with the only valid reason that, maybe
>> >one day, an hardware can be capable of programming a range on a per-port
>> >basis.
>> Hi,
>> 
>> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>> one go. We've yet to submit this part.
>
>Perfect Ido, thanks for pointing this out! I'm OK with the range then. 
>
>So there is now a very last question in my head for this, which is more
>a matter of kernel design. Should the user be aware of such underlying
>support? In other words, would it make sense to do this in a driver:
>
>    foo_port_vlan_add(struct net_device *dev,
>                      struct switchdev_obj_port_vlan *vlan)
>    {
>        if (vlan->vid_begin != vlan->vid_end)
>            return -ENOTSUPP; /* or something more relevant for user */
>
>        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>    }
>
>So drivers keep being simple, and we can easily propagate the fact that
>one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>implemented and must be done in software.
I think that if you want to keep it simple, then Scott's advice from the
previous thread is the most appropriate one. I believe the hardware you
are using is simply not meant to support multiple 802.1Q bridges.

Trying to go around this will simply result in weird behavior (such as
not supporting VLAN ranges), which is only sacrificed to support
something a user doesn't even require (given the fact he's aware the
hardware is meant to support only one 802.1Q bridge).

What do you think?
>
>I'm not sure how transparent the hardware must be to the user.
>
>(the problem of one VLAN unsupported in a range is still something we
>need to address).
>
>> 
>> >
>> >So what happens is that we'll add some code to fix and check non-sense
>> >(e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
>> >other spots.
>> >
>> >Sorry for being insistent, but this still doesn't look right to me.
>> >
>> >It seems like we are bloating bridge, switchdev and drivers for the only
>> >reason to maintain a kernel support for something like:
>> >
>> >    # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done
>
>Thanks,
>-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14  6:14                     ` Ido Schimmel
@ 2015-10-14 15:25                       ` Vivien Didelot
  2015-10-14 17:14                         ` Scott Feldman
  0 siblings, 1 reply; 34+ messages in thread
From: Vivien Didelot @ 2015-10-14 15:25 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Nikolay Aleksandrov, netdev, sfeldma, jiri, davem,
	Nikolay Aleksandrov, eladr

On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >> >Hi guys,
> >> >
> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> >> 
> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >> >> 
> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >> >> ---
> >> >>  net/switchdev/switchdev.c | 3 +++
> >> >>  1 file changed, 3 insertions(+)
> >> >> 
> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >> >> index 6e4a4f9ad927..256c596de896 100644
> >> >> --- a/net/switchdev/switchdev.c
> >> >> +++ b/net/switchdev/switchdev.c
> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >> >>  			if (vlan.vid_begin)
> >> >>  				return -EINVAL;
> >> >>  			vlan.vid_begin = vinfo->vid;
> >> >> +			/* don't allow range of pvids */
> >> >> +			if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >> >> +				return -EINVAL;
> >> >>  		} else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >> >>  			if (!vlan.vid_begin)
> >> >>  				return -EINVAL;
> >> >> -- 
> >> >> 2.4.3
> >> >> 
> >> >
> >> >Yes the patch looks good, but it is a minor check though. I hope the
> >> >subject of this thread is making sense.
> >> >
> >> >VLAN ranges seem to have been included for an UX purpose (so commands
> >> >look like Cisco IOS). We don't want to change any existing interface, so
> >> >we pushed that down to drivers, with the only valid reason that, maybe
> >> >one day, an hardware can be capable of programming a range on a per-port
> >> >basis.
> >> Hi,
> >> 
> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
> >> one go. We've yet to submit this part.
> >
> >Perfect Ido, thanks for pointing this out! I'm OK with the range then. 
> >
> >So there is now a very last question in my head for this, which is more
> >a matter of kernel design. Should the user be aware of such underlying
> >support? In other words, would it make sense to do this in a driver:
> >
> >    foo_port_vlan_add(struct net_device *dev,
> >                      struct switchdev_obj_port_vlan *vlan)
> >    {
> >        if (vlan->vid_begin != vlan->vid_end)
> >            return -ENOTSUPP; /* or something more relevant for user */
> >
> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >    }
> >
> >So drivers keep being simple, and we can easily propagate the fact that
> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
> >implemented and must be done in software.
> I think that if you want to keep it simple, then Scott's advice from the
> previous thread is the most appropriate one. I believe the hardware you
> are using is simply not meant to support multiple 802.1Q bridges.

You mean allowing only one Linux bridge over an hardware switch?

It would for sure simplify how, as developers and users, we represent a
physical switch. But I am not sure how to achieve that and I don't have
strong opinions on this TBH.

> Trying to go around this will simply result in weird behavior (such as
> not supporting VLAN ranges), which is only sacrificed to support
> something a user doesn't even require (given the fact he's aware the
> hardware is meant to support only one 802.1Q bridge).
> 
> What do you think?

I think mapping one Linux bridge to one physical switch (and thus have
real bridge device) makes a lot of sense.

But since Linux soft bridges are not stackable (yet?), how do we join 2
distinct interfaces then? e.g. eth1 is WAN and switch ports behind eth0.

> >
> >I'm not sure how transparent the hardware must be to the user.
> >
> >(the problem of one VLAN unsupported in a range is still something we
> >need to address).
> >
> >> 
> >> >
> >> >So what happens is that we'll add some code to fix and check non-sense
> >> >(e.g. range + PVID) in switchdev, bridge, and I'm sure we are missing
> >> >other spots.
> >> >
> >> >Sorry for being insistent, but this still doesn't look right to me.
> >> >
> >> >It seems like we are bloating bridge, switchdev and drivers for the only
> >> >reason to maintain a kernel support for something like:
> >> >
> >> >    # for i in $(seq 100 4000); do bridge vlan add vid $i dev swp0; done

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 15:25                       ` Vivien Didelot
@ 2015-10-14 17:14                         ` Scott Feldman
  2015-10-14 17:42                           ` Ido Schimmel
  0 siblings, 1 reply; 34+ messages in thread
From: Scott Feldman @ 2015-10-14 17:14 UTC (permalink / raw)
  To: Vivien Didelot
  Cc: Ido Schimmel, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
<vivien.didelot@savoirfairelinux.com> wrote:
> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>> >> >Hi guys,
>> >> >
>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> >> >>
>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>> >> >>
>> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>> >> >> ---
>> >> >>  net/switchdev/switchdev.c | 3 +++
>> >> >>  1 file changed, 3 insertions(+)
>> >> >>
>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>> >> >> index 6e4a4f9ad927..256c596de896 100644
>> >> >> --- a/net/switchdev/switchdev.c
>> >> >> +++ b/net/switchdev/switchdev.c
>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>> >> >>                         if (vlan.vid_begin)
>> >> >>                                 return -EINVAL;
>> >> >>                         vlan.vid_begin = vinfo->vid;
>> >> >> +                       /* don't allow range of pvids */
>> >> >> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>> >> >> +                               return -EINVAL;
>> >> >>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>> >> >>                         if (!vlan.vid_begin)
>> >> >>                                 return -EINVAL;
>> >> >> --
>> >> >> 2.4.3
>> >> >>
>> >> >
>> >> >Yes the patch looks good, but it is a minor check though. I hope the
>> >> >subject of this thread is making sense.
>> >> >
>> >> >VLAN ranges seem to have been included for an UX purpose (so commands
>> >> >look like Cisco IOS). We don't want to change any existing interface, so
>> >> >we pushed that down to drivers, with the only valid reason that, maybe
>> >> >one day, an hardware can be capable of programming a range on a per-port
>> >> >basis.
>> >> Hi,
>> >>
>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>> >> one go. We've yet to submit this part.
>> >
>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>> >
>> >So there is now a very last question in my head for this, which is more
>> >a matter of kernel design. Should the user be aware of such underlying
>> >support? In other words, would it make sense to do this in a driver:
>> >
>> >    foo_port_vlan_add(struct net_device *dev,
>> >                      struct switchdev_obj_port_vlan *vlan)
>> >    {
>> >        if (vlan->vid_begin != vlan->vid_end)
>> >            return -ENOTSUPP; /* or something more relevant for user */
>> >
>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>> >    }
>> >
>> >So drivers keep being simple, and we can easily propagate the fact that
>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>> >implemented and must be done in software.
>> I think that if you want to keep it simple, then Scott's advice from the
>> previous thread is the most appropriate one. I believe the hardware you
>> are using is simply not meant to support multiple 802.1Q bridges.
>
> You mean allowing only one Linux bridge over an hardware switch?
>
> It would for sure simplify how, as developers and users, we represent a
> physical switch. But I am not sure how to achieve that and I don't have
> strong opinions on this TBH.

Hi Vivien, I think it's possible to keep switch ports on just one
bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
notifier.  This will give you the driver-level control you want.  Do
you have time to investigate?  The idea is:

1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
being added to a second bridge,then return NOTIFY_BAD.  Your driver
needs to track the bridge count.

2) In __netdev_upper_dev_link(), check the return code from the
call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
NOTIFY_BAD, abort the linking operation (goto rollback_xxx).

-scott

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 17:14                         ` Scott Feldman
@ 2015-10-14 17:42                           ` Ido Schimmel
  2015-10-14 18:51                             ` Vivien Didelot
  2015-10-15  2:58                             ` Scott Feldman
  0 siblings, 2 replies; 34+ messages in thread
From: Ido Schimmel @ 2015-10-14 17:42 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Vivien Didelot, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
>On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
><vivien.didelot@savoirfairelinux.com> wrote:
>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>> >> >Hi guys,
>>> >> >
>>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>>> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>> >> >>
>>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>>> >> >>
>>> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>> >> >> ---
>>> >> >>  net/switchdev/switchdev.c | 3 +++
>>> >> >>  1 file changed, 3 insertions(+)
>>> >> >>
>>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>> >> >> index 6e4a4f9ad927..256c596de896 100644
>>> >> >> --- a/net/switchdev/switchdev.c
>>> >> >> +++ b/net/switchdev/switchdev.c
>>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>> >> >>                         if (vlan.vid_begin)
>>> >> >>                                 return -EINVAL;
>>> >> >>                         vlan.vid_begin = vinfo->vid;
>>> >> >> +                       /* don't allow range of pvids */
>>> >> >> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>>> >> >> +                               return -EINVAL;
>>> >> >>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>> >> >>                         if (!vlan.vid_begin)
>>> >> >>                                 return -EINVAL;
>>> >> >> --
>>> >> >> 2.4.3
>>> >> >>
>>> >> >
>>> >> >Yes the patch looks good, but it is a minor check though. I hope the
>>> >> >subject of this thread is making sense.
>>> >> >
>>> >> >VLAN ranges seem to have been included for an UX purpose (so commands
>>> >> >look like Cisco IOS). We don't want to change any existing interface, so
>>> >> >we pushed that down to drivers, with the only valid reason that, maybe
>>> >> >one day, an hardware can be capable of programming a range on a per-port
>>> >> >basis.
>>> >> Hi,
>>> >>
>>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>>> >> one go. We've yet to submit this part.
>>> >
>>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>>> >
>>> >So there is now a very last question in my head for this, which is more
>>> >a matter of kernel design. Should the user be aware of such underlying
>>> >support? In other words, would it make sense to do this in a driver:
>>> >
>>> >    foo_port_vlan_add(struct net_device *dev,
>>> >                      struct switchdev_obj_port_vlan *vlan)
>>> >    {
>>> >        if (vlan->vid_begin != vlan->vid_end)
>>> >            return -ENOTSUPP; /* or something more relevant for user */
>>> >
>>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>>> >    }
>>> >
>>> >So drivers keep being simple, and we can easily propagate the fact that
>>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>>> >implemented and must be done in software.
>>> I think that if you want to keep it simple, then Scott's advice from the
>>> previous thread is the most appropriate one. I believe the hardware you
>>> are using is simply not meant to support multiple 802.1Q bridges.
>>
>> You mean allowing only one Linux bridge over an hardware switch?
>>
>> It would for sure simplify how, as developers and users, we represent a
>> physical switch. But I am not sure how to achieve that and I don't have
>> strong opinions on this TBH.
>
>Hi Vivien, I think it's possible to keep switch ports on just one
>bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>notifier.  This will give you the driver-level control you want.  Do
>you have time to investigate?  The idea is:
>
>1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>being added to a second bridge,then return NOTIFY_BAD.  Your driver
>needs to track the bridge count.
>
>2) In __netdev_upper_dev_link(), check the return code from the
>call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>
Hi,

We are doing something similar in mlxsw (not upstream yet). Jiri
introduced PRE_CHANGEUPPER, which is called from the function you
mentioned, but before the linking operation (so that you don't need to
rollback).

If the notification is about a linking operation and the master is a
bridge different than the current one, then NOTIFY_BAD is returned.

Vivien, regarding your WAN interface question, this is something we
currently don't do. We don't even flood traffic from bridged ports
to CPU (although we can), as it can saturate the bus. Only control
traffic is supposed to go there.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 17:42                           ` Ido Schimmel
@ 2015-10-14 18:51                             ` Vivien Didelot
  2015-10-14 22:08                               ` Florian Fainelli
  2015-10-15  2:58                             ` Scott Feldman
  1 sibling, 1 reply; 34+ messages in thread
From: Vivien Didelot @ 2015-10-14 18:51 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Scott Feldman, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
> >On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
> ><vivien.didelot@savoirfairelinux.com> wrote:
> >> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> >>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> >>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >>> >> >Hi guys,
> >>> >> >
> >>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >>> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>> >> >>
> >>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >>> >> >>
> >>> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>> >> >> ---
> >>> >> >>  net/switchdev/switchdev.c | 3 +++
> >>> >> >>  1 file changed, 3 insertions(+)
> >>> >> >>
> >>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >>> >> >> index 6e4a4f9ad927..256c596de896 100644
> >>> >> >> --- a/net/switchdev/switchdev.c
> >>> >> >> +++ b/net/switchdev/switchdev.c
> >>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >>> >> >>                         if (vlan.vid_begin)
> >>> >> >>                                 return -EINVAL;
> >>> >> >>                         vlan.vid_begin = vinfo->vid;
> >>> >> >> +                       /* don't allow range of pvids */
> >>> >> >> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >>> >> >> +                               return -EINVAL;
> >>> >> >>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >>> >> >>                         if (!vlan.vid_begin)
> >>> >> >>                                 return -EINVAL;
> >>> >> >> --
> >>> >> >> 2.4.3
> >>> >> >>
> >>> >> >
> >>> >> >Yes the patch looks good, but it is a minor check though. I hope the
> >>> >> >subject of this thread is making sense.
> >>> >> >
> >>> >> >VLAN ranges seem to have been included for an UX purpose (so commands
> >>> >> >look like Cisco IOS). We don't want to change any existing interface, so
> >>> >> >we pushed that down to drivers, with the only valid reason that, maybe
> >>> >> >one day, an hardware can be capable of programming a range on a per-port
> >>> >> >basis.
> >>> >> Hi,
> >>> >>
> >>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
> >>> >> one go. We've yet to submit this part.
> >>> >
> >>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
> >>> >
> >>> >So there is now a very last question in my head for this, which is more
> >>> >a matter of kernel design. Should the user be aware of such underlying
> >>> >support? In other words, would it make sense to do this in a driver:
> >>> >
> >>> >    foo_port_vlan_add(struct net_device *dev,
> >>> >                      struct switchdev_obj_port_vlan *vlan)
> >>> >    {
> >>> >        if (vlan->vid_begin != vlan->vid_end)
> >>> >            return -ENOTSUPP; /* or something more relevant for user */
> >>> >
> >>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >>> >    }
> >>> >
> >>> >So drivers keep being simple, and we can easily propagate the fact that
> >>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
> >>> >implemented and must be done in software.
> >>> I think that if you want to keep it simple, then Scott's advice from the
> >>> previous thread is the most appropriate one. I believe the hardware you
> >>> are using is simply not meant to support multiple 802.1Q bridges.
> >>
> >> You mean allowing only one Linux bridge over an hardware switch?
> >>
> >> It would for sure simplify how, as developers and users, we represent a
> >> physical switch. But I am not sure how to achieve that and I don't have
> >> strong opinions on this TBH.
> >
> >Hi Vivien, I think it's possible to keep switch ports on just one
> >bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
> >notifier.  This will give you the driver-level control you want.  Do
> >you have time to investigate?  The idea is:
> >
> >1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
> >being added to a second bridge,then return NOTIFY_BAD.  Your driver
> >needs to track the bridge count.
> >
> >2) In __netdev_upper_dev_link(), check the return code from the
> >call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
> >NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
> >
> Hi,
> 
> We are doing something similar in mlxsw (not upstream yet). Jiri
> introduced PRE_CHANGEUPPER, which is called from the function you
> mentioned, but before the linking operation (so that you don't need to
> rollback).
> 
> If the notification is about a linking operation and the master is a
> bridge different than the current one, then NOTIFY_BAD is returned.

Great, I'll wait for this then.

Scott, this is another good reason why we definitely need a simple
struct device per switch chip. In addition to the port net_device
registration, the netdev notifier is another exact same piece of code
that both Rocker and DSA implement.

> Vivien, regarding your WAN interface question, this is something we
> currently don't do. We don't even flood traffic from bridged ports
> to CPU (although we can), as it can saturate the bus. Only control
> traffic is supposed to go there.

I kinda answered it myself: a Linux bridge needs to remain a user
abstraction of a logical group of net_device. In other words, we must
allow physical distinct ports under the same bridge.

Below is an example of a custom router with 2 chained switch chips sw0
and sw1, and what usage I believe we expect:

    [ Linux soft bridge "br0" which can accelerate VLAN, STP, etc.    ]
                                                       (CPU)    (WAN)
    [ sw0p0 sw0p1 sw0p2 ] [ sw1p0 sw1p1 sw1p2 sw1p3 ] [ eth0 ] [ eth1 ]
                     `--DSA--'                   `-------'

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 18:51                             ` Vivien Didelot
@ 2015-10-14 22:08                               ` Florian Fainelli
  2015-10-15  0:07                                 ` Vivien Didelot
  0 siblings, 1 reply; 34+ messages in thread
From: Florian Fainelli @ 2015-10-14 22:08 UTC (permalink / raw)
  To: Vivien Didelot, Ido Schimmel
  Cc: Scott Feldman, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

On 14/10/15 11:51, Vivien Didelot wrote:
> On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
>> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
>>> On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
>>> <vivien.didelot@savoirfairelinux.com> wrote:
>>>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>>>>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>>>> On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>>>>>>> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>>>>>> Hi guys,
>>>>>>>>
>>>>>>>> On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>>>>>>>>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>>>>>>
>>>>>>>>> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>>>>>>>>>
>>>>>>>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>>>>>> ---
>>>>>>>>>  net/switchdev/switchdev.c | 3 +++
>>>>>>>>>  1 file changed, 3 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>>>>>> index 6e4a4f9ad927..256c596de896 100644
>>>>>>>>> --- a/net/switchdev/switchdev.c
>>>>>>>>> +++ b/net/switchdev/switchdev.c
>>>>>>>>> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>>>>>>>>                         if (vlan.vid_begin)
>>>>>>>>>                                 return -EINVAL;
>>>>>>>>>                         vlan.vid_begin = vinfo->vid;
>>>>>>>>> +                       /* don't allow range of pvids */
>>>>>>>>> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>>>>>>>>> +                               return -EINVAL;
>>>>>>>>>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>>>>>>>>                         if (!vlan.vid_begin)
>>>>>>>>>                                 return -EINVAL;
>>>>>>>>> --
>>>>>>>>> 2.4.3
>>>>>>>>>
>>>>>>>>
>>>>>>>> Yes the patch looks good, but it is a minor check though. I hope the
>>>>>>>> subject of this thread is making sense.
>>>>>>>>
>>>>>>>> VLAN ranges seem to have been included for an UX purpose (so commands
>>>>>>>> look like Cisco IOS). We don't want to change any existing interface, so
>>>>>>>> we pushed that down to drivers, with the only valid reason that, maybe
>>>>>>>> one day, an hardware can be capable of programming a range on a per-port
>>>>>>>> basis.
>>>>>>> Hi,
>>>>>>>
>>>>>>> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>>>>>>> one go. We've yet to submit this part.
>>>>>>
>>>>>> Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>>>>>>
>>>>>> So there is now a very last question in my head for this, which is more
>>>>>> a matter of kernel design. Should the user be aware of such underlying
>>>>>> support? In other words, would it make sense to do this in a driver:
>>>>>>
>>>>>>    foo_port_vlan_add(struct net_device *dev,
>>>>>>                      struct switchdev_obj_port_vlan *vlan)
>>>>>>    {
>>>>>>        if (vlan->vid_begin != vlan->vid_end)
>>>>>>            return -ENOTSUPP; /* or something more relevant for user */
>>>>>>
>>>>>>        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>>>>>>    }
>>>>>>
>>>>>> So drivers keep being simple, and we can easily propagate the fact that
>>>>>> one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>>>>>> implemented and must be done in software.
>>>>> I think that if you want to keep it simple, then Scott's advice from the
>>>>> previous thread is the most appropriate one. I believe the hardware you
>>>>> are using is simply not meant to support multiple 802.1Q bridges.
>>>>
>>>> You mean allowing only one Linux bridge over an hardware switch?
>>>>
>>>> It would for sure simplify how, as developers and users, we represent a
>>>> physical switch. But I am not sure how to achieve that and I don't have
>>>> strong opinions on this TBH.
>>>
>>> Hi Vivien, I think it's possible to keep switch ports on just one
>>> bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>>> notifier.  This will give you the driver-level control you want.  Do
>>> you have time to investigate?  The idea is:
>>>
>>> 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>>> being added to a second bridge,then return NOTIFY_BAD.  Your driver
>>> needs to track the bridge count.
>>>
>>> 2) In __netdev_upper_dev_link(), check the return code from the
>>> call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>>> NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>>>
>> Hi,
>>
>> We are doing something similar in mlxsw (not upstream yet). Jiri
>> introduced PRE_CHANGEUPPER, which is called from the function you
>> mentioned, but before the linking operation (so that you don't need to
>> rollback).
>>
>> If the notification is about a linking operation and the master is a
>> bridge different than the current one, then NOTIFY_BAD is returned.
> 
> Great, I'll wait for this then.
> 
> Scott, this is another good reason why we definitely need a simple
> struct device per switch chip. In addition to the port net_device
> registration, the netdev notifier is another exact same piece of code
> that both Rocker and DSA implement.
> 
>> Vivien, regarding your WAN interface question, this is something we
>> currently don't do. We don't even flood traffic from bridged ports
>> to CPU (although we can), as it can saturate the bus. Only control
>> traffic is supposed to go there.
> 
> I kinda answered it myself: a Linux bridge needs to remain a user
> abstraction of a logical group of net_device. In other words, we must
> allow physical distinct ports under the same bridge.
> 
> Below is an example of a custom router with 2 chained switch chips sw0
> and sw1, and what usage I believe we expect:
> 
>     [ Linux soft bridge "br0" which can accelerate VLAN, STP, etc.    ]
>                                                        (CPU)    (WAN)
>     [ sw0p0 sw0p1 sw0p2 ] [ sw1p0 sw1p1 sw1p2 sw1p3 ] [ eth0 ] [ eth1 ]
>                      `--DSA--'                   `-------'

Your use case is something that is fairly common with PON/GPON devices,
but AFAIR they typically implement this by having two sets of bridges,
one which spans the WAN interface and a bunch of other ports, and
another bridge which is LAN only (few ports + Wi-Fi typically). Usually
this is under the same physical switch though, so this is all about
partitioning physical ports and physical interfaces under logical groups.

By definition the WAN and LAN domains are separate logical and broadcast
domains, with separate admission control rules, STP etc. I do not think
your "br0" example should span the WAN interface in this case. Also,
with eth0 being the conduit interface, it cannot be allowed to be a
bridge member, that's something I ended up fixing in the bridge layer,
otherwise untagging does not work, but that is nitpicking.

It still makes sense though allow the creation of a "br0" device which
spans the entire set of ports and interfaces (except eth0), but not name
eth1 "WAN", just treat it as an extension in that case.

Sorry if that seems like f'ing flies, but having concise example should
help make progress on these design issues ;)
-- 
Florian

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 22:08                               ` Florian Fainelli
@ 2015-10-15  0:07                                 ` Vivien Didelot
  0 siblings, 0 replies; 34+ messages in thread
From: Vivien Didelot @ 2015-10-15  0:07 UTC (permalink / raw)
  To: Florian Fainelli
  Cc: Ido Schimmel, Scott Feldman, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

On Oct. Wednesday 14 (42) 03:08 PM, Florian Fainelli wrote:
> On 14/10/15 11:51, Vivien Didelot wrote:
> > On Oct. Wednesday 14 (42) 08:42 PM, Ido Schimmel wrote:
> >> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
> >>> On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
> >>> <vivien.didelot@savoirfairelinux.com> wrote:
> >>>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
> >>>>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >>>>>> On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
> >>>>>>> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
> >>>>>>>> Hi guys,
> >>>>>>>>
> >>>>>>>> On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
> >>>>>>>>> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>>>>>>>>
> >>>>>>>>> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
> >>>>>>>>>
> >>>>>>>>> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
> >>>>>>>>> ---
> >>>>>>>>>  net/switchdev/switchdev.c | 3 +++
> >>>>>>>>>  1 file changed, 3 insertions(+)
> >>>>>>>>>
> >>>>>>>>> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
> >>>>>>>>> index 6e4a4f9ad927..256c596de896 100644
> >>>>>>>>> --- a/net/switchdev/switchdev.c
> >>>>>>>>> +++ b/net/switchdev/switchdev.c
> >>>>>>>>> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
> >>>>>>>>>                         if (vlan.vid_begin)
> >>>>>>>>>                                 return -EINVAL;
> >>>>>>>>>                         vlan.vid_begin = vinfo->vid;
> >>>>>>>>> +                       /* don't allow range of pvids */
> >>>>>>>>> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
> >>>>>>>>> +                               return -EINVAL;
> >>>>>>>>>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
> >>>>>>>>>                         if (!vlan.vid_begin)
> >>>>>>>>>                                 return -EINVAL;
> >>>>>>>>> --
> >>>>>>>>> 2.4.3
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> Yes the patch looks good, but it is a minor check though. I hope the
> >>>>>>>> subject of this thread is making sense.
> >>>>>>>>
> >>>>>>>> VLAN ranges seem to have been included for an UX purpose (so commands
> >>>>>>>> look like Cisco IOS). We don't want to change any existing interface, so
> >>>>>>>> we pushed that down to drivers, with the only valid reason that, maybe
> >>>>>>>> one day, an hardware can be capable of programming a range on a per-port
> >>>>>>>> basis.
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> That's actually what we are doing in mlxsw. We can do up to 256 entries in
> >>>>>>> one go. We've yet to submit this part.
> >>>>>>
> >>>>>> Perfect Ido, thanks for pointing this out! I'm OK with the range then.
> >>>>>>
> >>>>>> So there is now a very last question in my head for this, which is more
> >>>>>> a matter of kernel design. Should the user be aware of such underlying
> >>>>>> support? In other words, would it make sense to do this in a driver:
> >>>>>>
> >>>>>>    foo_port_vlan_add(struct net_device *dev,
> >>>>>>                      struct switchdev_obj_port_vlan *vlan)
> >>>>>>    {
> >>>>>>        if (vlan->vid_begin != vlan->vid_end)
> >>>>>>            return -ENOTSUPP; /* or something more relevant for user */
> >>>>>>
> >>>>>>        return foo_port_single_vlan_add(dev, vlan->vid_begin);
> >>>>>>    }
> >>>>>>
> >>>>>> So drivers keep being simple, and we can easily propagate the fact that
> >>>>>> one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
> >>>>>> implemented and must be done in software.
> >>>>> I think that if you want to keep it simple, then Scott's advice from the
> >>>>> previous thread is the most appropriate one. I believe the hardware you
> >>>>> are using is simply not meant to support multiple 802.1Q bridges.
> >>>>
> >>>> You mean allowing only one Linux bridge over an hardware switch?
> >>>>
> >>>> It would for sure simplify how, as developers and users, we represent a
> >>>> physical switch. But I am not sure how to achieve that and I don't have
> >>>> strong opinions on this TBH.
> >>>
> >>> Hi Vivien, I think it's possible to keep switch ports on just one
> >>> bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
> >>> notifier.  This will give you the driver-level control you want.  Do
> >>> you have time to investigate?  The idea is:
> >>>
> >>> 1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
> >>> being added to a second bridge,then return NOTIFY_BAD.  Your driver
> >>> needs to track the bridge count.
> >>>
> >>> 2) In __netdev_upper_dev_link(), check the return code from the
> >>> call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
> >>> NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
> >>>
> >> Hi,
> >>
> >> We are doing something similar in mlxsw (not upstream yet). Jiri
> >> introduced PRE_CHANGEUPPER, which is called from the function you
> >> mentioned, but before the linking operation (so that you don't need to
> >> rollback).
> >>
> >> If the notification is about a linking operation and the master is a
> >> bridge different than the current one, then NOTIFY_BAD is returned.
> > 
> > Great, I'll wait for this then.
> > 
> > Scott, this is another good reason why we definitely need a simple
> > struct device per switch chip. In addition to the port net_device
> > registration, the netdev notifier is another exact same piece of code
> > that both Rocker and DSA implement.
> > 
> >> Vivien, regarding your WAN interface question, this is something we
> >> currently don't do. We don't even flood traffic from bridged ports
> >> to CPU (although we can), as it can saturate the bus. Only control
> >> traffic is supposed to go there.
> > 
> > I kinda answered it myself: a Linux bridge needs to remain a user
> > abstraction of a logical group of net_device. In other words, we must
> > allow physical distinct ports under the same bridge.
> > 
> > Below is an example of a custom router with 2 chained switch chips sw0
> > and sw1, and what usage I believe we expect:
> > 
> >     [ Linux soft bridge "br0" which can accelerate VLAN, STP, etc.    ]
> >                                                        (CPU)    (WAN)
> >     [ sw0p0 sw0p1 sw0p2 ] [ sw1p0 sw1p1 sw1p2 sw1p3 ] [ eth0 ] [ eth1 ]
> >                      `--DSA--'                   `-------'
> 
> Your use case is something that is fairly common with PON/GPON devices,
> but AFAIR they typically implement this by having two sets of bridges,
> one which spans the WAN interface and a bunch of other ports, and
> another bridge which is LAN only (few ports + Wi-Fi typically). Usually
> this is under the same physical switch though, so this is all about
> partitioning physical ports and physical interfaces under logical groups.
> 
> By definition the WAN and LAN domains are separate logical and broadcast
> domains, with separate admission control rules, STP etc. I do not think
> your "br0" example should span the WAN interface in this case. Also,
> with eth0 being the conduit interface, it cannot be allowed to be a
> bridge member, that's something I ended up fixing in the bridge layer,
> otherwise untagging does not work, but that is nitpicking.
> 
> It still makes sense though allow the creation of a "br0" device which
> spans the entire set of ports and interfaces (except eth0), but not name
> eth1 "WAN", just treat it as an extension in that case.
> 
> Sorry if that seems like f'ing flies, but having concise example should
> help make progress on these design issues ;)

You are right, my drawing wasn't that correct. What you just described
is more accurate.

Thanks,
-v

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-14 17:42                           ` Ido Schimmel
  2015-10-14 18:51                             ` Vivien Didelot
@ 2015-10-15  2:58                             ` Scott Feldman
  2015-10-15  7:28                               ` Ido Schimmel
  1 sibling, 1 reply; 34+ messages in thread
From: Scott Feldman @ 2015-10-15  2:58 UTC (permalink / raw)
  To: Ido Schimmel
  Cc: Vivien Didelot, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

On Wed, Oct 14, 2015 at 10:42 AM, Ido Schimmel <idosch@mellanox.com> wrote:
> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
>>On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
>><vivien.didelot@savoirfairelinux.com> wrote:
>>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>>>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>>>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>> >> >Hi guys,
>>>> >> >
>>>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>>>> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>> >> >>
>>>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>>>> >> >>
>>>> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>> >> >> ---
>>>> >> >>  net/switchdev/switchdev.c | 3 +++
>>>> >> >>  1 file changed, 3 insertions(+)
>>>> >> >>
>>>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>> >> >> index 6e4a4f9ad927..256c596de896 100644
>>>> >> >> --- a/net/switchdev/switchdev.c
>>>> >> >> +++ b/net/switchdev/switchdev.c
>>>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>>> >> >>                         if (vlan.vid_begin)
>>>> >> >>                                 return -EINVAL;
>>>> >> >>                         vlan.vid_begin = vinfo->vid;
>>>> >> >> +                       /* don't allow range of pvids */
>>>> >> >> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>>>> >> >> +                               return -EINVAL;
>>>> >> >>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>>> >> >>                         if (!vlan.vid_begin)
>>>> >> >>                                 return -EINVAL;
>>>> >> >> --
>>>> >> >> 2.4.3
>>>> >> >>
>>>> >> >
>>>> >> >Yes the patch looks good, but it is a minor check though. I hope the
>>>> >> >subject of this thread is making sense.
>>>> >> >
>>>> >> >VLAN ranges seem to have been included for an UX purpose (so commands
>>>> >> >look like Cisco IOS). We don't want to change any existing interface, so
>>>> >> >we pushed that down to drivers, with the only valid reason that, maybe
>>>> >> >one day, an hardware can be capable of programming a range on a per-port
>>>> >> >basis.
>>>> >> Hi,
>>>> >>
>>>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>>>> >> one go. We've yet to submit this part.
>>>> >
>>>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>>>> >
>>>> >So there is now a very last question in my head for this, which is more
>>>> >a matter of kernel design. Should the user be aware of such underlying
>>>> >support? In other words, would it make sense to do this in a driver:
>>>> >
>>>> >    foo_port_vlan_add(struct net_device *dev,
>>>> >                      struct switchdev_obj_port_vlan *vlan)
>>>> >    {
>>>> >        if (vlan->vid_begin != vlan->vid_end)
>>>> >            return -ENOTSUPP; /* or something more relevant for user */
>>>> >
>>>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>>>> >    }
>>>> >
>>>> >So drivers keep being simple, and we can easily propagate the fact that
>>>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>>>> >implemented and must be done in software.
>>>> I think that if you want to keep it simple, then Scott's advice from the
>>>> previous thread is the most appropriate one. I believe the hardware you
>>>> are using is simply not meant to support multiple 802.1Q bridges.
>>>
>>> You mean allowing only one Linux bridge over an hardware switch?
>>>
>>> It would for sure simplify how, as developers and users, we represent a
>>> physical switch. But I am not sure how to achieve that and I don't have
>>> strong opinions on this TBH.
>>
>>Hi Vivien, I think it's possible to keep switch ports on just one
>>bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>>notifier.  This will give you the driver-level control you want.  Do
>>you have time to investigate?  The idea is:
>>
>>1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>>being added to a second bridge,then return NOTIFY_BAD.  Your driver
>>needs to track the bridge count.
>>
>>2) In __netdev_upper_dev_link(), check the return code from the
>>call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>>NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>>
> Hi,
>
> We are doing something similar in mlxsw (not upstream yet). Jiri
> introduced PRE_CHANGEUPPER, which is called from the function you
> mentioned, but before the linking operation (so that you don't need to
> rollback).

Oh, cool.

> If the notification is about a linking operation and the master is a
> bridge different than the current one, then NOTIFY_BAD is returned.

So you're wanting to restrict to just one bridge also?  Or is
NOTIFY_BAD returned for some other reason?  I guess I should be
patient and wait for the patch.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges
  2015-10-15  2:58                             ` Scott Feldman
@ 2015-10-15  7:28                               ` Ido Schimmel
  0 siblings, 0 replies; 34+ messages in thread
From: Ido Schimmel @ 2015-10-15  7:28 UTC (permalink / raw)
  To: Scott Feldman
  Cc: Vivien Didelot, Nikolay Aleksandrov, Netdev,
	Jiří Pírko, David S. Miller, Nikolay Aleksandrov,
	Elad Raz

Thu, Oct 15, 2015 at 05:58:33AM IDT, sfeldma@gmail.com wrote:
>On Wed, Oct 14, 2015 at 10:42 AM, Ido Schimmel <idosch@mellanox.com> wrote:
>> Wed, Oct 14, 2015 at 08:14:24PM IDT, sfeldma@gmail.com wrote:
>>>On Wed, Oct 14, 2015 at 8:25 AM, Vivien Didelot
>>><vivien.didelot@savoirfairelinux.com> wrote:
>>>> On Oct. Wednesday 14 (42) 09:14 AM, Ido Schimmel wrote:
>>>>> Tue, Oct 13, 2015 at 05:32:26PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>>> >On Oct. Tuesday 13 (42) 11:31 AM, Ido Schimmel wrote:
>>>>> >> Mon, Oct 12, 2015 at 08:36:25PM IDT, vivien.didelot@savoirfairelinux.com wrote:
>>>>> >> >Hi guys,
>>>>> >> >
>>>>> >> >On Oct. Monday 12 (42) 02:01 PM, Nikolay Aleksandrov wrote:
>>>>> >> >> From: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>> >> >>
>>>>> >> >> We shouldn't allow BRIDGE_VLAN_INFO_PVID flag in VLAN ranges.
>>>>> >> >>
>>>>> >> >> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
>>>>> >> >> ---
>>>>> >> >>  net/switchdev/switchdev.c | 3 +++
>>>>> >> >>  1 file changed, 3 insertions(+)
>>>>> >> >>
>>>>> >> >> diff --git a/net/switchdev/switchdev.c b/net/switchdev/switchdev.c
>>>>> >> >> index 6e4a4f9ad927..256c596de896 100644
>>>>> >> >> --- a/net/switchdev/switchdev.c
>>>>> >> >> +++ b/net/switchdev/switchdev.c
>>>>> >> >> @@ -720,6 +720,9 @@ static int switchdev_port_br_afspec(struct net_device *dev,
>>>>> >> >>                         if (vlan.vid_begin)
>>>>> >> >>                                 return -EINVAL;
>>>>> >> >>                         vlan.vid_begin = vinfo->vid;
>>>>> >> >> +                       /* don't allow range of pvids */
>>>>> >> >> +                       if (vlan.flags & BRIDGE_VLAN_INFO_PVID)
>>>>> >> >> +                               return -EINVAL;
>>>>> >> >>                 } else if (vinfo->flags & BRIDGE_VLAN_INFO_RANGE_END) {
>>>>> >> >>                         if (!vlan.vid_begin)
>>>>> >> >>                                 return -EINVAL;
>>>>> >> >> --
>>>>> >> >> 2.4.3
>>>>> >> >>
>>>>> >> >
>>>>> >> >Yes the patch looks good, but it is a minor check though. I hope the
>>>>> >> >subject of this thread is making sense.
>>>>> >> >
>>>>> >> >VLAN ranges seem to have been included for an UX purpose (so commands
>>>>> >> >look like Cisco IOS). We don't want to change any existing interface, so
>>>>> >> >we pushed that down to drivers, with the only valid reason that, maybe
>>>>> >> >one day, an hardware can be capable of programming a range on a per-port
>>>>> >> >basis.
>>>>> >> Hi,
>>>>> >>
>>>>> >> That's actually what we are doing in mlxsw. We can do up to 256 entries in
>>>>> >> one go. We've yet to submit this part.
>>>>> >
>>>>> >Perfect Ido, thanks for pointing this out! I'm OK with the range then.
>>>>> >
>>>>> >So there is now a very last question in my head for this, which is more
>>>>> >a matter of kernel design. Should the user be aware of such underlying
>>>>> >support? In other words, would it make sense to do this in a driver:
>>>>> >
>>>>> >    foo_port_vlan_add(struct net_device *dev,
>>>>> >                      struct switchdev_obj_port_vlan *vlan)
>>>>> >    {
>>>>> >        if (vlan->vid_begin != vlan->vid_end)
>>>>> >            return -ENOTSUPP; /* or something more relevant for user */
>>>>> >
>>>>> >        return foo_port_single_vlan_add(dev, vlan->vid_begin);
>>>>> >    }
>>>>> >
>>>>> >So drivers keep being simple, and we can easily propagate the fact that
>>>>> >one-or-all VLAN is not supportable, vs. the VLAN feature itself is not
>>>>> >implemented and must be done in software.
>>>>> I think that if you want to keep it simple, then Scott's advice from the
>>>>> previous thread is the most appropriate one. I believe the hardware you
>>>>> are using is simply not meant to support multiple 802.1Q bridges.
>>>>
>>>> You mean allowing only one Linux bridge over an hardware switch?
>>>>
>>>> It would for sure simplify how, as developers and users, we represent a
>>>> physical switch. But I am not sure how to achieve that and I don't have
>>>> strong opinions on this TBH.
>>>
>>>Hi Vivien, I think it's possible to keep switch ports on just one
>>>bridge if we do a little bit of work on the NETDEV_CHANGEUPPER
>>>notifier.  This will give you the driver-level control you want.  Do
>>>you have time to investigate?  The idea is:
>>>
>>>1) In your driver's handler for NETDEV_CHANGEUPPER, if switch port is
>>>being added to a second bridge,then return NOTIFY_BAD.  Your driver
>>>needs to track the bridge count.
>>>
>>>2) In __netdev_upper_dev_link(), check the return code from the
>>>call_netdevice_notifiers_info(NETDEV_CHANGEUPPER, ...) call, and if
>>>NOTIFY_BAD, abort the linking operation (goto rollback_xxx).
>>>
>> Hi,
>>
>> We are doing something similar in mlxsw (not upstream yet). Jiri
>> introduced PRE_CHANGEUPPER, which is called from the function you
>> mentioned, but before the linking operation (so that you don't need to
>> rollback).
>
>Oh, cool.
>
>> If the notification is about a linking operation and the master is a
>> bridge different than the current one, then NOTIFY_BAD is returned.
>
>So you're wanting to restrict to just one bridge also?  Or is
>NOTIFY_BAD returned for some other reason?  I guess I should be
>patient and wait for the patch.
Yes, currently that's what we are doing.

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2015-10-15  7:28 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-09 23:30 switchdev and VLAN ranges Vivien Didelot
2015-10-10  4:22 ` Scott Feldman
2015-10-10 16:33   ` Vivien Didelot
2015-10-10 18:10     ` Florian Fainelli
2015-10-10 19:47       ` Vivien Didelot
2015-10-10  7:49 ` Elad Raz
2015-10-10 10:36   ` Nikolay Aleksandrov
2015-10-11  7:12     ` Jiri Pirko
2015-10-11 10:49       ` [PATCH net-next] bridge: vlan: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
2015-10-11 10:49         ` [Bridge] " Nikolay Aleksandrov
2015-10-11 14:13         ` Jiri Pirko
2015-10-13  2:59         ` David Miller
2015-10-13  2:59           ` [Bridge] " David Miller
2015-10-11 22:41       ` switchdev and VLAN ranges Vivien Didelot
2015-10-12  0:13         ` Nikolay Aleksandrov
2015-10-12  5:14           ` Scott Feldman
2015-10-12 10:15             ` Nikolay Aleksandrov
2015-10-12 12:01             ` [PATCH net-next] switchdev: enforce no pvid flag in vlan ranges Nikolay Aleksandrov
2015-10-12 12:11               ` Elad Raz
2015-10-12 12:17               ` Jiri Pirko
2015-10-12 17:36               ` Vivien Didelot
2015-10-13  6:13                 ` Scott Feldman
2015-10-13  8:31                 ` Ido Schimmel
2015-10-13 14:32                   ` Vivien Didelot
2015-10-14  6:14                     ` Ido Schimmel
2015-10-14 15:25                       ` Vivien Didelot
2015-10-14 17:14                         ` Scott Feldman
2015-10-14 17:42                           ` Ido Schimmel
2015-10-14 18:51                             ` Vivien Didelot
2015-10-14 22:08                               ` Florian Fainelli
2015-10-15  0:07                                 ` Vivien Didelot
2015-10-15  2:58                             ` Scott Feldman
2015-10-15  7:28                               ` Ido Schimmel
2015-10-13 11:42               ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.