From: Scott Feldman <sfeldma@gmail.com>
To: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Cc: Jiri Pirko <jiri@resnulli.us>, netdev <netdev@vger.kernel.org>,
	"stephen@networkplumber.org" <stephen@networkplumber.org>,
	Florian Fainelli <f.fainelli@gmail.com>,
	Andrew Lunn <andrew@lunn.ch>,
	Roopa Prabhu <roopa@cumulusnetworks.com>
Subject: Re: switchdev and VLAN ranges
Date: Fri, 9 Oct 2015 21:22:58 -0700
Message-ID: <CAE4R7bBRySJZxq4HKfSkyLNHiC0oMUqb95E=fCHjKX6MpNUGBw@mail.gmail.com>
In-Reply-To: <493168159.229458.1444433451493.JavaMail.zimbra@savoirfairelinux.com>

On Fri, Oct 9, 2015 at 4:30 PM, Vivien Didelot
<vivien.didelot@savoirfairelinux.com> wrote:
> Hi All,
>
> I understand that specifying a VLAN range on the command line is nice
> for the user, and it makes no big deal for software implementation.

[Adding Roopa, since she did the original vlan range support in the
kernel/iproute2]

> However, AFAICT a VLAN range does not make sense at all for hardware
> such as Ethernet switch chips. Am I wrong?
>
> I would suggest to make switchdev directly answer to a bridge request
> that the operation is not supported when the user asks for a VLAN range.
>
> That way, we can simply use a single "vid" member in struct
> switchdev_obj_port_vlan instead of "vid_begin" and "vid_end" and thus
> avoid making drivers heavier with iteration loops on such range.
>
> I have two concerns in mind:
>
> a) if we imagine that drivers like Rocker allocate memory in the prepare
> phase for each VID, preparing a range like 100-4000 would definitely not
> be recommended.

This call should be in process context so it doesn't seem too terrible
for the driver to take its time to reserve/allocate resources in the
prepare phase, even for a vlan range.  I think I'm missing your point.

> b) imagine that you have two Linux bridges on a switch, one using the
> hardware VLAN 100. If you request the VLAN range 99-101 for the other
> bridge members, it is not possible for the driver to say "I can
> accelerate VLAN 99 and 101, but not 100". It must return OPNOTSUPP for
> the whole range.

Well, it probably should return -ERANGE to indicate the range can't be
added, but that's an aside.

The reason vlan ranges need to work all the way down to the switchdev
driver is that, from the user's perspective, adding a vlan range to the
device is an all-or-nothing request.  So we need to ask the driver in
the prepare phase, "can you support this range, completely?", and if
yes, then commit it as a whole.  The netlink response back to the user
isn't equipped to describe what subset of the range was added, and
what subset was not.
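
A rough sketch of that all-or-nothing flow, in plain userspace C rather
than real switchdev code.  Everything here is made up for illustration
(struct hw_switch, the owner[] table, the vlan_range_* helpers), and
-ERANGE is only the error code I floated above, not a settled
convention:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define VLAN_N_VID	4096
#define VLAN_VID_MAX	4094	/* highest usable 802.1Q VID */

/* Hypothetical device state: one VLAN table for the whole switch.
 * owner[vid] is 0 when the VID is free, else the id of the bridge
 * that is using it.
 */
struct hw_switch {
	uint8_t owner[VLAN_N_VID];
};

/* Prepare: "can you support this range, completely?"  Read-only. */
int vlan_range_prepare(const struct hw_switch *sw, uint8_t bridge,
		       uint16_t vid_begin, uint16_t vid_end)
{
	unsigned int vid;

	if (!bridge || vid_begin < 1 || vid_end > VLAN_VID_MAX ||
	    vid_begin > vid_end)
		return -EINVAL;

	for (vid = vid_begin; vid <= vid_end; vid++)
		if (sw->owner[vid] && sw->owner[vid] != bridge)
			return -ERANGE;	/* one conflict rejects the range */

	return 0;
}

/* Commit: only called after a successful prepare, so it cannot fail. */
void vlan_range_commit(struct hw_switch *sw, uint8_t bridge,
		       uint16_t vid_begin, uint16_t vid_end)
{
	unsigned int vid;

	for (vid = vid_begin; vid <= vid_end; vid++) {
		assert(!sw->owner[vid] || sw->owner[vid] == bridge);
		sw->owner[vid] = bridge;
	}
}

/* The whole range is added, or nothing is. */
int vlan_range_add(struct hw_switch *sw, uint8_t bridge,
		   uint16_t vid_begin, uint16_t vid_end)
{
	int err = vlan_range_prepare(sw, bridge, vid_begin, vid_end);

	if (err)
		return err;
	vlan_range_commit(sw, bridge, vid_begin, vid_end);
	return 0;
}
```

With bridge 1 already holding VLAN 100, asking for 99-101 on bridge 2
fails in prepare and leaves 99 and 101 untouched, so the user gets a
single answer for the whole range.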

> That's why I think that avoiding VLAN range at the switchdev level would
> be a good idea.

As a general rule with switchdev, we've tried to keep the user's
experience the same when using {Linux} as a soft switch/router vs.
using {Linux + offload device} as a hard switch/router.  So if native
Linux supports some operation, for example vlan ranges, then we should
try to extend that to the offload model.  In other words, we don't
want to re-train the user when moving from soft switch to hard switch!
But there are physical limitations when dealing with an offload
device....

Anyway, with your vlan range example, we've got a case where each soft
bridge has an independent vlan set, and the vlan sets between soft
bridges can overlap.  For the (typical) hard switch, there is one vlan
set for the whole switch, and trying to overlay the soft bridges'
(overlapping) vlan sets on the hard switch fails.  That failure is
reported to the user.  We tried, but due to offload device
limitations, we can't support that operation.  Of course, if the vlan
sets didn't overlap, then we don't have a problem.

This will not be the only case where something we can do on a soft
Linux switch/router can't be offloaded to some physical offload
device.  But I think the philosophy has been to try to offload what
we can, up to the point of failure.  In some cases, we can mask that
failure from the user by falling back to soft-switch only, but in
other cases the failure will pop up right in the user's face, like in
your example.
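
To make the two kinds of failure concrete, here is a small sketch, again
plain C and not kernel code.  The assumption is that a driver answers
-EOPNOTSUPP when it simply can't offload something, and that this is the
one error we can hide by staying in software; offload_fn, struct
vlan_result and the two sample "drivers" are invented names:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a driver's "add this VLAN" hook. */
typedef int (*offload_fn)(int vid);

struct vlan_result {
	bool offloaded;	/* true if the device accepted the VLAN */
	int err;	/* 0, or the error that reaches the user */
};

struct vlan_result vlan_add_try_offload(offload_fn offload, int vid)
{
	struct vlan_result res = { false, 0 };
	int err;

	assert(offload);
	err = offload(vid);

	if (!err)
		res.offloaded = true;	/* hard switch handles it */
	else if (err != -EOPNOTSUPP)
		res.err = err;		/* pops up in the user's face */
	/* else: masked, the soft switch keeps handling this VLAN */

	return res;
}

/* Sample driver with no VLAN offload at all. */
int drv_no_vlan_offload(int vid)
{
	(void)vid;
	return -EOPNOTSUPP;
}

/* Sample driver where VLAN 100 is already taken by another bridge. */
int drv_vlan_100_taken(int vid)
{
	return vid == 100 ? -ERANGE : 0;
}
```

The first sample driver never produces a user-visible error; the second
one does for VLAN 100, which is the case in your example.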

One idea to help mitigate the user's confusion would be to limit the
number of bridges overlaid on the device to just one.  Our drivers
know when ports are enslaved to bridges, so is there something we can
do there to fail the enslave on a second bridge?  Exercise left to the
reader.  If we had that, vlan ranges would work 1:1 with soft Linux,
because both the soft bridge and the device would have a single vlan
set.
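
One possible shape for that check, as a userspace sketch only.  The
struct, the join/leave helpers and the choice of -EBUSY are all
assumptions on my part, not existing driver code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-switch state kept by the driver. */
struct switch_dev {
	const void *bridge;	/* bridge overlaid on the device, or NULL */
	unsigned int members;	/* ports of this switch in that bridge */
};

/* Called when one of the switch's ports is enslaved to a bridge. */
int port_bridge_join(struct switch_dev *sw, const void *bridge)
{
	assert(bridge != NULL);

	if (sw->bridge && sw->bridge != bridge)
		return -EBUSY;	/* a second bridge: fail the enslave */

	sw->bridge = bridge;
	sw->members++;
	return 0;
}

/* Called when a port leaves its bridge; the last one frees the slot. */
void port_bridge_leave(struct switch_dev *sw)
{
	if (sw->members && --sw->members == 0)
		sw->bridge = NULL;
}
```

Once the last port leaves, the device is free to be overlaid by a
different bridge.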

Sorry for the long-winded response.

-scott
