netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: David Lamparter <equinox@diac24.net>
To: Ben Hutchings <bhutchings@solarflare.com>
Cc: David Lamparter <equinox@diac24.net>, netdev <netdev@vger.kernel.org>
Subject: Re: [PATCH net-next 0/2] 802.1ad S-VLAN support
Date: Tue, 8 Nov 2011 00:07:11 +0100	[thread overview]
Message-ID: <20111107230710.GF1833899@jupiter.n2.diac24.net> (raw)
In-Reply-To: <1320701749.3020.70.camel@bwh-desktop>

On Mon, Nov 07, 2011 at 09:35:49PM +0000, Ben Hutchings wrote:
> On Mon, 2011-11-07 at 16:48 +0100, David Lamparter wrote:
> > On Mon, Nov 07, 2011 at 03:11:44PM +0000, Ben Hutchings wrote:
> > > We definitely need to think about how MTU/MRU are configured when
> > > multiple VLAN tags are used, though I don't think it's essential to do
> > > before this goes in.  To be slightly more blunt than your documentation,
> > > our current handling of MTU/MRU and VLANs is a botch.
[...]
> >
> > Yes, what i'd like to do is introduce a new field into struct netdevice
> > that tracks the hardware Max Frame Size; it'd be a read-only field
> > that's initialized once by the driver. (The field would only be used by
> > ethernet-like devices.) To get things started easier, the field can have
> > a default value like 0xffff, so if the driver doesn't set it we end up
> > with the same old nothing-checked behaviour.
[...]
>
> The driver for a physical device may still need to know the overall
> MTU/MRU.  Certainly in case of hardware/drivers which do not support DMA
> scatter we do not want the driver to allocate oversized buffers.  Also
> some devices may partition internal FIFOs according to the MTU/MRU and
> we should nto unnecessarily reduce the maximum number of packets that
> can fit in those FIFOs.
>
> So I think that instead of propagating MFS down, we should propagate MTU
> change requests up, but maintaining a distinction between the MTUs for
> untagged and tagged (with different types) packets..

Hmm. I think we need to cleanly separate MTU and MFS. MTU is used for
upper layer stuff like setting TCP MSS, IP fragment size, etc.

MFS is the actual ethernet thing, and it's quite independent from the
MTU. Imagine the following example case:

subnet 1 has legacy 100 mbit hosts with 1514 byte limit. So it runs at
MTU 1500. subnet 2 is used for SAN and has all-9216-equipment. We have a
server connected with eth0 (9216 capable hw). The ethernet switch feeds
subnet 1 untagged and subnet 2 tagged 1Q id 2 ("eth0.2").

The current code cannot handle this since if eth0 MTU = 1500, eth0.2
cannot be set to 9200. (vlan_dev_change_mtu:
	if (vlan_dev_info(dev)->real_dev->mtu < new_mtu
		return -ERANGE;
Note that raising eth0's MTU is wrong because now the box will send 9k
IP packets to those poor 100mbit hosts... the only way around this would
be to add MTU values to the routes for that subnet.

So, I'd like to define "MTU" to be layer 3 and "MFS" to be layer 2. The
essential distinction is that the MFS value is interdependent between
VLANs and their masters while the MTU can be arbitrarily set (within MFS
limits).

> we should propagate MTU change requests up

Hm. If we propagate the MFS up, we either need to track the different
requestors so we can notice when we can lower it back down, or we end up
ever just raising the value.

How about instead of propagating the MFS up, we provide an user knob to
adjust the MFS (on physical devices)?

Might also be relevant for lxc/network namespaces; i don't think a
containered uid0 should have the possibility to increase your NIC's
buffers by x6 by changing the MTU on his VLAN...

(I'd still keep a max_mfs field, just to export these bits of knowledge
from the driver to userspace. I remember a recent thread about e100 and
hardware limits...)

> > 	dev->hw_features = NETIF_F_ALL_CSUM | NETIF_F_SG |
> > 			   NETIF_F_FRAGLIST | NETIF_F_ALL_TSO |
> > 			   NETIF_F_HIGHDMA | NETIF_F_SCTP_CSUM |
> > 			   NETIF_F_ALL_FCOE;
>
> Those are the features that can *potentially* be toggled.
>
> > which is pretty much the "basic" set. I don't see why any of that should
> > differ for 802.1ad (or even 802.1ah), but my understanding is barely
> > enough to tell that these flags should work for 802.1ad.
>
> See vlan_dev_fix_features() and note that vlan_features is zero for a
> VLAN device.

I admit ignorance and am duly reading code - in fact, I should probably
not use vlan_features for 802.1ad S-VLANs and instead force the features
to 0 to be on the safe side...


-David

  reply	other threads:[~2011-11-07 23:07 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2011-11-05 16:54 [PATCH net-next 0/2] 802.1ad S-VLAN support David Lamparter
2011-11-05 16:54 ` [PATCH 1/2] net: vlan: " David Lamparter
2011-11-05 17:05   ` [PATCH iproute2] link/vlan: Add 802.1ad / QinQ support David Lamparter
2011-11-07 21:41   ` [PATCH 1/2] net: vlan: 802.1ad S-VLAN support Stephen Hemminger
2011-11-07 22:02     ` David Lamparter
2011-11-07 21:44   ` Stephen Hemminger
2011-11-07 22:18     ` David Lamparter
2011-11-12  1:22   ` David Miller
2011-11-12  9:25     ` Michał Mirosław
2011-11-12 14:14     ` David Lamparter
2011-11-12 16:06       ` Michał Mirosław
2011-11-12 22:22       ` David Miller
2011-11-05 16:54 ` [PATCH 2/2] net: vlan: remove unused struct vlan_group->hlist David Lamparter
2011-11-07 15:11 ` [PATCH net-next 0/2] 802.1ad S-VLAN support Ben Hutchings
2011-11-07 15:48   ` David Lamparter
2011-11-07 21:35     ` Ben Hutchings
2011-11-07 23:07       ` David Lamparter [this message]
2011-11-08  0:16         ` Ben Hutchings
2011-11-09 15:34           ` David Lamparter
2011-11-09 23:58             ` Ben Hutchings
2011-11-07 23:18     ` Francois Romieu

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20111107230710.GF1833899@jupiter.n2.diac24.net \
    --to=equinox@diac24.net \
    --cc=bhutchings@solarflare.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).