From: David L Stevens <david.stevens@oracle.com>
To: David Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Subject: Re: [PATCHv3 net-next 2/3] sunvnet: allow admin to set sunvnet MTU
Date: Sat, 13 Sep 2014 22:15:41 -0400
Message-ID: <5414FA4D.6030504@oracle.com>
In-Reply-To: <20140913.162101.515634682549373073.davem@davemloft.net>



On 09/13/2014 04:21 PM, David Miller wrote:

> I personally find this scheme where we pretend that the device can
> have an arbitrary MTU, when in fact the effective MTU is a product of
> the sub-ports, quite ugly.

I wouldn't say I like it, either, but the problem is that without it, we
are tied to the least common denominator. Anything that doesn't support
v1.6 of the VIO protocol is stuck at the low MTU and low throughput, and
since Solaris itself is limited to 16000, Linux, which can do 64K-1, is
also limited to 16000. On my hardware, the original MTU we'd be tied to gets
about 1Gbps, 16000 gets about 5.4Gbps, and full Linux-Linux gets about
8Gbps. So, a big penalty.

I think of it as an Ethernet connected to a virtual switch, and the ICMP
errors for PMTUD are analogous to IGMP snooping. This is not an Ethernet
device alone -- those don't negotiate per-destination link MTUs. But nothing
forces anyone to mix MTUs; the ICMP errors simply allow it.
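
A minimal sketch of that per-destination decision, assuming each port keeps
the MTU negotiated with its one peer. The names (struct vport, tx_decide,
TX_ICMP_TOO_BIG) are hypothetical and are not taken from the patch:

```c
#include <stddef.h>

/* Hypothetical per-port state; not the sunvnet driver's real structure. */
struct vport {
	unsigned int link_mtu;	/* MTU negotiated with this one peer */
};

enum tx_action {
	TX_SEND,		/* packet fits this peer's link */
	TX_ICMP_TOO_BIG		/* tell the sender the link MTU; no silent drop */
};

/*
 * Decide what to do with an IP packet of ip_len bytes headed to one port.
 * When the packet is too big, *report_mtu is set to the value an ICMP
 * "packet too big" error would carry back for PMTUD.
 */
static enum tx_action tx_decide(const struct vport *port, size_t ip_len,
				unsigned int *report_mtu)
{
	if (ip_len > port->link_mtu) {
		*report_mtu = port->link_mtu;
		return TX_ICMP_TOO_BIG;
	}
	return TX_SEND;
}
```

With a port negotiated at 16000, a 1500-byte packet is sent as-is, while a
40000-byte packet yields an error reporting 16000, so only the path to that
one peer is limited.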

> In fact, that ugly ICMP stuff in the next patch is absolutely required
> to avoid bogus behavior possible after this patch.  You have to
> combine #2 and #3 otherwise you are adding an intermediate regression.

I disagree here. It's not any more bogus for the admin to set the MTU
s/he wants on one device when the others have not been changed yet. It
*always* happens that way. Ordinary Ethernet comes up at 1500 and one of them
must be increased first. At that time, the others don't match, and it is the
admin's responsibility to make sure they match.

> Logic wise, at the very least you should limit the MTU setting to the
> largest MTU of all of the individual ports.

We can't directly do that, because the MTU for the port is negotiated at
probe time. That'll be 1500 IP data (always) and we have to raise one of them
first, so one of them has to be set at a higher value than the negotiated
MTU at some point, at least until it is reset and re-negotiated. But we
don't know until we try a higher value if all the links can use it, and we
can't prevent another link from joining later that has a lower MTU, but we
can't then lower our own MTU for the whole device.
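
To illustrate the ordering problem, here is a rough sketch. It assumes the
negotiated port MTU is simply the smaller of the two ends' current device
MTUs, which is a simplification of the real handshake, and every name in it
(negotiate_port_mtu, clamp_allows) is hypothetical:

```c
#include <stddef.h>

/* Assumed model: a port ends up with the smaller of the two device MTUs. */
static unsigned int negotiate_port_mtu(unsigned int local_mtu,
				       unsigned int peer_mtu)
{
	return local_mtu < peer_mtu ? local_mtu : peer_mtu;
}

/*
 * The suggested limit: accept a requested device MTU only if it does not
 * exceed the largest MTU currently negotiated on any port.
 */
static int clamp_allows(unsigned int requested,
			const unsigned int *port_mtus, size_t nports)
{
	unsigned int largest = 0;
	size_t i;

	for (i = 0; i < nports; i++)
		if (port_mtus[i] > largest)
			largest = port_mtus[i];
	return requested <= largest;
}
```

Under that model, two ends that both come up at 1500 negotiate 1500, so the
limit rejects a larger request (9000 is just an example value) on both sides
and neither can go first. Once one end has been raised, the port is still
1500 until the other end is raised and the link is re-negotiated.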

I think in ordinary Ethernet, there is nothing at all enforcing a particular
MTU-- it is set to what the admin wants, regardless of what other hosts use.
That's the effect we ought to have here, despite the one-to-many p2p links
where we can know in advance what the link MTUs are, and that's what patch #2
does. I don't think we should try too hard to prevent a value an admin wants --
it will just get in the way of the admin, where it doesn't in ordinary Ethernet.

On the other hand, if the link MTU is lower, we shouldn't quietly drop
packets, hence the ICMP errors, which allow both.

							+-DLS


Thread overview: 5+ messages
2014-09-13 16:00 [PATCHv3 net-next 2/3] sunvnet: allow admin to set sunvnet MTU David L Stevens
2014-09-13 20:21 ` David Miller
2014-09-14  2:15   ` David L Stevens [this message]
2014-09-14 12:21     ` Sowmini Varadhan
2014-09-14 13:24       ` David L Stevens
