From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <netdev-owner@vger.kernel.org>
Received: from mail-io0-f174.google.com ([209.85.223.174]:40203 "EHLO
        mail-io0-f174.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1422764AbeCBSyi (ORCPT
        <rfc822;netdev@vger.kernel.org>); Fri, 2 Mar 2018 13:54:38 -0500
Received: by mail-io0-f174.google.com with SMTP id v6so11621409iog.7
        for <netdev@vger.kernel.org>; Fri, 02 Mar 2018 10:54:37 -0800 (PST)
MIME-Version: 1.0
In-Reply-To: <81bfe26bed31319d765a4ebfe86705a49c0a39d5.1520003901.git.sbrivio@redhat.com>
References: <81bfe26bed31319d765a4ebfe86705a49c0a39d5.1520003901.git.sbrivio@redhat.com>
From: =?UTF-8?Q?Maciej_=C5=BBenczykowski?= <zenczykowski@gmail.com>
Date: Fri, 2 Mar 2018 10:54:36 -0800
Message-ID: <CANP3RGf9YVSqFv1sL_smxp0ci0p=k=gtT==dmtVSeGht2g_HZA@mail.gmail.com>
Subject: Re: [PATCH net] ipv6: Reflect MTU changes on PMTU of exceptions for
 MTU-less routes
To: Stefano Brivio <sbrivio@redhat.com>
Cc: "David S . Miller" <davem@davemloft.net>,
        Wei Wang <weiwan@google.com>,
        Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>,
        Linux NetDev <netdev@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

Conceptually this is right.

And I'm 100% fine with dev mtu change triggering pmtu decrease.

I'm not so sold on the pmtu increase.

PMTUD is one of those things that never ever works right in practice.
There are too many icmp blackholes, rate limits, overloaded management
cpus in switches, misconfigurations, missing tcp mss clamps, icmps
routed differently than the flows due to ecmp hashing, middle boxes
that don't affect the icmp but change the tcp stream, etc.

In particular there's a lot of routing hardware that can handle
gigabits or terabits of traffic, but can generate only tens to hundreds
of packet too big messages per second (i.e. a tiny fraction of line rate
pps).  Worse yet, under overload it often falls back to simply
dropping packets and generating no icmp errors at all.
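
To put rough numbers on that, here is a back-of-the-envelope sketch.
The 10 Gbit/s link, 1500 byte packets and 100 packet too big messages
per second are assumed example values, the function name is made up
for illustration, and framing overhead is ignored:

```python
def ptb_coverage(link_bps, packet_bytes, ptb_per_sec):
    """Fraction of oversized packets that can get a Packet Too Big reply.

    Assumes every packet on the link is oversized and ignores
    layer 2 framing overhead, so this is only an order-of-magnitude guide.
    """
    # Packets per second the link can carry at this packet size.
    line_rate_pps = link_bps / (packet_bytes * 8)
    # The router cannot answer more packets than actually arrive.
    return min(1.0, ptb_per_sec / line_rate_pps)


if __name__ == "__main__":
    # Assumed example: 10 Gbit/s, 1500 byte packets, 100 PTB messages/s.
    fraction = ptb_coverage(10e9, 1500, 100)
    print(f"{fraction:.4%} of oversized packets would trigger an icmp error")
```

With those assumed values roughly one oversized packet in eight
thousand gets an error back; the rest are dropped silently.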

I spend a significant fraction of my time making sure we never rely on PMTUD.

Debugging MTU-related blackholes is a constant bane of my existence.

[btw. we're considering adding a hack to always fragment UDP to
min(1280, dev/route/path mtu)...]
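
A minimal sketch of what such a clamp could compute, assuming the
device, route and path MTUs are simply combined with min().  The
function name and the handling of unset values are hypothetical and
not an existing kernel interface; 1280 is the IPv6 minimum link MTU:

```python
IPV6_MIN_MTU = 1280  # minimum link MTU every IPv6 path must support


def udp_fragment_mtu(dev_mtu, route_mtu=None, path_mtu=None):
    """Return the size UDP datagrams would be fragmented to.

    Hypothetical helper: route_mtu and path_mtu are None when no value
    is set or learned, in which case they do not constrain the result.
    """
    candidates = [IPV6_MIN_MTU, dev_mtu]
    for mtu in (route_mtu, path_mtu):
        if mtu is not None:
            candidates.append(mtu)
    # Lower is always safer, so take the smallest known limit.
    return min(candidates)


if __name__ == "__main__":
    # A 9000 byte jumbo-frame device with a learned path MTU of 1500
    # would still fragment at 1280.
    print(udp_fragment_mtu(9000, path_mtu=1500))
```

Under this reading a learned PMTU never raises the fragment size
above 1280, so a lost or late icmp error cannot blackhole the flow.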

Basically: lower is always better because it's more likely to work...