linux-sctp.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Xin Long <lucien.xin@gmail.com>
To: David Laight <David.Laight@aculab.com>
Cc: network dev <netdev@vger.kernel.org>,
	"davem@davemloft.net" <davem@davemloft.net>,
	"kuba@kernel.org" <kuba@kernel.org>,
	Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
	"linux-sctp@vger.kernel.org" <linux-sctp@vger.kernel.org>
Subject: Re: [PATCHv2 net-next 00/14] sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport
Date: Tue, 22 Jun 2021 23:48:33 -0400	[thread overview]
Message-ID: <CADvbK_eeJVoWps8UrygEfNdXL76Q2XMoNOoELWHFqOTq2634cA@mail.gmail.com> (raw)
In-Reply-To: <CADvbK_e7D4s81vS0rq=P4mQe47dshJgQzaWnrUyCi-Cis4xyhQ@mail.gmail.com>

On Tue, Jun 22, 2021 at 9:09 PM Xin Long <lucien.xin@gmail.com> wrote:
>
> On Tue, Jun 22, 2021 at 6:13 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > From: Xin Long
> > > Sent: 22 June 2021 19:05
> > >
> > > Overview(From RFC8899):
> > >
> > >   In contrast to PMTUD, Packetization Layer Path MTU Discovery
> > >   (PLPMTUD) [RFC4821] introduces a method that does not rely upon
> > >   reception and validation of PTB messages.  It is therefore more
> > >   robust than Classical PMTUD.  This has become the recommended
> > >   approach for implementing discovery of the PMTU [BCP145].
> > >
> > >   It uses a general strategy in which the PL sends probe packets to
> > >   search for the largest size of unfragmented datagram that can be sent
> > >   over a network path.  Probe packets are sent to explore using a
> > >   larger packet size.  If a probe packet is successfully delivered (as
> > >   determined by the PL), then the PLPMTU is raised to the size of the
> > >   successful probe.  If a black hole is detected (e.g., where packets
> > >   of size PLPMTU are consistently not received), the method reduces the
> > >   PLPMTU.
> >
> > This seems to take a long time (probably well over a minute)
> > to determine the mtu.
> I just noticed this is a misread of RFC8899, and the next probe packet
> should be sent immediately once the ACK of the last probe is received,
> instead of waiting the timeout, which should be for the missing probe.
>
> I will fix this with:
>
> diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
> index d29b579da904..f3aca1acf93a 100644
> --- a/net/sctp/sm_statefuns.c
> +++ b/net/sctp/sm_statefuns.c
> @@ -1275,6 +1275,8 @@ enum sctp_disposition
> sctp_sf_backbeat_8_3(struct net *net,
>                         return SCTP_DISPOSITION_DISCARD;
>
>                 sctp_transport_pl_recv(link);
> +               sctp_add_cmd_sf(commands, SCTP_CMD_PROBE_TIMER_UPDATE,
> +                               SCTP_TRANSPORT(link));
>                 return SCTP_DISPOSITION_CONSUME;
>         }
>
> diff --git a/net/sctp/transport.c b/net/sctp/transport.c
> index f27b856ea8ce..88815b98d9d0 100644
> --- a/net/sctp/transport.c
> +++ b/net/sctp/transport.c
> @@ -215,6 +215,11 @@ void sctp_transport_reset_probe_timer(struct
> sctp_transport *transport)
>  {
>         int scale = 1;
>
> +       if (transport->pl.probe_count == 0) {
> +               if (!mod_timer(&transport->probe_timer, jiffies +
> transport->rto))
> +                       sctp_transport_hold(transport);
> +               return;
> +       }
>         if (timer_pending(&transport->probe_timer))
>                 return;
>         if (transport->pl.state == SCTP_PL_COMPLETE &&
>
> Thanks for the comment.
A more efficient improvement:

diff --git a/net/sctp/sm_statefuns.c b/net/sctp/sm_statefuns.c
index d29b579da904..d5cb0124bafa 100644
--- a/net/sctp/sm_statefuns.c
+++ b/net/sctp/sm_statefuns.c
@@ -1275,7 +1275,13 @@ enum sctp_disposition
sctp_sf_backbeat_8_3(struct net *net,
                        return SCTP_DISPOSITION_DISCARD;

                sctp_transport_pl_recv(link);
-               return SCTP_DISPOSITION_CONSUME;
+
+               if (link->pl.state == SCTP_PL_COMPLETE) {
+                       sctp_add_cmd_sf(commands, SCTP_CMD_PROBE_TIMER_UPDATE,
+                                       SCTP_TRANSPORT(link));
+                       return SCTP_DISPOSITION_CONSUME;
+               }
+               return sctp_sf_send_probe(net, ep, asoc, type, link, commands);
        }

        max_interval = link->hbinterval + link->rto;
diff --git a/net/sctp/transport.c b/net/sctp/transport.c
index f27b856ea8ce..37f65f617f05 100644
--- a/net/sctp/transport.c
+++ b/net/sctp/transport.c
@@ -215,10 +215,8 @@ void sctp_transport_reset_probe_timer(struct
sctp_transport *transport)
 {
        int scale = 1;

-       if (timer_pending(&transport->probe_timer))
-               return;
        if (transport->pl.state == SCTP_PL_COMPLETE &&
-           transport->pl.probe_count == 1)
+           transport->pl.probe_count == 0)
                scale = 30; /* works as PMTU_RAISE_TIMER */
        if (!mod_timer(&transport->probe_timer,
                       jiffies + transport->probe_interval * scale))

[103] pl_send: PLPMTUD: state: 1, size: 1200, high: 0 <--[a]
[103] pl_recv: PLPMTUD: state: 1, size: 1200, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1232, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1232, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1264, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1264, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1296, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1296, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1328, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1328, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1360, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1360, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1392, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1392, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1424, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1424, high: 0
[103] pl_send: PLPMTUD: state: 2, size: 1456, high: 0
[103] pl_recv: PLPMTUD: state: 2, size: 1456, high: 0  <--[b]
[103] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
[108] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
[113] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
[118] pl_send: PLPMTUD: state: 2, size: 1488, high: 0
[118] pl_recv: PLPMTUD: state: 2, size: 1456, high: 1488 <---[c]
[118] pl_send: PLPMTUD: state: 2, size: 1460, high: 1488
[118] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1488 <--- [d]
[118] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
[124] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
[129] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
[134] pl_send: PLPMTUD: state: 2, size: 1464, high: 1488
[134] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1464 <-- around
30s "search complete from 1200 bytes"
[287] pl_send: PLPMTUD: state: 3, size: 1460, high: 0
[287] pl_recv: PLPMTUD: state: 3, size: 1460, high: 0
[287] pl_send: PLPMTUD: state: 2, size: 1464, high: 0 <-- [aa]
[292] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
[298] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
[303] pl_send: PLPMTUD: state: 2, size: 1464, high: 0
[303] pl_recv: PLPMTUD: state: 2, size: 1460, high: 1464  <--[bb]  <--
around 15s "re-search complete from current pmtu"

So since no interval to send the next probe when the ACK is received
for the last one,
it won't take much time from [a] to [b], and [c] to [d],
and there are at most 2 failures to find the right pmtu, each failure
takes 5s * 3 = 15s.

when it goes back to search from search complete after a long timeout,
it will take only 1 failure to get the right pmtu from [aa] to [bb].

 Thanks.
> >
> > What is used for the actual mtu while this is in progress?
> >
> > Does packet loss and packet retransmission cause the mtu
> > to be reduced as well?
> No, the data packet is not a probe in this implementation.
>
> >
> > I can imagine that there is an expectation (from the application)
> > that the mtu is that of an ethernet link - perhaps less a PPPoE
> > header.
> > Starting with an mtu of 1200 will break this assumption and may
> > have odd side effects.
> Starting searching from mtu of 1200, but the real pmtu will only be updated
> when the search is done and optimal mtu is found.
> So at the beginning, it will still use the dst mtu as before.
>
> > For TCP/UDP the ICMP segmentation required error is immediate
> > and gets used for the retransmissions.
> > This code seems to be looking at separate timeouts - so a lot of
> > packets could get discarded and application timers expire before
> > if determines the correct mtu.
> This patch will also process ICMP error msg, and gets the 'mtu' size from it
> but before using it, it will verify(probe) it first:
>
> see Patch: sctp: do state transition when receiving an icmp TOOBIG packet
>
> >
> > Maybe I missed something about this only being done on inactive
> > paths?
> >
> >         David
> >
> > -
> > Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
> > Registration No: 1397386 (Wales)
> >

  reply	other threads:[~2021-06-23  3:48 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-22 18:04 [PATCHv2 net-next 00/14] sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 01/14] sctp: add pad chunk and its make function and event table Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 02/14] sctp: add probe_interval in sysctl and sock/asoc/transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 03/14] sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 04/14] sctp: add the constants/variables and states and some APIs for transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 05/14] sctp: add the probe timer in transport for PLPMTUD Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 06/14] sctp: do the basic send and recv for PLPMTUD probe Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 07/14] sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 08/14] sctp: do state transition when a probe succeeds on HB ACK recv path Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 09/14] sctp: do state transition when receiving an icmp TOOBIG packet Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 10/14] sctp: enable PLPMTUD when the transport is ready Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 11/14] sctp: remove the unessessary hold for idev in sctp_v6_err Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 12/14] sctp: extract sctp_v6_err_handle function from sctp_v6_err Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 13/14] sctp: extract sctp_v4_err_handle function from sctp_v4_err Xin Long
2021-06-22 18:05 ` [PATCHv2 net-next 14/14] sctp: process sctp over udp icmp err on sctp side Xin Long
2021-06-22 18:40 ` [PATCHv2 net-next 00/14] sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport patchwork-bot+netdevbpf
2021-06-22 22:13 ` David Laight
2021-06-23  1:09   ` Xin Long
2021-06-23  3:48     ` Xin Long [this message]
2021-06-23  9:50       ` David Laight
2021-06-23 15:59         ` Xin Long

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CADvbK_eeJVoWps8UrygEfNdXL76Q2XMoNOoELWHFqOTq2634cA@mail.gmail.com \
    --to=lucien.xin@gmail.com \
    --cc=David.Laight@aculab.com \
    --cc=davem@davemloft.net \
    --cc=kuba@kernel.org \
    --cc=linux-sctp@vger.kernel.org \
    --cc=marcelo.leitner@gmail.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).