From: Xin Long <lucien.xin@gmail.com>
To: network dev <netdev@vger.kernel.org>,
davem@davemloft.net, kuba@kernel.org,
Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>,
linux-sctp@vger.kernel.org
Subject: [PATCHv2 net-next 00/14] sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport
Date: Tue, 22 Jun 2021 14:04:46 -0400 [thread overview]
Message-ID: <cover.1624384990.git.lucien.xin@gmail.com> (raw)
Overview(From RFC8899):
In contrast to PMTUD, Packetization Layer Path MTU Discovery
(PLPMTUD) [RFC4821] introduces a method that does not rely upon
reception and validation of PTB messages. It is therefore more
robust than Classical PMTUD. This has become the recommended
approach for implementing discovery of the PMTU [BCP145].
It uses a general strategy in which the PL sends probe packets to
search for the largest size of unfragmented datagram that can be sent
over a network path. Probe packets are sent to explore using a
larger packet size. If a probe packet is successfully delivered (as
determined by the PL), then the PLPMTU is raised to the size of the
successful probe. If a black hole is detected (e.g., where packets
of size PLPMTU are consistently not received), the method reduces the
PLPMTU.
SCTP Probe Packets:
As the RFC suggested, the probe packets consist of an SCTP common header
followed by a HEARTBEAT chunk and a PAD chunk. The PAD chunk is used to
control the length of the probe packet. The HEARTBEAT chunk is used to
trigger the sending of a HEARTBEAT ACK chunk to confirm this probe on
the HEARTBEAT sender.
The HEARTBEAT chunk also carries a Heartbeat Information parameter that
includes the probe size to help an implementation associate a HEARTBEAT
ACK with the size of probe that was sent. The sender use the nonce and
the probe size to verify the information returned.
Detailed Implementation on SCTP:
+------+
+------->| Base |-----------------+ Connectivity
| +------+ | or BASE_PLPMTU
| | | confirmation failed
| | v
| | Connectivity +-------+
| | and BASE_PLPMTU | Error |
| | confirmed +-------+
| | | Consistent
| v | connectivity
Black Hole | +--------+ | and BASE_PLPMTU
detected | | Search |<---------------+ confirmed
| +--------+
| ^ |
| | |
| Raise | | Search
| timer | | algorithm
| expired | | completed
| | |
| | v
| +-----------------+
+---| Search Complete |
+-----------------+
When PLPMTUD is enabled, it's in Base state, and starts to probe with
BASE_PLPMTU (1200). If this probe succeeds, it goes to Search state;
If this probe fails, it goes to Error state under which pl.pmtu goes
down to MIN_PLPMTU (512) and keeps probing with BASE_PLPMTU until it
succeeds and goes to Search state.
During the Search state, the probe size is growing by a Big step (32)
every time when the last probe succeeds at the beginning. Once a probe
(such as 1420) fails after trying MAX_PROBES (3) times, the probe_size
goes back to the last one (1420 - 32 = 1388), meanwhile 'probe_high'
is set to 1420 and the growing step becomes a Small one (4). Then the
probe is continuing with a Small step grown each round. Until it gets
the optimal size (such as 1400) when probe with its next probe size
(1404) fails, it sync this size to pathmtu and goes to Complete state.
In Complete state, it will only does a probe check for the pathmtu just
set, if it fails, which means a Black Hole is detected and it goes back
to Base state. If it succeeds, it goes back to Search state again, and
probe is continuing with growing a Small step (1400 + 4). If this probe
fails, probe_high is set and goes back to 1388 and then Complete state,
which is kind of a loop normally. However if the env's pathmtu changes
to a big size somehow, this probe will succeed and then probe continues
with growing a Big step (1400 + 32) each round until another probe fails.
PTB Messages Process:
PLPMTUD doesn't rely on these package to find the pmtu, and shouldn't
trust it either. When processing them, it only changes the probe_size
to PL_PTB_SIZE(info - hlen) if 'pl.pmtu < PL_PTB_SIZE < the current
probe_size' druing Search state. As this could help probe_size to get
to the optimal size faster, for exmaple:
pl.pmtu = 1388, probe_size = 1420, while the env's pathmtu = 1400.
When probe_size is 1420, a Toobig packet with 1400 comes back. If probe
size changes to use 1400, it will save quite a few rounds to get there.
But of course after having this value, PLPMTUD will still verify it on
its own before using it.
Patches:
- Patch 1-6: introduce some new constants/variables from the RFC, systcl
and members in transport, APIs for the following patches, chunks and
a timer for the probe sending and some codes for the probe receiving.
- Patch 7-9: implement the state transition on the tx path, rx path and
toobig ICMP packet processing. This is the main algorithm part.
- Patch 10: activate this feature
- Patch 11-14: improve the process for ICMP packets for SCTP over UDP,
so that it can also be covered by this feature.
Tests:
- do sysctl and setsockopt tests for this feature's enabling and disabling.
- get these pr_debug points for this feature by
# cat /sys/kernel/debug/dynamic_debug/control | grep PLP
and enable them on kernel dynamic debug, then play with the pathmtu and
check if the state transition and plpmtu change match the RFC.
- do the above tests for SCTP over IPv4/IPv6 and SCTP over UDP.
v1->v2:
- See Patch 06/14.
Xin Long (14):
sctp: add pad chunk and its make function and event table
sctp: add probe_interval in sysctl and sock/asoc/transport
sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport
sctp: add the constants/variables and states and some APIs for
transport
sctp: add the probe timer in transport for PLPMTUD
sctp: do the basic send and recv for PLPMTUD probe
sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send
path
sctp: do state transition when a probe succeeds on HB ACK recv path
sctp: do state transition when receiving an icmp TOOBIG packet
sctp: enable PLPMTUD when the transport is ready
sctp: remove the unessessary hold for idev in sctp_v6_err
sctp: extract sctp_v6_err_handle function from sctp_v6_err
sctp: extract sctp_v4_err_handle function from sctp_v4_err
sctp: process sctp over udp icmp err on sctp side
Documentation/networking/ip-sysctl.rst | 8 ++
include/linux/sctp.h | 7 ++
include/net/netns/sctp.h | 3 +
include/net/sctp/command.h | 1 +
include/net/sctp/constants.h | 20 ++++
include/net/sctp/sctp.h | 57 ++++++++-
include/net/sctp/sm.h | 6 +-
include/net/sctp/structs.h | 19 +++
include/uapi/linux/sctp.h | 8 ++
net/sctp/associola.c | 6 +
net/sctp/debug.c | 1 +
net/sctp/input.c | 132 ++++++++++++---------
net/sctp/ipv6.c | 112 +++++++++++-------
net/sctp/output.c | 33 +++++-
net/sctp/outqueue.c | 13 ++-
net/sctp/protocol.c | 21 +---
net/sctp/sm_make_chunk.c | 31 ++++-
net/sctp/sm_sideeffect.c | 37 ++++++
net/sctp/sm_statefuns.c | 37 +++++-
net/sctp/sm_statetable.c | 43 +++++++
net/sctp/socket.c | 123 ++++++++++++++++++++
net/sctp/sysctl.c | 35 ++++++
net/sctp/transport.c | 153 ++++++++++++++++++++++++-
23 files changed, 779 insertions(+), 127 deletions(-)
--
2.27.0
next reply other threads:[~2021-06-22 18:08 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-06-22 18:04 Xin Long [this message]
2021-06-22 18:04 ` [PATCHv2 net-next 01/14] sctp: add pad chunk and its make function and event table Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 02/14] sctp: add probe_interval in sysctl and sock/asoc/transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 03/14] sctp: add SCTP_PLPMTUD_PROBE_INTERVAL sockopt for sock/asoc/transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 04/14] sctp: add the constants/variables and states and some APIs for transport Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 05/14] sctp: add the probe timer in transport for PLPMTUD Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 06/14] sctp: do the basic send and recv for PLPMTUD probe Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 07/14] sctp: do state transition when PROBE_COUNT == MAX_PROBES on HB send path Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 08/14] sctp: do state transition when a probe succeeds on HB ACK recv path Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 09/14] sctp: do state transition when receiving an icmp TOOBIG packet Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 10/14] sctp: enable PLPMTUD when the transport is ready Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 11/14] sctp: remove the unessessary hold for idev in sctp_v6_err Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 12/14] sctp: extract sctp_v6_err_handle function from sctp_v6_err Xin Long
2021-06-22 18:04 ` [PATCHv2 net-next 13/14] sctp: extract sctp_v4_err_handle function from sctp_v4_err Xin Long
2021-06-22 18:05 ` [PATCHv2 net-next 14/14] sctp: process sctp over udp icmp err on sctp side Xin Long
2021-06-22 18:40 ` [PATCHv2 net-next 00/14] sctp: implement RFC8899: Packetization Layer Path MTU Discovery for SCTP transport patchwork-bot+netdevbpf
2021-06-22 22:13 ` David Laight
2021-06-23 1:09 ` Xin Long
2021-06-23 3:48 ` Xin Long
2021-06-23 9:50 ` David Laight
2021-06-23 15:59 ` Xin Long
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=cover.1624384990.git.lucien.xin@gmail.com \
--to=lucien.xin@gmail.com \
--cc=davem@davemloft.net \
--cc=kuba@kernel.org \
--cc=linux-sctp@vger.kernel.org \
--cc=marcelo.leitner@gmail.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).