* [PATCH net-next 00/10] Add support for custom multipath hash
@ 2021-05-17 18:15 Ido Schimmel
From: Ido Schimmel @ 2021-05-17 18:15 UTC
  To: netdev
  Cc: davem, kuba, dsahern, petrm, roopa, nikolay, ssuryaextr, mlxsw,
	Ido Schimmel

This patchset adds support for a custom multipath hash policy for both
IPv4 and IPv6 traffic. The new policy allows user space to control the
outer and inner packet fields used for the hash computation.

Motivation
==========

Linux currently supports different multipath hash policies for IPv4 and
IPv6 traffic:

* Layer 3
* Layer 4
* Layer 3 or inner layer 3, if present

These policies hash on a fixed set of fields. This is inflexible and
conflicts with operators' need to control the hash input: "The ability
to control the inputs to the hash function should be a consideration in
any load-balancing RFP" [1].

For example, none of the current policies allows operators to use the
standard 5-tuple together with the flow label for multipath hash
computation. Such a policy is useful in a real-world data center that
carries the following types of traffic:

* Anycast IPv6 TCP traffic towards layer 4 load balancers. Flow label is
constant (zero) to avoid breaking established connections

* Non-encapsulated IPv6 traffic. Flow label is used to re-route flows
around problematic (congested / failed) paths [2]

* IPv6 encapsulated traffic (IPv4-in-IPv6 or IPv6-in-IPv6). Outer flow
label is generated from encapsulated packet

* UDP encapsulated traffic. Outer source port is generated from
encapsulated packet

In the above example, using the inner flow information for hash
computation in addition to the outer flow information is useful during
failures of the BPF agent that selectively generates the flow label
based on the traffic type. In such cases, the self-healing properties of
the flow label are lost, but encapsulated flows are still load balanced.

Control over the inner fields is even more critical when encapsulation
is performed by hardware routers. For example, the Spectrum ASIC can
only encode 8 bits of entropy in the outer flow label / outer UDP source
port when performing IP / UDP encapsulation. In the case of IPv4 GRE
encapsulation there is no outer field to encode the inner hash in.

User interface
==============

In accordance with existing multipath hash configuration, the new custom
policy is added as a new option (3) to the
net.ipv{4,6}.fib_multipath_hash_policy sysctls. When the new policy is
used, the packet fields used for hash computation are determined by the
net.ipv{4,6}.fib_multipath_hash_fields sysctls. These sysctls accept a
bitmask according to the following table (from ip-sysctl.rst):

	====== ============================
	0x0001 Source IP address
	0x0002 Destination IP address
	0x0004 IP protocol
	0x0008 Flow Label
	0x0010 Source port
	0x0020 Destination port
	0x0040 Inner source IP address
	0x0080 Inner destination IP address
	0x0100 Inner IP protocol
	0x0200 Inner Flow Label
	0x0400 Inner source port
	0x0800 Inner destination port
	====== ============================

For example, to allow IPv6 traffic to be hashed based on the standard
5-tuple and flow label:

 # sysctl -wq net.ipv6.fib_multipath_hash_fields=0x003f
 # sysctl -wq net.ipv6.fib_multipath_hash_policy=3
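
The bitmask can also be derived from the table instead of being summed
by hand. The following is a minimal Python sketch: the dictionary keys
and the hash_fields_mask() helper are made up for illustration and are
not kernel or iproute2 identifiers. Only the bit values are taken from
the table above.

```python
# Bit values copied from the fib_multipath_hash_fields table above.
# The key names are illustrative only; they are not kernel identifiers.
HASH_FIELDS = {
    "src_ip": 0x0001,
    "dst_ip": 0x0002,
    "ip_proto": 0x0004,
    "flow_label": 0x0008,
    "src_port": 0x0010,
    "dst_port": 0x0020,
    "inner_src_ip": 0x0040,
    "inner_dst_ip": 0x0080,
    "inner_ip_proto": 0x0100,
    "inner_flow_label": 0x0200,
    "inner_src_port": 0x0400,
    "inner_dst_port": 0x0800,
}


def hash_fields_mask(*names):
    """OR together the bits for the given (hypothetical) field names."""
    mask = 0
    for name in names:
        mask |= HASH_FIELDS[name]
    if mask == 0:
        # The patchset does not allow the bitmask to be zero.
        raise ValueError("at least one field must be selected")
    return mask


FIVE_TUPLE = ("src_ip", "dst_ip", "ip_proto", "src_port", "dst_port")

five_tuple_mask = hash_fields_mask(*FIVE_TUPLE)
with_flow_label = hash_fields_mask(*FIVE_TUPLE, "flow_label")

# Print the value in the form expected by sysctl.
print(f"net.ipv6.fib_multipath_hash_fields=0x{with_flow_label:04x}")
```

Per the table, the 5-tuple bits alone sum to 0x0037, and adding the
Flow Label bit (0x0008) gives 0x003f.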

Implementation
==============

As with existing policies, the new policy relies on the flow dissector
to extract the packet fields for the hash computation. However, unlike
existing policies, which use either the outer or the inner flow, the new
policy might require both flows to be dissected.

To avoid unnecessary invocations of the flow dissector, the data path
skips dissection of the outer flow if no outer fields are required, and
skips dissection of the inner flow if no inner fields are required.

In addition, inner flow dissection is not performed when no
encapsulation was encountered (i.e., 'FLOW_DIS_ENCAPSULATION' not set by
flow dissector) during dissection of the outer flow.
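
The skip logic described above can be modelled in user space roughly as
follows. This is a Python sketch of the decision only, not the kernel
code: plan_dissection() and the two mask constants are hypothetical
names, and the sketch assumes that when the outer flow is not dissected
at all, nothing is known about encapsulation and inner dissection is
still attempted.

```python
# Outer fields occupy bits 0-5 and inner fields bits 6-11 of the
# fib_multipath_hash_fields bitmask. The constant names are illustrative.
OUTER_FIELDS = 0x003F
INNER_FIELDS = 0x0FC0


def plan_dissection(hash_fields, outer_has_encap):
    """Return (dissect_outer, dissect_inner) for a field bitmask.

    outer_has_encap stands in for the flow dissector reporting
    FLOW_DIS_ENCAPSULATION while dissecting the outer flow. It is only
    consulted when the outer flow is actually dissected.
    """
    dissect_outer = bool(hash_fields & OUTER_FIELDS)
    wants_inner = bool(hash_fields & INNER_FIELDS)

    # Assumption: without an outer dissection there is no encapsulation
    # information, so the inner dissection is not ruled out.
    encap_seen = outer_has_encap if dissect_outer else True

    dissect_inner = wants_inner and encap_seen
    return dissect_outer, dissect_inner


# Outer 5-tuple plus inner addresses, packet without encapsulation:
# the outer flow is dissected, the inner dissection is skipped.
print(plan_dissection(0x0037 | 0x00C0, outer_has_encap=False))
```

With this model, a mask that selects only outer fields never triggers
an inner dissection, and a mask that selects both triggers the inner
dissection only for encapsulated packets.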

Testing
=======

Three new selftests are added with three different topologies that allow
testing of the following traffic combinations:

* Non-encapsulated IPv4 / IPv6 traffic
* IPv4 / IPv6 overlay over IPv4 underlay
* IPv4 / IPv6 overlay over IPv6 underlay

All three tests follow the same pattern. Each time a different packet
field is used for hash computation. When the field changes in the packet
stream, traffic is expected to be balanced across the two paths. When
the field does not change, traffic is expected to be unbalanced across
the two paths.
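
The expectation behind this test pattern can be illustrated with a small
user-space simulation. This is only a sketch: SHA-256 is a stand-in for
the kernel's hash function, and pick_path(), path_counts() and the
packet dictionaries are hypothetical helpers, not part of the selftests.

```python
import hashlib


def pick_path(fields, selected, n_paths=2):
    """Pick a path by hashing only the selected packet fields.

    SHA-256 is used as a stand-in hash; the kernel uses its own function.
    """
    key = "|".join(str(fields[name]) for name in selected).encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:4], "big") % n_paths


def path_counts(packets, selected, n_paths=2):
    """Count how many packets land on each path."""
    counts = [0] * n_paths
    for pkt in packets:
        counts[pick_path(pkt, selected, n_paths)] += 1
    return counts


# 1000 packets that differ only in the source port.
packets = [
    {"src_ip": "198.51.100.2", "dst_ip": "203.0.113.2", "src_port": 20000 + i}
    for i in range(1000)
]

varying = path_counts(packets, ["src_port"])  # hashed field changes
constant = path_counts(packets, ["dst_ip"])   # hashed field is constant
print("varying field: ", varying)
print("constant field:", constant)
```

When the hashed field is constant, every packet produces the same hash
and therefore lands on a single path. When it varies, the packets are
spread across both paths.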

Patchset overview
=================

Patches #1-#3 add custom multipath hash support for IPv4 traffic
Patches #4-#7 do the same for IPv6
Patches #8-#10 add selftests

Future work
===========

mlxsw support can be found here [3].

Changes since RFC v2 [4]:

* Patch #2: Document that 0x0008 is used for Flow Label
* Patch #2: Do not allow the bitmask to be zero
* Patch #6: Do not allow the bitmask to be zero

Changes since RFC v1 [5]:

* Use a bitmask instead of a bitmap

[1] https://blog.apnic.net/2018/01/11/ipv6-flow-label-misuse-hashing/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3acf3ec3f4b0fd4263989f2e4227bbd1c42b5fe1
[3] https://github.com/idosch/linux/tree/submit/custom_hash_mlxsw_v2
[4] https://lore.kernel.org/netdev/20210509151615.200608-1-idosch@idosch.org/
[5] https://lore.kernel.org/netdev/20210502162257.3472453-1-idosch@idosch.org/

Ido Schimmel (10):
  ipv4: Calculate multipath hash inside switch statement
  ipv4: Add a sysctl to control multipath hash fields
  ipv4: Add custom multipath hash policy
  ipv6: Use a more suitable label name
  ipv6: Calculate multipath hash inside switch statement
  ipv6: Add a sysctl to control multipath hash fields
  ipv6: Add custom multipath hash policy
  selftests: forwarding: Add test for custom multipath hash
  selftests: forwarding: Add test for custom multipath hash with IPv4
    GRE
  selftests: forwarding: Add test for custom multipath hash with IPv6
    GRE

 Documentation/networking/ip-sysctl.rst        |  58 +++
 include/net/ip_fib.h                          |  43 ++
 include/net/ipv6.h                            |   8 +
 include/net/netns/ipv4.h                      |   1 +
 include/net/netns/ipv6.h                      |   3 +-
 net/ipv4/fib_frontend.c                       |   6 +
 net/ipv4/route.c                              | 127 ++++-
 net/ipv4/sysctl_net_ipv4.c                    |  15 +-
 net/ipv6/ip6_fib.c                            |   9 +-
 net/ipv6/route.c                              | 131 ++++-
 net/ipv6/sysctl_net_ipv6.c                    |  15 +-
 .../net/forwarding/custom_multipath_hash.sh   | 364 ++++++++++++++
 .../forwarding/gre_custom_multipath_hash.sh   | 456 +++++++++++++++++
 .../ip6gre_custom_multipath_hash.sh           | 458 ++++++++++++++++++
 14 files changed, 1685 insertions(+), 9 deletions(-)
 create mode 100755 tools/testing/selftests/net/forwarding/custom_multipath_hash.sh
 create mode 100755 tools/testing/selftests/net/forwarding/gre_custom_multipath_hash.sh
 create mode 100755 tools/testing/selftests/net/forwarding/ip6gre_custom_multipath_hash.sh

-- 
2.31.1



