* [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
@ 2020-01-08 14:26 Alex Rosenbaum
From: Alex Rosenbaum @ 2020-01-08 14:26 UTC (permalink / raw)
  To: RDMA mailing list
  Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox,
	Maor Gottlieb, Leon Romanovsky, Mark Zhang

In RoCE v2.0, the combination of the IPv6 header flow_label field and the UDP
source port field is used to identify a group of packets that must be
delivered in order by the network, end-to-end.
These fields provide entropy for network routers (ECMP), load balancers, and
802.3ad link aggregation switches that are not aware of the RoCE IB headers.

The flow_label field holds a 20-bit hash value. For CM-based connections, the
hash function will be defined by the service type (QP type) and the Service ID
(SID). Where CM services are not used, the 20-bit hash will be computed from
the source and destination QPN values.
Drivers will derive the RoCE v2.0 UDP src_port from the resulting flow_label.

UDP source port selection must adhere to the IANA port allocation ranges, so
we will use the IANA-recommended ephemeral port range 49152-65535 (hex
0xC000-0xFFFF).
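As a quick check of the range arithmetic: 65535 - 49152 + 1 = 16384 = 2^14
ports, so the ephemeral range carries exactly 14 bits of entropy, and every
port in it has its two most significant bits set. The C sketch below encodes
that; the macro and helper names are illustrative only, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* IANA ephemeral port range (illustrative macro names). */
#define EPHEMERAL_PORT_MIN 0xC000u /* 49152 */
#define EPHEMERAL_PORT_MAX 0xFFFFu /* 65535 */

/* 65535 - 49152 + 1 = 16384 = 2^14 ports, i.e. 14 bits of entropy. */
#define EPHEMERAL_PORT_COUNT (EPHEMERAL_PORT_MAX - EPHEMERAL_PORT_MIN + 1u)
#define EPHEMERAL_PORT_BITS  14u

static_assert(EPHEMERAL_PORT_COUNT == (1u << EPHEMERAL_PORT_BITS),
              "the ephemeral range must span exactly 14 bits");

/* Hypothetical helper: a 16-bit port is in the range exactly when its two
 * most significant bits are set, since 0xC000 is 0b11 followed by 14 zero
 * bits. */
static bool in_ephemeral_range(uint16_t port)
{
    return (port & EPHEMERAL_PORT_MIN) == EPHEMERAL_PORT_MIN;
}
```

This is why a 14-bit value OR-ed with 0xC000 is always a valid source port.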

The calculations below are designed to produce a symmetric hash: swapping the
source and destination inputs yields the same result, so network elements that
use symmetric hashing are supported.

Hash Calculation for RDMA IP CM Service
=======================================
For RDMA IP CM services, covering QP1 iMAD usage and connected RC QPs that use
the RDMA IP CM Service ID, the flow label will be calculated from the IBTA CM
REQ private data and the Service ID.

The flow label hash function is defined as follows, using these fields
extracted from the CM IP REQ:
  CM_REQ.ServiceID.DstPort [2 Bytes]
  CM_REQ.PrivateData.SrcPort [2 Bytes]

  #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF

  u32 hash = (u32)DstPort * SrcPort;  /* 16x16-bit product needs 32 bits */
  hash ^= (hash >> 16);
  hash ^= (hash >> 8);
  AH_ATTR.GRH.flow_label = hash & IB_GRH_FLOWLABEL_MASK;
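A self-contained C sketch of the CM flow label hash follows. The function name
and the use of <stdint.h> types in place of the kernel's u16/u32 are
assumptions for illustration. One operand is widened to 32 bits before the
multiply: without that, both uint16_t operands are promoted to int, and
0xFFFF * 0xFFFF overflows a 32-bit signed int. Because multiplication
commutes, swapping the two ports yields the same flow label, which gives the
symmetry property called for above.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Sketch of the proposed RDMA IP CM flow label hash (hypothetical name).
 * dst_port: CM_REQ.ServiceID.DstPort
 * src_port: CM_REQ.PrivateData.SrcPort */
static uint32_t cm_ip_flow_label(uint16_t dst_port, uint16_t src_port)
{
    /* Widen before multiplying: a 16x16-bit product needs 32 unsigned bits. */
    uint32_t hash = (uint32_t)dst_port * src_port;

    /* Fold the high bits down into the low bits. */
    hash ^= hash >> 16;
    hash ^= hash >> 8;

    /* Keep the 20 bits that fit in the IPv6 flow label. */
    return hash & IB_GRH_FLOWLABEL_MASK;
}
```

For example, ports 0x100 and 0x100 give a product of 0x10000, which folds to
0x10001 and then to a flow label of 0x10101.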

The result of the above hash will be kept in the CM's route path record
connection context and will be used throughout the connection's lifetime for
all subsequent CM messages on both ends of the connection (including REP, REJ,
DREQ, DREP, etc.).
Once the connection is established, the corresponding connected RC QPs on both
ends will update their context with the calculated RDMA IP CM Service based
flow_label and UDP src_port values: during the Connect phase on the active side
and during the Accept phase on the passive side.

The CM will provide the calculated 20-bit flow_label hash in the
'uint32_t flow_label' field of 'struct ibv_global_route' (the 'grh' member of
'struct ibv_ah_attr').
The CM passes 'struct ibv_ah_attr' to the provider library when modifying a
connected QP's (e.g., RC) state, as the 'ah_attr' member of 'struct
ibv_qp_attr' in 'ibv_modify_qp(qp, &attr, attr_mask | IBV_QP_AV)', or when
creating an AH for a datagram QP (e.g., UD) via 'ibv_create_ah(pd, &ah_attr)'.

Hash Calculation for non-RDMA CM Service ID
===========================================
For non-CM QPs, the application can set the flow_label value in 'struct
ibv_ah_attr' when modifying a connected QP (e.g., RC) or creating an AH for a
datagram QP (e.g., UD).

If the provided flow_label value is zero, i.e. not set by the application
(e.g., legacy cases), verbs providers should use the 24-bit source and
destination QPNs as the input for the flow_label calculation.
Since each QPN is 3 bytes, their product fits in 6 bytes (48 bits), so the
multiplication must be done in 64-bit arithmetic. The flow_label value is
defined as:
  DstQPn [3 Bytes]
  SrcQPn [3 Bytes]
  u64 hash = (u64)DstQPn * SrcQPn;
  hash ^= (hash >> 20);
  hash ^= (hash >> 40);
  AH_ATTR.GRH.flow_label = hash & IB_GRH_FLOWLABEL_MASK;
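The QPN-based fallback can be sketched in C the same way, together with the
zero check described above. Both function names are hypothetical, and masking
a non-zero application value to 20 bits is an added precaution, not part of
the proposal. The operands are widened to 64 bits before multiplying, since a
24x24-bit product needs up to 48 bits and would wrap in 32-bit arithmetic.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu
#define IB_QPN_MASK           0x00FFFFFFu /* QPNs are 24 bits */

/* Sketch of the QPN-based flow label hash (hypothetical name). */
static uint32_t qpn_flow_label(uint32_t src_qpn, uint32_t dst_qpn)
{
    assert(src_qpn <= IB_QPN_MASK && dst_qpn <= IB_QPN_MASK);

    /* 24 x 24 bits -> up to 48 bits, so multiply in 64-bit arithmetic. */
    uint64_t hash = (uint64_t)dst_qpn * src_qpn;

    hash ^= hash >> 20;
    hash ^= hash >> 40;

    return (uint32_t)(hash & IB_GRH_FLOWLABEL_MASK);
}

/* Hypothetical helper: use the application's flow label if it set one
 * (non-zero); otherwise fall back to the QPN-based hash. */
static uint32_t effective_flow_label(uint32_t app_flow_label,
                                     uint32_t src_qpn, uint32_t dst_qpn)
{
    if (app_flow_label != 0)
        return app_flow_label & IB_GRH_FLOWLABEL_MASK; /* precautionary mask */
    return qpn_flow_label(src_qpn, dst_qpn);
}
```

As with the CM variant, the hash is symmetric in the two QPNs, so both ends of
a connection compute the same flow label without exchanging it.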

Hash Calculation for UDP src_port
=================================
Providers supporting RoCEv2 will use the 'flow_label' value as input to
calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
AH context.

The UDP src_port is calculated from the flow label by folding its 20 bits into
the 14-bit port range recommended by IANA:
  #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000

  u32 fl      = AH_ATTR.GRH.flow_label;  /* 20 bits */
  u32 fl_low  = fl & 0x03FFF;            /* low 14 bits */
  u32 fl_high = fl & 0xFC000;            /* high 6 bits */
  u16 udp_sport = fl_low ^ (fl_high >> 14);
  RoCE.UDP.src_port = udp_sport | IB_ROCE_UDP_ENCAP_VALID_PORT;
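The fold into the 14-bit port range can be written as the C sketch below
(hypothetical function name). Since fl_high >> 14 is at most 6 bits wide,
XOR-ing it into the 14-bit fl_low never sets a bit above bit 13, so OR-ing in
0xC000 always lands in 0xC000-0xFFFF.

```c
#include <assert.h>
#include <stdint.h>

#define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000u

/* Sketch: derive the RoCEv2 UDP source port from a 20-bit flow label. */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
    uint32_t fl_low  = fl & 0x03FFFu; /* bits 0-13  (14 bits) */
    uint32_t fl_high = fl & 0xFC000u; /* bits 14-19 (6 bits)  */

    /* Fold the top 6 bits of the flow label into the low 14 bits. */
    uint32_t sport = fl_low ^ (fl_high >> 14);

    /* Force the port into the IANA ephemeral range 0xC000-0xFFFF. */
    return (uint16_t)(sport | IB_ROCE_UDP_ENCAP_VALID_PORT);
}
```

The fold is many-to-one (2^20 flow labels map onto 2^14 ports); for example,
flow labels 0x00001 and 0x04000 both map to source port 0xC001.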

This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]

[1] https://www.spinics.net/lists/linux-rdma/msg73735.html

Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>

