From: Tom Talpey <tom@talpey.com>
To: Alex Rosenbaum <rosenbaumalex@gmail.com>
Cc: RDMA mailing list <linux-rdma@vger.kernel.org>,
    Jason Gunthorpe <jgg@ziepe.ca>,
    Eran Ben Elisha <eranbe@mellanox.com>,
    Yishai Hadas <yishaih@mellanox.com>,
    "Alex @ Mellanox" <alexr@mellanox.com>,
    Maor Gottlieb <maorg@mellanox.com>,
    Leon Romanovsky <leonro@mellanox.com>,
    Mark Zhang <markz@mellanox.com>
Subject: Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
Date: Thu, 6 Feb 2020 10:19:05 -0500
Message-ID: <b0414c43-c035-aa90-9f89-7ec6bba9e119@talpey.com>
In-Reply-To: <CAFgAxU80+feEEtaRYFYTHwTMSE6Edjq0t0siJ0W06WSyD+Cy3A@mail.gmail.com>

On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>> A combination of the flow_label field in the IPv6 header and UDP source port
>>> field in RoCE v2.0 are used to identify a group of packets that must be
>>> delivered in order by the network, end-to-end.
>>> These fields are used to create entropy for network routers (ECMP), load
>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
>>> headers.
>>>
>>> The flow_label field is defined by a 20 bit hash value. CM based connections
>>> will use a hash function definition based on the service type (QP Type) and
>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be
>>> according to the source and destination QPN values.
>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>>>
>>> UDP source port selection must adhere to IANA port allocation ranges. Thus we
>>> will be using the IANA recommendation for Ephemeral port range of: 49152-65535,
>>> or in hex: 0xC000-0xFFFF.
>>>
>>> The below calculations take into account the importance of producing a symmetric
>>> hash result so we can support symmetric hash calculation of network elements.
>>>
>>> Hash Calculation for RDMA IP CM Service
>>> =======================================
>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
>>> REQ private data info and Service ID.
>>>
>>> Flow label hash function calculations definition will be defined as:
>>> Extract the following fields from the CM IP REQ:
>>>   CM_REQ.ServiceID.DstPort [2 Bytes]
>>>   CM_REQ.PrivateData.SrcPort [2 Bytes]
>>>   u32 hash = DstPort * SrcPort;
>>>   hash ^= (hash >> 16);
>>>   hash ^= (hash >> 8);
>>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>>   #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
>>
>> Sorry it took me a while to respond to this, and thanks for looking
>> into it since my comments on the previous proposal. I have a concern
>> with an aspect of this one.
>>
>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>> term
>>
>>         u32 hash = DstPort * SrcPort;
>>
>> adds no entropy beyond the value of SrcPort.
>>
>
> we're talking about the CM service ports, taken from the
> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
> these are the CM level port-space and not the RoCE UDP L4 ports.
> we want to use both as these will allow different client instance and
> server instance on same nodes will use different CM ports and hopefully
> generate different hash results for multi-flows between these two
> servers.

Aha, ok I guess I missed that, and ok.

>> In turn, the subsequent
>>
>>         hash ^= (hash >> 16);
>>         hash ^= (hash >> 8);
>>
>> are re-mashing the bits with one another, again, adding no entropy.

I still wonder about this one. It's attempting to reduce the 32-bit
product to 20 bits, but a second xor with the "middle" 16 bits seems
really strange. Mathematically, wouldn't it be better to just take
the modulus of 2^20?
If not, are you expecting some behavior in the hash values that makes
the double-xor approach better (in which case it should be called out)?

Tom.

>> Can you describe how, mathematically, this is not different from simply
>> using the SrcPort field, and if so, how it contributes to the entropy
>> differentiation of the incoming streams?
>>
>> Tom.
>>
>>> Result of the above hash will be kept in the CM's route path record connection
>>> context and will be used all across its vitality for all preceding CM messages
>>> on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
>>> Once connection is established, the corresponding Connected RC QPs, on both
>>> ends of the connection, will update their context with the calculated RDMA IP
>>> CM Service based flow_label and UDP src_port values at the Connect phase of
>>> the active side and Accept phase of the passive side of the connection.
>>>
>>> CM will provide the calculated value of the flow_label hash (20 bit) result
>>> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
>>> ibv_ah_attr'.
>>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when
>>> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp,
>>> ah_attr, attr_mask |= IBV_QP_AV)' or when creating an AH for working with
>>> datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr).
>>>
>>> Hash Calculation for non-RDMA CM Service ID
>>> ===========================================
>>> For non CM QP's, the application can define the flow_label value in the
>>> 'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating
>>> an AH for the datagram QP's (e.g.: UD).
>>>
>>> If the provided flow_label value is zero, not set by the application (e.g.:
>>> legacy cases), then verbs providers should use the src.QP[24bit] and
>>> dst.QP[24bit] as input arguments for flow_label calculation.
>>> As QPN's are an array of 3 bytes, the multiplication will result in 6 bytes
>>> value. We'll define a flow_label value as:
>>>   DstQPn [3 Bytes]
>>>   SrcQPn [3 Bytes]
>>>   u64 hash = DstQPn * SrcQPn;
>>>   hash ^= (hash >> 20);
>>>   hash ^= (hash >> 40);
>>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>> Hash Calculation for UDP src_port
>>> =================================
>>> Providers supporting RoCEv2 will use the 'flow_label' value as input to
>>> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
>>> AH context.
>>>
>>> UDP src_port calculations from flow label:
>>> [while considering the 14 bits UDP port range according to IANA recommendation]
>>>   AH_ATTR.GRH.flow_label [20 bits]
>>>   u32 fl_low = fl & 0x03FFF;
>>>   u32 fl_high = fl & 0xFC000;
>>>   u16 udp_sport = fl_low XOR (fl_high >> 14);
>>>   RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT
>>>
>>>   #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000
>>>
>>> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]
>>>
>>> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html
>>>
>>> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>