* [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
@ 2020-01-08 14:26 Alex Rosenbaum
  2020-01-15  9:48 ` Mark Zhang
  2020-02-06 14:18 ` Tom Talpey
  0 siblings, 2 replies; 24+ messages in thread
From: Alex Rosenbaum @ 2020-01-08 14:26 UTC
To: RDMA mailing list
Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox,
    Maor Gottlieb, Leon Romanovsky, Mark Zhang

A combination of the flow_label field in the IPv6 header and UDP source port
field in RoCE v2.0 are used to identify a group of packets that must be
delivered in order by the network, end-to-end.
These fields are used to create entropy for network routers (ECMP), load
balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
headers.

The flow_label field is defined by a 20 bit hash value. CM based connections
will use a hash function definition based on the service type (QP Type) and
Service ID (SID). Where CM services are not used, the 20 bit hash will be
according to the source and destination QPN values.
Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.

UDP source port selection must adhere IANA port allocation ranges. Thus we will
be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
hex: 0xC000-0xFFFF.

The below calculations take into account the importance of producing a symmetric
hash result so we can support symmetric hash calculation of network elements.

Hash Calculation for RDMA IP CM Service
=======================================
For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
REQ private data info and Service ID.

Flow label hash function calculations definition will be defined as:
Extract the following fields from the CM IP REQ:
  CM_REQ.ServiceID.DstPort [2 Bytes]
  CM_REQ.PrivateData.SrcPort [2 Bytes]
  u32 hash = DstPort * SrcPort;
  hash ^= (hash >> 16);
  hash ^= (hash >> 8);
  AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;

  #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF

Result of the above hash will be kept in the CM's route path record connection
context and will be used all across its vitality for all preceding CM messages
on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
Once connection is established, the corresponding Connected RC QPs, on both
ends of the connection, will update their context with the calculated RDMA IP
CM Service based flow_label and UDP src_port values at the Connect phase of
the active side and Accept phase of the passive side of the connection.

CM will provide to the calculated value of the flow_label hash (20 bit) result
in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
ibv_ah_attr'.
The 'struct ibv_ah_attr' is passed by the CM to the provider library when
modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp,
ah_attr, attr_mask |= IBV_QP_AV)' or when create a AH for working with
datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr).

Hash Calculation for non-RDMA CM Service ID
===========================================
For non CM QP's, the application can define the flow_label value in the
'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating
a AH for the datagram QP's (e.g.: UD).

If the provided flow_label value is zero, not set by the application (e.g.:
legacy cases), then verbs providers should use the src.QP[24bit] and
dst.QP[24bit] as input arguments for flow_label calculation.
As QPN's are an array of 3 bytes, the multiplication will result in 6 bytes
value. We'll define a flow_label value as:
  DstQPn [3 Bytes]
  SrcQPn [3 Bytes]
  u64 hash = DstQPn * SrcQPn;
  hash ^= (hash >> 20);
  hash ^= (hash >> 40);
  AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;

Hash Calculation for UDP src_port
=================================
Providers supporting RoCEv2 will use the 'flow_label' value as input to
calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
AH context.

UDP src_port calculations from flow label:
[while considering the 14 bits UDP port range according to IANA recommendation]
  AH_ATTR.GRH.flow_label [20 bits]
  u32 fl_low = fl & 0x03FFF;
  u32 fl_high = fl & 0xFC000;
  u16 udp_sport = fl_low XOR (fl_high >> 14);
  RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT

  #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000

This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]

[1] https://www.spinics.net/lists/linux-rdma/msg73735.html

Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>

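The three calculations in the RFC above can be sketched in plain C as follows. This is an illustrative transcription only, not code from any driver or from rdma-core: the helper names (cm_ports_to_flow_label, qpn_to_flow_label, flow_label_to_udp_sport) are hypothetical, the kernel-style u16/u32/u64 types are replaced by their <stdint.h> equivalents, and only the two #define names and the arithmetic are taken from the RFC text.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK        0x000FFFFFu
#define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000u

/* Flow label from the two CM-level ports carried in the CM REQ
 * (hypothetical helper name). Multiplication is commutative, so
 * swapping the two ports gives the same result. */
static uint32_t cm_ports_to_flow_label(uint16_t dst_port, uint16_t src_port)
{
    /* Cast first so the 16x16 multiply is done in 32-bit unsigned. */
    uint32_t hash = (uint32_t)dst_port * src_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}

/* Flow label from the two 24-bit QP numbers, for the non-CM case
 * (hypothetical helper name). */
static uint32_t qpn_to_flow_label(uint32_t dst_qpn, uint32_t src_qpn)
{
    uint64_t hash;

    /* QPNs are 24-bit values; anything wider is a caller bug. */
    assert((dst_qpn >> 24) == 0 && (src_qpn >> 24) == 0);

    hash = (uint64_t)dst_qpn * src_qpn;   /* up to 48 significant bits */
    hash ^= hash >> 20;
    hash ^= hash >> 40;
    return (uint32_t)(hash & IB_GRH_FLOWLABEL_MASK);
}

/* UDP source port from a 20-bit flow label: fold the top 6 bits into
 * the low 14 bits, then force the result into 0xC000-0xFFFF. */
static uint16_t flow_label_to_udp_sport(uint32_t fl)
{
    uint32_t fl_low  = fl & 0x03FFF;
    uint32_t fl_high = fl & 0xFC000;
    uint16_t udp_sport = (uint16_t)(fl_low ^ (fl_high >> 14));

    return (uint16_t)(udp_sport | IB_ROCE_UDP_ENCAP_VALID_PORT);
}
```

With these definitions, a flow label of 0 maps to source port 0xC000 and a flow label of 0xFFFFF maps to 0xFFC0, so every result stays inside the 49152-65535 range named in the RFC.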
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
From: Mark Zhang @ 2020-01-15 9:48 UTC
To: Alex Rosenbaum, RDMA mailing list
Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum,
    Maor Gottlieb, Leon Romanovsky

On 1/8/2020 10:26 PM, Alex Rosenbaum wrote:
> A combination of the flow_label field in the IPv6 header and UDP source port
> field in RoCE v2.0 are used to identify a group of packets that must be
> delivered in order by the network, end-to-end.
> These fields are used to create entropy for network routers (ECMP), load
> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
> headers.
>
> The flow_label field is defined by a 20 bit hash value. CM based connections
> will use a hash function definition based on the service type (QP Type) and
> Service ID (SID). Where CM services are not used, the 20 bit hash will be
> according to the source and destination QPN values.
> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>
> UDP source port selection must adhere IANA port allocation ranges. Thus we will
> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> hex: 0xC000-0xFFFF.
>
> The below calculations take into account the importance of producing a symmetric
> hash result so we can support symmetric hash calculation of network elements.
>
> Hash Calculation for RDMA IP CM Service
> =======================================
> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
> REQ private data info and Service ID.
>
> Flow label hash function calculations definition will be defined as:
> Extract the following fields from the CM IP REQ:
>   CM_REQ.ServiceID.DstPort [2 Bytes]
>   CM_REQ.PrivateData.SrcPort [2 Bytes]
>   u32 hash = DstPort * SrcPort;
>   hash ^= (hash >> 16);
>   hash ^= (hash >> 8);
>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>
>   #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF
>
> Result of the above hash will be kept in the CM's route path record connection
> context and will be used all across its vitality for all preceding CM messages
> on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
> Once connection is established, the corresponding Connected RC QPs, on both
> ends of the connection, will update their context with the calculated RDMA IP
> CM Service based flow_label and UDP src_port values at the Connect phase of
> the active side and Accept phase of the passive side of the connection.
>
> CM will provide to the calculated value of the flow_label hash (20 bit) result
> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
> ibv_ah_attr'.
> The 'struct ibv_ah_attr' is passed by the CM to the provider library when
> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp,
> ah_attr, attr_mask |= IBV_QP_AV)' or when create a AH for working with
> datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr).
>
> Hash Calculation for non-RDMA CM Service ID
> ===========================================
> For non CM QP's, the application can define the flow_label value in the
> 'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating
> a AH for the datagram QP's (e.g.: UD).
>

Hi Alex, when creating an AH for the datagram QP, I think we don't have
the src.QP and dst.QP, so we can't set the flow_label here?

> If the provided flow_label value is zero, not set by the application (e.g.:
> legacy cases), then verbs providers should use the src.QP[24bit] and
> dst.QP[24bit] as input arguments for flow_label calculation.
> As QPN's are an array of 3 bytes, the multiplication will result in 6 bytes
> value. We'll define a flow_label value as:
>   DstQPn [3 Bytes]
>   SrcQPn [3 Bytes]
>   u64 hash = DstQPn * SrcQPn;
>   hash ^= (hash >> 20);
>   hash ^= (hash >> 40);
>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>
> Hash Calculation for UDP src_port
> =================================
> Providers supporting RoCEv2 will use the 'flow_label' value as input to
> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
> AH context.
>
> UDP src_port calculations from flow label:
> [while considering the 14 bits UDP port range according to IANA recommendation]
>   AH_ATTR.GRH.flow_label [20 bits]
>   u32 fl_low = fl & 0x03FFF;
>   u32 fl_high = fl & 0xFC000;
>   u16 udp_sport = fl_low XOR (fl_high >> 14);
>   RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT
>
>   #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000
>
> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]
>
> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html
>
> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>
>

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
From: Tom Talpey @ 2020-02-06 14:18 UTC
To: Alex Rosenbaum, RDMA mailing list
Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox,
    Maor Gottlieb, Leon Romanovsky, Mark Zhang

On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> A combination of the flow_label field in the IPv6 header and UDP source port
> field in RoCE v2.0 are used to identify a group of packets that must be
> delivered in order by the network, end-to-end.
> These fields are used to create entropy for network routers (ECMP), load
> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
> headers.
>
> The flow_label field is defined by a 20 bit hash value. CM based connections
> will use a hash function definition based on the service type (QP Type) and
> Service ID (SID). Where CM services are not used, the 20 bit hash will be
> according to the source and destination QPN values.
> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>
> UDP source port selection must adhere IANA port allocation ranges. Thus we will
> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> hex: 0xC000-0xFFFF.
>
> The below calculations take into account the importance of producing a symmetric
> hash result so we can support symmetric hash calculation of network elements.
>
> Hash Calculation for RDMA IP CM Service
> =======================================
> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
> REQ private data info and Service ID.
>
> Flow label hash function calculations definition will be defined as:
> Extract the following fields from the CM IP REQ:
>   CM_REQ.ServiceID.DstPort [2 Bytes]
>   CM_REQ.PrivateData.SrcPort [2 Bytes]
>   u32 hash = DstPort * SrcPort;
>   hash ^= (hash >> 16);
>   hash ^= (hash >> 8);
>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>
>   #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF

Sorry it took me a while to respond to this, and thanks for looking
into it since my comments on the previous proposal. I have a concern
with an aspect of this one.

The RoCEv2 destination port is a fixed value, 4791. Therefore the
term

    u32 hash = DstPort * SrcPort;

adds no entropy beyond the value of SrcPort.

In turn, the subsequent

    hash ^= (hash >> 16);
    hash ^= (hash >> 8);

are re-mashing the bits with one another, again, adding no entropy.

Can you describe how, mathematically, this is not different from simply
using the SrcPort field, and if so, how it contributes to the entropy
differentiation of the incoming streams?

Tom.

> Result of the above hash will be kept in the CM's route path record connection
> context and will be used all across its vitality for all preceding CM messages
> on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
> Once connection is established, the corresponding Connected RC QPs, on both
> ends of the connection, will update their context with the calculated RDMA IP
> CM Service based flow_label and UDP src_port values at the Connect phase of
> the active side and Accept phase of the passive side of the connection.
>
> CM will provide to the calculated value of the flow_label hash (20 bit) result
> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
> ibv_ah_attr'.
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
From: Jason Gunthorpe @ 2020-02-06 14:35 UTC
To: Tom Talpey
Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
    Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On Thu, Feb 06, 2020 at 09:18:38AM -0500, Tom Talpey wrote:
> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> > A combination of the flow_label field in the IPv6 header and UDP source port
> > field in RoCE v2.0 are used to identify a group of packets that must be
> > delivered in order by the network, end-to-end.
> > These fields are used to create entropy for network routers (ECMP), load
> > balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
> > headers.
> >
> > The flow_label field is defined by a 20 bit hash value. CM based connections
> > will use a hash function definition based on the service type (QP Type) and
> > Service ID (SID). Where CM services are not used, the 20 bit hash will be
> > according to the source and destination QPN values.
> > Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
> >
> > UDP source port selection must adhere IANA port allocation ranges. Thus we will
> > be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> > hex: 0xC000-0xFFFF.
> >
> > The below calculations take into account the importance of producing a symmetric
> > hash result so we can support symmetric hash calculation of network elements.
> >
> > Hash Calculation for RDMA IP CM Service
> > =======================================
> > For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
> > RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
> > REQ private data info and Service ID.
> >
> > Flow label hash function calculations definition will be defined as:
> > Extract the following fields from the CM IP REQ:
> >   CM_REQ.ServiceID.DstPort [2 Bytes]
> >   CM_REQ.PrivateData.SrcPort [2 Bytes]
> >   u32 hash = DstPort * SrcPort;
> >   hash ^= (hash >> 16);
> >   hash ^= (hash >> 8);
> >   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> >
> >   #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF
>
> Sorry it took me a while to respond to this, and thanks for looking
> into it since my comments on the previous proposal. I have a concern
> with an aspect of this one.
>
> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> term

I read the above as using the destination port of the IP contained
within the CM REQ, not as the destination port of the RoCE UDP header?

So it can be different..

Jason

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
From: Alex Rosenbaum @ 2020-02-06 14:39 UTC
To: Tom Talpey
Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas,
    Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>
> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> > A combination of the flow_label field in the IPv6 header and UDP source port
> > field in RoCE v2.0 are used to identify a group of packets that must be
> > delivered in order by the network, end-to-end.
> > These fields are used to create entropy for network routers (ECMP), load
> > balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
> > headers.
> >
> > The flow_label field is defined by a 20 bit hash value. CM based connections
> > will use a hash function definition based on the service type (QP Type) and
> > Service ID (SID). Where CM services are not used, the 20 bit hash will be
> > according to the source and destination QPN values.
> > Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
> >
> > UDP source port selection must adhere IANA port allocation ranges. Thus we will
> > be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> > hex: 0xC000-0xFFFF.
> >
> > The below calculations take into account the importance of producing a symmetric
> > hash result so we can support symmetric hash calculation of network elements.
> >
> > Hash Calculation for RDMA IP CM Service
> > =======================================
> > For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
> > RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
> > REQ private data info and Service ID.
> >
> > Flow label hash function calculations definition will be defined as:
> > Extract the following fields from the CM IP REQ:
> >   CM_REQ.ServiceID.DstPort [2 Bytes]
> >   CM_REQ.PrivateData.SrcPort [2 Bytes]
> >   u32 hash = DstPort * SrcPort;
> >   hash ^= (hash >> 16);
> >   hash ^= (hash >> 8);
> >   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> >
> >   #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF
>
> Sorry it took me a while to respond to this, and thanks for looking
> into it since my comments on the previous proposal. I have a concern
> with an aspect of this one.
>
> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> term
>
>     u32 hash = DstPort * SrcPort;
>
> adds no entropy beyond the value of SrcPort.
>

we're talking about the CM service ports, taken from the
rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
these are the CM level port-space and not the RoCE UDP L4 ports.
we want to use both as these will allow different client instance and
server instance on same nodes will use different CM ports and hopefully
generate different hash results for multi-flows between these two
servers.

> In turn, the subsequent
>
>     hash ^= (hash >> 16);
>     hash ^= (hash >> 8);
>
> are re-mashing the bits with one another, again, adding no entropy.
>
> Can you describe how, mathematically, this is not different from simply
> using the SrcPort field, and if so, how it contributes to the entropy
> differentiation of the incoming streams?
>
> Tom.
>
> > Result of the above hash will be kept in the CM's route path record connection
> > context and will be used all across its vitality for all preceding CM messages
> > on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
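The reply above makes two points: the hash inputs are the CM-level ports rather than the fixed RoCE UDP port, and different client/server instances on the same pair of nodes should end up with different flow labels. A minimal C sketch of both properties follows, under stated assumptions: the function names cm_flow_label and cm_flow_label_checked are hypothetical, and the port numbers used in the note after the block (40000, 40001, 18515) are arbitrary illustrations, not values from the thread.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Flow label computed from the CM-level ports as one end sees them
 * (hypothetical helper; same arithmetic as the RFC pseudo-code). */
static uint32_t cm_flow_label(uint16_t local_port, uint16_t remote_port)
{
    uint32_t hash = (uint32_t)local_port * remote_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}

/* Same computation with a debug-time check that the peer, which sees
 * the two ports with their roles swapped, derives the same label. */
static uint32_t cm_flow_label_checked(uint16_t local_port, uint16_t remote_port)
{
    uint32_t fl = cm_flow_label(local_port, remote_port);

    assert(fl == cm_flow_label(remote_port, local_port));
    return fl;
}
```

For example, a client on CM port 40000 and a second client on CM port 40001, both connecting to a service on CM port 18515, get different labels from this function, while the active and passive side of each connection agree on the value. Nothing here guarantees that two arbitrary port pairs never collide; the label is only 20 bits wide.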
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
From: Tom Talpey @ 2020-02-06 15:19 UTC
To: Alex Rosenbaum
Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas,
    Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>> A combination of the flow_label field in the IPv6 header and UDP source port
>>> field in RoCE v2.0 are used to identify a group of packets that must be
>>> delivered in order by the network, end-to-end.
>>> These fields are used to create entropy for network routers (ECMP), load
>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
>>> headers.
>>>
>>> The flow_label field is defined by a 20 bit hash value. CM based connections
>>> will use a hash function definition based on the service type (QP Type) and
>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be
>>> according to the source and destination QPN values.
>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>>>
>>> UDP source port selection must adhere IANA port allocation ranges. Thus we will
>>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
>>> hex: 0xC000-0xFFFF.
>>>
>>> The below calculations take into account the importance of producing a symmetric
>>> hash result so we can support symmetric hash calculation of network elements.
>>>
>>> Hash Calculation for RDMA IP CM Service
>>> =======================================
>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
>>> REQ private data info and Service ID.
>>>
>>> Flow label hash function calculations definition will be defined as:
>>> Extract the following fields from the CM IP REQ:
>>>   CM_REQ.ServiceID.DstPort [2 Bytes]
>>>   CM_REQ.PrivateData.SrcPort [2 Bytes]
>>>   u32 hash = DstPort * SrcPort;
>>>   hash ^= (hash >> 16);
>>>   hash ^= (hash >> 8);
>>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>>   #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF
>>
>> Sorry it took me a while to respond to this, and thanks for looking
>> into it since my comments on the previous proposal. I have a concern
>> with an aspect of this one.
>>
>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>> term
>>
>>     u32 hash = DstPort * SrcPort;
>>
>> adds no entropy beyond the value of SrcPort.
>>
>
> we're talking about the CM service ports, taken from the
> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
> these are the CM level port-space and not the RoCE UDP L4 ports.
> we want to use both as these will allow different client instance and
> server instance on same nodes will use different CM ports and hopefully
> generate different hash results for multi-flows between these two
> servers.

Aha, ok I guess I missed that, and ok.

>> In turn, the subsequent
>>
>>     hash ^= (hash >> 16);
>>     hash ^= (hash >> 8);
>>
>> are re-mashing the bits with one another, again, adding no entropy.

I still wonder about this one. It's attempting to reduce the 32-bit
product to 20 bits, but a second xor with the "middle" 16 bits seems
really strange. Mathematically, wouldn't it be better to just take
the modulus of 2^20? If not, are you expecting some behavior in the
hash values that makes the double-xor approach better (in which case
it should be called out)?

Tom.

>> Can you describe how, mathematically, this is not different from simply
>> using the SrcPort field, and if so, how it contributes to the entropy
>> differentiation of the incoming streams?
>>
>> Tom.
>>
>>> Result of the above hash will be kept in the CM's route path record connection
>>> context and will be used all across its vitality for all preceding CM messages
>>> on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
>>> Once connection is established, the corresponding Connected RC QPs, on both
>>> ends of the connection, will update their context with the calculated RDMA IP
>>> CM Service based flow_label and UDP src_port values at the Connect phase of
>>> the active side and Accept phase of the passive side of the connection.
>>>
>>> CM will provide to the calculated value of the flow_label hash (20 bit) result
>>> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
>>> ibv_ah_attr'.
>>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when
>>> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp,
>>> ah_attr, attr_mask |= IBV_QP_AV)' or when create a AH for working with
>>> datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr).
>>>
>>> Hash Calculation for non-RDMA CM Service ID
>>> ===========================================
>>> For non CM QP's, the application can define the flow_label value in the
>>> 'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating
>>> a AH for the datagram QP's (e.g.: UD).
>>>
>>> If the provided flow_label value is zero, not set by the application (e.g.:
>>> legacy cases), then verbs providers should use the src.QP[24bit] and
>>> dst.QP[24bit] as input arguments for flow_label calculation.
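The question above, whether taking the product modulo 2^20 would do as well as the double-xor fold, can be made concrete with a small C sketch. It is only an illustration under stated assumptions: the function names flow_label_mod and flow_label_fold are hypothetical, and the port values in the note after the block were picked by hand so that the two products differ only in bits 20 and above.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Alternative raised in the thread: keep only the low 20 bits of the
 * 32-bit product (hypothetical helper name). */
static uint32_t flow_label_mod(uint16_t dst_port, uint16_t src_port)
{
    uint32_t product = (uint32_t)dst_port * src_port;
    uint32_t fl = product % (1u << 20);

    /* Modulo 2^20 and masking with 0xFFFFF are the same operation. */
    assert(fl == (product & IB_GRH_FLOWLABEL_MASK));
    return fl;
}

/* Fold from the RFC: xor the upper bits down before masking, so that
 * bits 20..31 of the product still influence the result. */
static uint32_t flow_label_fold(uint16_t dst_port, uint16_t src_port)
{
    uint32_t hash = (uint32_t)dst_port * src_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}
```

With port pairs (0x1000, 0x100) and (0x2000, 0x100) the products are 0x100000 and 0x200000. flow_label_mod returns 0 for both, while flow_label_fold returns 0x01010 and 0x02020. This shows only that the high bits reach the output under folding; any mapping from 32 bits to 20 bits must still collide for some inputs, so it says nothing about which function spreads real traffic better.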
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port 2020-02-06 15:19 ` Tom Talpey @ 2020-02-08 9:58 ` Alex Rosenbaum 2020-02-12 15:47 ` Tom Talpey 0 siblings, 1 reply; 24+ messages in thread From: Alex Rosenbaum @ 2020-02-08 9:58 UTC (permalink / raw) To: Tom Talpey Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote: > > On 2/6/2020 9:39 AM, Alex Rosenbaum wrote: > > On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote: > >> > >> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote: > >>> A combination of the flow_label field in the IPv6 header and UDP source port > >>> field in RoCE v2.0 are used to identify a group of packets that must be > >>> delivered in order by the network, end-to-end. > >>> These fields are used to create entropy for network routers (ECMP), load > >>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB > >>> headers. > >>> > >>> The flow_label field is defined by a 20 bit hash value. CM based connections > >>> will use a hash function definition based on the service type (QP Type) and > >>> Service ID (SID). Where CM services are not used, the 20 bit hash will be > >>> according to the source and destination QPN values. > >>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result. > >>> > >>> UDP source port selection must adhere IANA port allocation ranges. Thus we will > >>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in > >>> hex: 0xC000-0xFFFF. > >>> > >>> The below calculations take into account the importance of producing a symmetric > >>> hash result so we can support symmetric hash calculation of network elements. 
> >>> > >>> Hash Calculation for RDMA IP CM Service > >>> ======================================= > >>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the > >>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM > >>> REQ private data info and Service ID. > >>> > >>> Flow label hash function calculations definition will be defined as: > >>> Extract the following fields from the CM IP REQ: > >>> CM_REQ.ServiceID.DstPort [2 Bytes] > >>> CM_REQ.PrivateData.SrcPort [2 Bytes] > >>> u32 hash = DstPort * SrcPort; > >>> hash ^= (hash >> 16); > >>> hash ^= (hash >> 8); > >>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; > >>> > >>> #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF > >> > >> Sorry it took me a while to respond to this, and thanks for looking > >> into it since my comments on the previous proposal. I have a concern > >> with an aspect of this one. > >> > >> The RoCEv2 destination port is a fixed value, 4791. Therefore the > >> term > >> > >> u32 hash = DstPort * SrcPort; > >> > >> adds no entropy beyond the value of SrcPort. > >> > > > > we're talking about the CM service ports, taken from the > > rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec); > > these are the CM level port-space and not the RoCE UDP L4 ports. > > we want to use both as these will allow different client instance and > > server instance on same nodes will use differen CM ports and hopefully > > generate different hash results for multi-flows between these two > > servers. > > Aha, ok I guess I missed that, and ok. > > >> In turn, the subsequent > >> > >> hash ^= (hash >> 16); > >> hash ^= (hash >> 8); > >> > >> are re-mashing the bits with one another, again, adding no entropy. > > I still wonder about this one. It's attempting to reduce the 32-bit > product to 20 bits, but a second xor with the "middle" 16 bits seems > really strange. Mathematically, wouldn't it be better to just take > the modulus of 2^20? 
If not, are you expecting some behavior in the
> hash values that makes the double-xor approach better (in which case
> it should be called out)?
>
> Tom.

The function takes into account creating a symmetric hash, so both active and passive sides can reconstruct the same flow label result. That's why we multiply the two CM port values (16 bit * 16 bit). The result is a 32-bit value, and we don't want to lose any of the MSBs to a modulus or a mask. So we need some folding function from 32 bits to the 20-bit flow label.

The specific bit shift is something I took from the bond driver:
https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
This proved very good at spreading the flow label in our internal testing. Other alternatives can be suggested, as long as they consider all bits in the 32->20 bit conversion.

Alex

>
> >> Can you describe how, mathematically, this is not different from simply
> >> using the SrcPort field, and if so, how it contributes to the entropy
> >> differentiation of the incoming streams?
> >>
> >> Tom.
> >>
> >>> Result of the above hash will be kept in the CM's route path record connection
> >>> context and will be used all across its vitality for all preceding CM messages
> >>> on both ends of the connection (including REP, REJ, DREQ, DREP, ..).
> >>> Once connection is established, the corresponding Connected RC QPs, on both
> >>> ends of the connection, will update their context with the calculated RDMA IP
> >>> CM Service based flow_label and UDP src_port values at the Connect phase of
> >>> the active side and Accept phase of the passive side of the connection.
> >>>
> >>> CM will provide to the calculated value of the flow_label hash (20 bit) result
> >>> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
> >>> ibv_ah_attr'.
> >>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when > >>> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp, > >>> ah_attr, attr_mask |= IBV_QP_AV)' or when create a AH for working with > >>> datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr). > >>> > >>> Hash Calculation for non-RDMA CM Service ID > >>> =========================================== > >>> For non CM QP's, the application can define the flow_label value in the > >>> 'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating > >>> a AH for the datagram QP's (e.g.: UD). > >>> > >>> If the provided flow_label value is zero, not set by the application (e.g.: > >>> legacy cases), then verbs providers should use the src.QP[24bit] and > >>> dst.QP[24bit] as input arguments for flow_label calculation. > >>> As QPN's are an array of 3 bytes, the multiplication will result in 6 bytes > >>> value. We'll define a flow_label value as: > >>> DstQPn [3 Bytes] > >>> SrcQPn [3 Bytes] > >>> u64 hash = DstQPn * SrcQPn; > >>> hash ^= (hash >> 20); > >>> hash ^= (hash >> 40); > >>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; > >>> > >>> Hash Calculation for UDP src_port > >>> ================================= > >>> Providers supporting RoCEv2 will use the 'flow_label' value as input to > >>> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the > >>> AH context. 
> >>> > >>> UDP src_port calculations from flow label: > >>> [while considering the 14 bits UDP port range according to IANA recommendation] > >>> AH_ATTR.GRH.flow_label [20 bits] > >>> u32 fl_low = fl & 0x03FFF; > >>> u32 fl_high = fl & 0xFC000; > >>> u16 udp_sport = fl_low XOR (fl_high >> 14); > >>> RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT > >>> > >>> #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000 > >>> > >>> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1] > >>> > >>> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html > >>> > >>> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com> > >>> > >>> > > > > ^ permalink raw reply [flat|nested] 24+ messages in thread
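The multiply, fold and mask steps that Alex describes for the CM ports can be sketched in C as follows. This is an illustration only: cm_flow_label() is a hypothetical name, and the explicit cast is an addition of this sketch, needed because two uint16_t operands would otherwise be promoted to signed int before the multiply and could overflow.

```c
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Hypothetical helper: symmetric 20-bit flow label from the two CM ports. */
static uint32_t cm_flow_label(uint16_t src_port, uint16_t dst_port)
{
    /* Cast first so the 16 bit * 16 bit product is computed as unsigned 32 bit. */
    uint32_t hash = (uint32_t)dst_port * src_port;

    /* Fold the upper bits down so they still influence the low 20 bits. */
    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}
```

Swapping the arguments does not change the result, so the active side (which sees the ports as src/dst) and the passive side (which sees them as dst/src) arrive at the same label without exchanging it.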
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port 2020-02-08 9:58 ` Alex Rosenbaum @ 2020-02-12 15:47 ` Tom Talpey 2020-02-13 11:03 ` Alex Rosenbaum 0 siblings, 1 reply; 24+ messages in thread From: Tom Talpey @ 2020-02-12 15:47 UTC (permalink / raw) To: Alex Rosenbaum Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang On 2/8/2020 4:58 AM, Alex Rosenbaum wrote: > On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote: >> >> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote: >>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote: >>>> >>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote: >>>>> A combination of the flow_label field in the IPv6 header and UDP source port >>>>> field in RoCE v2.0 are used to identify a group of packets that must be >>>>> delivered in order by the network, end-to-end. >>>>> These fields are used to create entropy for network routers (ECMP), load >>>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB >>>>> headers. >>>>> >>>>> The flow_label field is defined by a 20 bit hash value. CM based connections >>>>> will use a hash function definition based on the service type (QP Type) and >>>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be >>>>> according to the source and destination QPN values. >>>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result. >>>>> >>>>> UDP source port selection must adhere IANA port allocation ranges. Thus we will >>>>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in >>>>> hex: 0xC000-0xFFFF. >>>>> >>>>> The below calculations take into account the importance of producing a symmetric >>>>> hash result so we can support symmetric hash calculation of network elements. 
>>>>> >>>>> Hash Calculation for RDMA IP CM Service >>>>> ======================================= >>>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the >>>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM >>>>> REQ private data info and Service ID. >>>>> >>>>> Flow label hash function calculations definition will be defined as: >>>>> Extract the following fields from the CM IP REQ: >>>>> CM_REQ.ServiceID.DstPort [2 Bytes] >>>>> CM_REQ.PrivateData.SrcPort [2 Bytes] >>>>> u32 hash = DstPort * SrcPort; >>>>> hash ^= (hash >> 16); >>>>> hash ^= (hash >> 8); >>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; >>>>> >>>>> #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF >>>> >>>> Sorry it took me a while to respond to this, and thanks for looking >>>> into it since my comments on the previous proposal. I have a concern >>>> with an aspect of this one. >>>> >>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the >>>> term >>>> >>>> u32 hash = DstPort * SrcPort; >>>> >>>> adds no entropy beyond the value of SrcPort. >>>> >>> >>> we're talking about the CM service ports, taken from the >>> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec); >>> these are the CM level port-space and not the RoCE UDP L4 ports. >>> we want to use both as these will allow different client instance and >>> server instance on same nodes will use differen CM ports and hopefully >>> generate different hash results for multi-flows between these two >>> servers. >> >> Aha, ok I guess I missed that, and ok. >> >>>> In turn, the subsequent >>>> >>>> hash ^= (hash >> 16); >>>> hash ^= (hash >> 8); >>>> >>>> are re-mashing the bits with one another, again, adding no entropy. >> >> I still wonder about this one. It's attempting to reduce the 32-bit >> product to 20 bits, but a second xor with the "middle" 16 bits seems >> really strange. Mathematically, wouldn't it be better to just take >> the modulus of 2^20? 
If not, are you expecting some behavior in the >> hash values that makes the double-xor approach better (in which case >> it should be called out)? >> >> Tom. > > The function takes into account creating a symmetric hash, so both > active and passive can reconstruct the same flow label results. That's > why we multiply the two CM Port values (16 bit * 16 bit). The results > is a 32 bit value, and we don't want to lose any of of the MSB bit's > by modulus or masking. So we need some folding function from 32 bit to > the 20 bit flow label. > > The specific bit shift is something I took from the bond driver: > https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407 > This proved very good in spreading the flow label in our internal > testing. Other alternative can be suggested, as long as it considers > all bits in the conversion 32->20 bits. I'm ok with it, but I still don't fully understand why the folding is necessary. The multiplication is the important part, and it is the operation that combines the two entropic inputs. The folding just flips bits from what's basically the same entropy source. IOW, I think that u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK; would produce a completely equal benefit, mathematically. Tom. > Alex > >> >>>> Can you describe how, mathematically, this is not different from simply >>>> using the SrcPort field, and if so, how it contributes to the entropy >>>> differentiation of the incoming streams? >>>> >>>> Tom. >>>> >>>>> Result of the above hash will be kept in the CM's route path record connection >>>>> context and will be used all across its vitality for all preceding CM messages >>>>> on both ends of the connection (including REP, REJ, DREQ, DREP, ..). 
>>>>> Once connection is established, the corresponding Connected RC QPs, on both >>>>> ends of the connection, will update their context with the calculated RDMA IP >>>>> CM Service based flow_label and UDP src_port values at the Connect phase of >>>>> the active side and Accept phase of the passive side of the connection. >>>>> >>>>> CM will provide to the calculated value of the flow_label hash (20 bit) result >>>>> in the 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct >>>>> ibv_ah_attr'. >>>>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when >>>>> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp, >>>>> ah_attr, attr_mask |= IBV_QP_AV)' or when create a AH for working with >>>>> datagram QP's (e.g.: UD) by calling ibv_create_ah(ah_attr). >>>>> >>>>> Hash Calculation for non-RDMA CM Service ID >>>>> =========================================== >>>>> For non CM QP's, the application can define the flow_label value in the >>>>> 'struct ibv_ah_attr' when modifying the connected QP's (e.g.: RC) or creating >>>>> a AH for the datagram QP's (e.g.: UD). >>>>> >>>>> If the provided flow_label value is zero, not set by the application (e.g.: >>>>> legacy cases), then verbs providers should use the src.QP[24bit] and >>>>> dst.QP[24bit] as input arguments for flow_label calculation. >>>>> As QPN's are an array of 3 bytes, the multiplication will result in 6 bytes >>>>> value. We'll define a flow_label value as: >>>>> DstQPn [3 Bytes] >>>>> SrcQPn [3 Bytes] >>>>> u64 hash = DstQPn * SrcQPn; >>>>> hash ^= (hash >> 20); >>>>> hash ^= (hash >> 40); >>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; >>>>> >>>>> Hash Calculation for UDP src_port >>>>> ================================= >>>>> Providers supporting RoCEv2 will use the 'flow_label' value as input to >>>>> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the >>>>> AH context. 
>>>>> >>>>> UDP src_port calculations from flow label: >>>>> [while considering the 14 bits UDP port range according to IANA recommendation] >>>>> AH_ATTR.GRH.flow_label [20 bits] >>>>> u32 fl_low = fl & 0x03FFF; >>>>> u32 fl_high = fl & 0xFC000; >>>>> u16 udp_sport = fl_low XOR (fl_high >> 14); >>>>> RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT >>>>> >>>>> #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000 >>>>> >>>>> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1] >>>>> >>>>> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html >>>>> >>>>> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com> >>>>> >>>>> >>> >>> > > ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port 2020-02-12 15:47 ` Tom Talpey @ 2020-02-13 11:03 ` Alex Rosenbaum 2020-02-13 15:26 ` Tom Talpey 0 siblings, 1 reply; 24+ messages in thread From: Alex Rosenbaum @ 2020-02-13 11:03 UTC (permalink / raw) To: Tom Talpey Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang On Wed, Feb 12, 2020 at 5:47 PM Tom Talpey <tom@talpey.com> wrote: > > On 2/8/2020 4:58 AM, Alex Rosenbaum wrote: > > On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote: > >> > >> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote: > >>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote: > >>>> > >>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote: > >>>>> A combination of the flow_label field in the IPv6 header and UDP source port > >>>>> field in RoCE v2.0 are used to identify a group of packets that must be > >>>>> delivered in order by the network, end-to-end. > >>>>> These fields are used to create entropy for network routers (ECMP), load > >>>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB > >>>>> headers. > >>>>> > >>>>> The flow_label field is defined by a 20 bit hash value. CM based connections > >>>>> will use a hash function definition based on the service type (QP Type) and > >>>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be > >>>>> according to the source and destination QPN values. > >>>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result. > >>>>> > >>>>> UDP source port selection must adhere IANA port allocation ranges. Thus we will > >>>>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in > >>>>> hex: 0xC000-0xFFFF. > >>>>> > >>>>> The below calculations take into account the importance of producing a symmetric > >>>>> hash result so we can support symmetric hash calculation of network elements. 
> >>>>> > >>>>> Hash Calculation for RDMA IP CM Service > >>>>> ======================================= > >>>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the > >>>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM > >>>>> REQ private data info and Service ID. > >>>>> > >>>>> Flow label hash function calculations definition will be defined as: > >>>>> Extract the following fields from the CM IP REQ: > >>>>> CM_REQ.ServiceID.DstPort [2 Bytes] > >>>>> CM_REQ.PrivateData.SrcPort [2 Bytes] > >>>>> u32 hash = DstPort * SrcPort; > >>>>> hash ^= (hash >> 16); > >>>>> hash ^= (hash >> 8); > >>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; > >>>>> > >>>>> #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF > >>>> > >>>> Sorry it took me a while to respond to this, and thanks for looking > >>>> into it since my comments on the previous proposal. I have a concern > >>>> with an aspect of this one. > >>>> > >>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the > >>>> term > >>>> > >>>> u32 hash = DstPort * SrcPort; > >>>> > >>>> adds no entropy beyond the value of SrcPort. > >>>> > >>> > >>> we're talking about the CM service ports, taken from the > >>> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec); > >>> these are the CM level port-space and not the RoCE UDP L4 ports. > >>> we want to use both as these will allow different client instance and > >>> server instance on same nodes will use differen CM ports and hopefully > >>> generate different hash results for multi-flows between these two > >>> servers. > >> > >> Aha, ok I guess I missed that, and ok. > >> > >>>> In turn, the subsequent > >>>> > >>>> hash ^= (hash >> 16); > >>>> hash ^= (hash >> 8); > >>>> > >>>> are re-mashing the bits with one another, again, adding no entropy. > >> > >> I still wonder about this one. 
It's attempting to reduce the 32-bit
> >> product to 20 bits, but a second xor with the "middle" 16 bits seems
> >> really strange. Mathematically, wouldn't it be better to just take
> >> the modulus of 2^20? If not, are you expecting some behavior in the
> >> hash values that makes the double-xor approach better (in which case
> >> it should be called out)?
> >>
> >> Tom.
> >
> > The function takes into account creating a symmetric hash, so both
> > active and passive can reconstruct the same flow label results. That's
> > why we multiply the two CM Port values (16 bit * 16 bit). The results
> > is a 32 bit value, and we don't want to lose any of of the MSB bit's
> > by modulus or masking. So we need some folding function from 32 bit to
> > the 20 bit flow label.
> >
> > The specific bit shift is something I took from the bond driver:
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
> > This proved very good in spreading the flow label in our internal
> > testing. Other alternative can be suggested, as long as it considers
> > all bits in the conversion 32->20 bits.
>
> I'm ok with it, but I still don't fully understand why the folding
> is necessary. The multiplication is the important part, and it is
> the operation that combines the two entropic inputs. The folding just
> flips bits from what's basically the same entropy source.
>
> IOW, I think that
>
> u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK;
>
> would produce a completely equal benefit, mathematically.
> Tom.
>

If both src & dst ports are in the high value range, you lose those hash bits in the masking.
If the src & dst ports are both 0xE000, your masked hash equals 0. You'll get the same hash if both ports equal 0xF000.

The idea with the bit shift is to take the MSB hash bits (to the left of the 0xFFFFF mask) and fold them into the LSBs in some way.

Alex
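Alex's 0xE000 / 0xF000 example above can be checked with a short C sketch that puts the two candidates side by side. The function names flow_label_masked() and flow_label_folded() are hypothetical labels for "Tom's mask-only suggestion" and "the RFC's fold-then-mask" respectively.

```c
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Mask-only variant: keep just the low 20 bits of the product. */
static uint32_t flow_label_masked(uint16_t src_port, uint16_t dst_port)
{
    return ((uint32_t)dst_port * src_port) & IB_GRH_FLOWLABEL_MASK;
}

/* RFC variant: fold the upper bits down before masking. */
static uint32_t flow_label_folded(uint16_t src_port, uint16_t dst_port)
{
    uint32_t hash = (uint32_t)dst_port * src_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}
```

0xE000 * 0xE000 is 0xC4000000 and 0xF000 * 0xF000 is 0xE1000000; both have all-zero low 20 bits, so flow_label_masked() returns 0 for either pair. flow_label_folded() returns 0x4C4C4 and 0x1E1E1 for the same inputs. This shows only that the fold separates this particular pair; it says nothing about the overall distribution, which is the question raised in the replies.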
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port 2020-02-13 11:03 ` Alex Rosenbaum @ 2020-02-13 15:26 ` Tom Talpey 2020-02-13 15:41 ` Jason Gunthorpe 0 siblings, 1 reply; 24+ messages in thread From: Tom Talpey @ 2020-02-13 15:26 UTC (permalink / raw) To: Alex Rosenbaum Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang On 2/13/2020 6:03 AM, Alex Rosenbaum wrote: > On Wed, Feb 12, 2020 at 5:47 PM Tom Talpey <tom@talpey.com> wrote: >> >> On 2/8/2020 4:58 AM, Alex Rosenbaum wrote: >>> On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote: >>>> >>>> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote: >>>>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote: >>>>>> >>>>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote: >>>>>>> A combination of the flow_label field in the IPv6 header and UDP source port >>>>>>> field in RoCE v2.0 are used to identify a group of packets that must be >>>>>>> delivered in order by the network, end-to-end. >>>>>>> These fields are used to create entropy for network routers (ECMP), load >>>>>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB >>>>>>> headers. >>>>>>> >>>>>>> The flow_label field is defined by a 20 bit hash value. CM based connections >>>>>>> will use a hash function definition based on the service type (QP Type) and >>>>>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be >>>>>>> according to the source and destination QPN values. >>>>>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result. >>>>>>> >>>>>>> UDP source port selection must adhere IANA port allocation ranges. Thus we will >>>>>>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in >>>>>>> hex: 0xC000-0xFFFF. 
>>>>>>> >>>>>>> The below calculations take into account the importance of producing a symmetric >>>>>>> hash result so we can support symmetric hash calculation of network elements. >>>>>>> >>>>>>> Hash Calculation for RDMA IP CM Service >>>>>>> ======================================= >>>>>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the >>>>>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM >>>>>>> REQ private data info and Service ID. >>>>>>> >>>>>>> Flow label hash function calculations definition will be defined as: >>>>>>> Extract the following fields from the CM IP REQ: >>>>>>> CM_REQ.ServiceID.DstPort [2 Bytes] >>>>>>> CM_REQ.PrivateData.SrcPort [2 Bytes] >>>>>>> u32 hash = DstPort * SrcPort; >>>>>>> hash ^= (hash >> 16); >>>>>>> hash ^= (hash >> 8); >>>>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; >>>>>>> >>>>>>> #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF >>>>>> >>>>>> Sorry it took me a while to respond to this, and thanks for looking >>>>>> into it since my comments on the previous proposal. I have a concern >>>>>> with an aspect of this one. >>>>>> >>>>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the >>>>>> term >>>>>> >>>>>> u32 hash = DstPort * SrcPort; >>>>>> >>>>>> adds no entropy beyond the value of SrcPort. >>>>>> >>>>> >>>>> we're talking about the CM service ports, taken from the >>>>> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec); >>>>> these are the CM level port-space and not the RoCE UDP L4 ports. >>>>> we want to use both as these will allow different client instance and >>>>> server instance on same nodes will use differen CM ports and hopefully >>>>> generate different hash results for multi-flows between these two >>>>> servers. >>>> >>>> Aha, ok I guess I missed that, and ok. 
>>>> >>>>>> In turn, the subsequent >>>>>> >>>>>> hash ^= (hash >> 16); >>>>>> hash ^= (hash >> 8); >>>>>> >>>>>> are re-mashing the bits with one another, again, adding no entropy. >>>> >>>> I still wonder about this one. It's attempting to reduce the 32-bit >>>> product to 20 bits, but a second xor with the "middle" 16 bits seems >>>> really strange. Mathematically, wouldn't it be better to just take >>>> the modulus of 2^20? If not, are you expecting some behavior in the >>>> hash values that makes the double-xor approach better (in which case >>>> it should be called out)? >>>> >>>> Tom. >>> >>> The function takes into account creating a symmetric hash, so both >>> active and passive can reconstruct the same flow label results. That's >>> why we multiply the two CM Port values (16 bit * 16 bit). The results >>> is a 32 bit value, and we don't want to lose any of of the MSB bit's >>> by modulus or masking. So we need some folding function from 32 bit to >>> the 20 bit flow label. >>> >>> The specific bit shift is something I took from the bond driver: >>> https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407 >>> This proved very good in spreading the flow label in our internal >>> testing. Other alternative can be suggested, as long as it considers >>> all bits in the conversion 32->20 bits. >> >> I'm ok with it, but I still don't fully understand why the folding >> is necessary. The multiplication is the important part, and it is >> the operation that combines the two entropic inputs. The folding just >> flips bits from what's basically the same entropy source. >> >> IOW, I think that >> >> u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK; >> >> would produce a completely equal benefit, mathematically. >> Tom. >> > > If both src & dst ports are in the high value range you loss those > hash bits in the masking. > If src & dst port are both 0xE000, your masked hash equals 0. 
You'll
> get the same hash if both ports are equal 0xF000.

Sure, but this is because it's a 20-bit hash of a 32-bit object. There will always be collisions, this is just one example. My concern is the statistical spread of the results. I argue it's not changed by the proposed bit-folding, possibly even damaged.

> The idea with the bit shift is to take the MSB hash bits (left from
> the 0XFFFFF mask) and fold them with the LSB in some way.

I get that, but it's only folding the "one" bits, and it's doing so in a rather primitive way. For example, the ">> 8" term is folding the high 4 of 20 bits twice - once in the >> 16 and again in the >> 8.

This value is only computed once, at QP creation, correct? Why not compute a CRC-20, for example?

Tom.
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-13 15:26 ` Tom Talpey
@ 2020-02-13 15:41 ` Jason Gunthorpe
  2020-02-14 14:23 ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 15:41 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:

> > If both src & dst ports are in the high value range you loss those
> > hash bits in the masking.
> > If src & dst port are both 0xE000, your masked hash equals 0. You'll
> > get the same hash if both ports are equal 0xF000.
>
> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
> will always be collisions, this is just one example. My concern is the
> statistical spread of the results. I argue it's not changed by the
> proposed bit-folding, possibly even damaged.

I've always thought that 'folding' by modulo results in an abnormal statistical distribution.

The point here is not collisions but to have a hash distribution which is generally uniform for the input space.

Alex, it would be good to make a quick program to measure the uniformity of the distribution.

Jason
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-13 15:41 ` Jason Gunthorpe
@ 2020-02-14 14:23 ` Mark Zhang
  2020-02-15 6:27 ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-14 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>
>>> If both src & dst ports are in the high value range you loss those
>>> hash bits in the masking.
>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>> get the same hash if both ports are equal 0xF000.
>>
>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>> will always be collisions, this is just one example. My concern is the
>> statistical spread of the results. I argue it's not changed by the
>> proposed bit-folding, possibly even damaged.
>
> I've always thought that 'folding' by modulo results in an abnormal
> statistical distribution
>
> The point here is not collisions but to have a hash distribution which
> is generally uniform for the input space.
>
> Alex, it would be good to make a quick program to measure the
> uniformity of the distribution..
>

Hi,

I did some tests with a quick program (hope it's not buggy...). It seems the hash without "folding" has a better distribution than the hash with folding. The "hash quality" is reflected by the "total_access" [1] values below.

I tested only with cma_dport values from 18515 (the ib_write_bw default) to 18524. I can do more tests if required, for example using multiple cma_dport values in one statistic.
[1] https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays

$ ./a

max: Say for slot x there are tb[x] items, then 'max = max(tb[x])'; Lower is better;
min: Say for slot x there are tb[x] items, then 'min = min(tb[x])'; Likely min is always 0
total_access: The sum of all 'accesses' (for each slot: accesses=n*(n+1)/2); Lower is better
n[X]: How many slots that has X items

cm source port range [32768, 65534], dest port 18515:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766 n[1] = 32514 n[2] = 126
    udp_sport:  max 10 min 0 total_access 51740 n[1] = 4420 n[2] = 4670 n[3] = 3112 n[4] = 1433 n[5] = 535 n[6] = 163 n[7] = 31 n[8] = 5 n[9] = 2 n[10] = 1
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 4 min 0 total_access 48618 n[1] = 532 n[2] = 7926 n[3] = 530 n[4] = 3698

cm source port range [32768, 65534], dest port 18516:
  Hash with folding:
    flow_label: max 3 min 0 total_access 32774 n[1] = 31214 n[2] = 770 n[3] = 4
    udp_sport:  max 8 min 0 total_access 50808 n[1] = 4406 n[2] = 4873 n[3] = 3157 n[4] = 1413 n[5] = 509 n[6] = 129 n[7] = 20 n[8] = 4
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 2 min 1 total_access 32766 n[1] = 2 n[2] = 16382

cm source port range [32768, 65534], dest port 18517:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766 n[1] = 32250 n[2] = 258
    udp_sport:  max 10 min 0 total_access 54916 n[1] = 4536 n[2] = 4170 n[3] = 2817 n[4] = 1445 n[5] = 622 n[6] = 275 n[7] = 94 n[8] = 22 n[9] = 5 n[10] = 2
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 3 min 1 total_access 38402 n[1] = 2820 n[2] = 10746 n[3] = 2818

cm source port range [32768, 65534], dest port 18518:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766 n[1] = 32066 n[2] = 350
    udp_sport:  max 8 min 0 total_access 50018 n[1] = 4435 n[2] = 4970 n[3] = 3294 n[4] = 1376 n[5] = 465 n[6] = 92 n[7] = 16 n[8] = 2
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 2 min 1 total_access 32766 n[1] = 2 n[2] = 16382

cm source port range [32768, 65534], dest port 18519:
  Hash with folding:
    flow_label: max 3 min 0 total_access 32774 n[1] = 31816 n[2] = 469 n[3] = 4
    udp_sport:  max 8 min 0 total_access 51462 n[1] = 4414 n[2] = 4734 n[3] = 3088 n[4] = 1466 n[5] = 508 n[6] = 160 n[7] = 32 n[8] = 4
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 4 min 0 total_access 45490 n[1] = 3662 n[2] = 6360 n[3] = 3660 n[4] = 1351

cm source port range [32768, 65534], dest port 18520:
  Hash with folding:
    flow_label: max 6 min 0 total_access 34618 n[1] = 20349 n[2] = 5027 n[3] = 550 n[4] = 164 n[5] = 9 n[6] = 2
    udp_sport:  max 13 min 0 total_access 82542 n[1] = 549 n[2] = 1167 n[3] = 1635 n[4] = 1706 n[5] = 1341 n[6] = 836 n[7] = 483 n[8] = 223 n[9] = 87 n[10] = 27
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 4 min 0 total_access 65530 n[3] = 2 n[4] = 8190

cm source port range [32768, 65534], dest port 18521:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766 n[1] = 31924 n[2] = 421
    udp_sport:  max 9 min 0 total_access 51864 n[1] = 4505 n[2] = 4645 n[3] = 3038 n[4] = 1464 n[5] = 542 n[6] = 154 n[7] = 43 n[8] = 6 n[9] = 2
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 3 min 1 total_access 32810 n[1] = 24 n[2] = 16338 n[3] = 22

cm source port range [32768, 65534], dest port 18522:
  Hash with folding:
    flow_label: max 3 min 0 total_access 32768 n[1] = 32197 n[2] = 283 n[3] = 1
    udp_sport:  max 9 min 0 total_access 50850 n[1] = 4561 n[2] = 4756 n[3] = 3187 n[4] = 1452 n[5] = 453 n[6] = 137 n[7] = 29 n[8] = 2 n[9] = 2
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 2 min 1 total_access 32766 n[1] = 2 n[2] = 16382

cm source port range [32768, 65534], dest port 18523:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766 n[1] = 32514 n[2] = 126
    udp_sport:  max 8 min 0 total_access 52208 n[1] = 4426 n[2] = 4609 n[3] = 3069 n[4] = 1435 n[5] = 533 n[6] = 180 n[7] = 50 n[8] = 10
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 4 min 0 total_access 46062 n[1] = 3096 n[2] = 6640 n[3] = 3094 n[4] = 1777

cm source port range [32768, 65534], dest port 18524:
  Hash with folding:
    flow_label: max 3 min 0 total_access 32774 n[1] = 31362 n[2] = 696 n[3] = 4
    udp_sport:  max 8 min 0 total_access 49490 n[1] = 4440 n[2] = 5148 n[3] = 3240 n[4] = 1413 n[5] = 394 n[6] = 97 n[7] = 14 n[8] = 1
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766 n[1] = 32766
    udp_sport:  max 2 min 1 total_access 32766 n[1] = 2 n[2] = 16382

> Jason
>

^ permalink raw reply [flat|nested] 24+ messages in thread
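The statistics reported above (max, min, total_access) can be reproduced in outline with a few lines of C. This is only a sketch, not the program that produced the numbers, which was not posted. It assumes the flow label is the low 20 bits of DstPort * SrcPort, optionally XOR-folded as in the RFC text, and it follows the accesses = n*(n+1)/2 formula exactly as written in the legend. The names flow_label(), struct bucket_stats, slot_stats() and measure() are made up for illustration. Note that the posted totals look closer to (n*n + 1)/2 per slot (for dest port 18516 with folding, 31214*1 + 770*2 + 4*5 = 32774), so absolute total_access values from this sketch may differ from the ones in the mail.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LABEL_BITS 20
#define FLOW_LABEL_MASK ((1u << FLOW_LABEL_BITS) - 1)	/* 0x000FFFFF */

/* 20-bit flow label from two 16-bit ports; fold != 0 applies the XOR folding. */
static uint32_t flow_label(uint16_t dport, uint16_t sport, int fold)
{
	uint32_t hash = (uint32_t)dport * sport;

	if (fold) {
		hash ^= hash >> 16;
		hash ^= hash >> 8;
	}
	return hash & FLOW_LABEL_MASK;
}

struct bucket_stats {
	uint32_t max;		/* most items found in any slot */
	uint32_t min;		/* fewest items found in any slot */
	uint64_t total_access;	/* sum over all slots of n*(n+1)/2 */
};

/* Summarise a table of per-slot item counts. */
static struct bucket_stats slot_stats(const uint32_t *slots, size_t nslots)
{
	struct bucket_stats st = { 0, UINT32_MAX, 0 };

	assert(nslots > 0);
	for (size_t i = 0; i < nslots; i++) {
		uint64_t n = slots[i];

		if (slots[i] > st.max)
			st.max = slots[i];
		if (slots[i] < st.min)
			st.min = slots[i];
		st.total_access += n * (n + 1) / 2;
	}
	return st;
}

/* Hash every source port in [sport_lo, sport_hi] against one destination port. */
static struct bucket_stats measure(uint16_t dport, uint32_t sport_lo,
				   uint32_t sport_hi, int fold)
{
	uint32_t *slots = calloc(FLOW_LABEL_MASK + 1u, sizeof(*slots));
	struct bucket_stats st;

	assert(slots != NULL && sport_hi <= 0xFFFF);
	for (uint32_t sport = sport_lo; sport <= sport_hi; sport++)
		slots[flow_label(dport, (uint16_t)sport, fold)]++;
	st = slot_stats(slots, FLOW_LABEL_MASK + 1u);
	free(slots);
	return st;
}
```

Calling measure(18515, 32768, 65534, 0) and measure(18515, 32768, 65534, 1) gives the unfolded and folded flow_label figures for one destination port. The udp_sport rows are not covered, since the src_port derivation is not spelled out in this part of the thread.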
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-14 14:23 ` Mark Zhang
@ 2020-02-15 6:27 ` Mark Zhang
  2020-02-18 14:16 ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-15 6:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/14/2020 10:23 PM, Mark Zhang wrote:
> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>
>>>> If both src & dst ports are in the high value range you loss those
>>>> hash bits in the masking.
>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>> get the same hash if both ports are equal 0xF000.
>>>
>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>>> will always be collisions, this is just one example. My concern is the
>>> statistical spread of the results. I argue it's not changed by the
>>> proposed bit-folding, possibly even damaged.
>>
>> I've always thought that 'folding' by modulo results in an abnormal
>> statistical distribution
>>
>> The point here is not collisions but to have a hash distribution which
>> is generally uniform for the input space.
>>
>> Alex, it would be good to make a quick program to measure the
>> uniformity of the distribution..
>>
>
> Hi,
>
> I did some tests with a quick program (hope it's not buggy...), seems
> the hash without "folding" has a better distribution than hash with
> fold. The "hash quality" is reflected by the "total_access"[1] below.
>
> I tested only with cma_dport from 18515 (ib_write_bw default) to 18524.
> I can do more tests if required, for example use multiple cma_dport in
> one statistic.
>
> [...]
>

Another finding: when cma_dport is a multiple of 0x200 (e.g., 0x600,
0x800, ... 0xFE00), the hash distribution is tens of times worse than
for other values. For example, with dport 18431 and 18432:

cm source port range [32768, 65534], dest port 18431:
  Hash with folding:
    flow_label: max 2 min 0 total_access 32766
    udp_sport:  max 8 min 0 total_access 50410
  Hash without folding:
    flow_label: max 1 min 0 total_access 32766
    udp_sport:  max 4 min 0 total_access 48126

cm source port range [32768, 65534], dest port 18432(0x4800):
  Hash with folding:
    flow_label: max 133 min 0 total_access 1072938
    udp_sport:  max 203 min 0 total_access 2126644
  Hash without folding:
    flow_label: max 64 min 0 total_access 1048450
    udp_sport:  max 1024 min 0 total_access 16775170

>
>> Jason
>>
>

^ permalink raw reply [flat|nested] 24+ messages in thread
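One way to see why destination ports that are multiples of 0x200 behave so badly: such a port has at least nine trailing zero bits, and so does every product DstPort * SrcPort. Without folding, the 20-bit mask then leaves at most 11 bits that can vary. Port 0x4800 has 11 trailing zero bits, so only 2^9 = 512 labels are reachable, and roughly 32 thousand source ports spread over 512 slots is about 64 per slot, in line with the "max 64" above. The sketch below counts the reachable labels; it assumes the plain masked product (no folding), and reachable_labels() is a hypothetical helper, not something from the test program.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LABEL_MASK 0x000FFFFFu

/*
 * Number of distinct unfolded flow labels that one destination port
 * produces over the source ports [sport_lo, sport_hi].
 */
static uint32_t reachable_labels(uint16_t dport, uint32_t sport_lo,
				 uint32_t sport_hi)
{
	unsigned char *seen = calloc(FLOW_LABEL_MASK + 1u, 1);
	uint32_t distinct = 0;

	assert(seen != NULL && sport_hi <= 0xFFFF);
	for (uint32_t sport = sport_lo; sport <= sport_hi; sport++) {
		uint32_t label = ((uint32_t)dport * sport) & FLOW_LABEL_MASK;

		if (!seen[label]) {
			seen[label] = 1;
			distinct++;
		}
	}
	free(seen);
	return distinct;
}
```

With the source range used in the test, reachable_labels(18432, 32768, 65534) yields 512 labels, whereas an odd port such as 18431 yields one distinct label per source port, because multiplying by an odd number is a bijection modulo 2^20.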
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-15 6:27 ` Mark Zhang
@ 2020-02-18 14:16 ` Tom Talpey
  2020-02-18 17:41 ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-18 14:16 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/15/2020 1:27 AM, Mark Zhang wrote:
> [...]
>
> Another finding is, when cma_dport is multiple of 0x200 (i.e., 0x600,
> 0x800, ... 0xFE00), the hash distribution is tens of times worse then
> others. For examples when dport is 18431 and 18432:
>
> cm source port range [32768, 65534], dest port 18431:
>   Hash with folding:
>     flow_label: max 2 min 0 total_access 32766
>     udp_sport:  max 8 min 0 total_access 50410
>   Hash without folding:
>     flow_label: max 1 min 0 total_access 32766
>     udp_sport:  max 4 min 0 total_access 48126
>
> cm source port range [32768, 65534], dest port 18432(0x4800):
>   Hash with folding:
>     flow_label: max 133 min 0 total_access 1072938
>     udp_sport:  max 203 min 0 total_access 2126644
>   Hash without folding:
>     flow_label: max 64 min 0 total_access 1048450
>     udp_sport:  max 1024 min 0 total_access 16775170

Good data! It certainly indicates an issue with the simple
binary modulus for truncating 32->20 bits. But the extremely
narrow testing range limits the conclusions considerably:

>> I tested only with cma_dport from 18515 (ib_write_bw default) to
>> 18524. I can do more tests if required, for example use multiple
>> cma_dport in one statistic.

This hash is intended to provide entropy across the entire port
range and we should evaluate it as such. At a minimum, the source
port can vary much more widely, from Alex's original message it's
0xC000 - 0xFFFF.

> UDP source port selection must adhere IANA port allocation ranges. Thus we will
> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> hex: 0xC000-0xFFFF.

I'm not certain what the range of the destination port might be, but
as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.

Any chance you could scale up your test, to measure the original
proposed hash across these broader ranges?

>   u32 hash = DstPort * SrcPort;
>   hash ^= (hash >> 16);
>   hash ^= (hash >> 8);
>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;

Tom.

^ permalink raw reply [flat|nested] 24+ messages in thread
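As a rough yardstick for such a wider test: with source ports 0xC000-0xFFFF and destination ports 0x1-0xBFFF (both taken as inclusive, which is an assumption) there are 16384 * 49151 = 805,289,984 port pairs sharing 2^20 flow labels, so a perfectly uniform hash would put about 768 pairs on each label. The sketch below does that arithmetic and reduces a table of per-label hit counts to min, max and mean. pair_count(), struct spread and summarize() are illustrative names only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of (a, b) pairs with a in [a_lo, a_hi] and b in [b_lo, b_hi]. */
static uint64_t pair_count(uint32_t a_lo, uint32_t a_hi,
			   uint32_t b_lo, uint32_t b_hi)
{
	assert(a_lo <= a_hi && b_lo <= b_hi);
	return (uint64_t)(a_hi - a_lo + 1) * (b_hi - b_lo + 1);
}

struct spread {
	uint32_t min;	/* emptiest label */
	uint32_t max;	/* fullest label */
	double mean;	/* total hits divided by number of labels */
};

/* Reduce a table of per-label hit counts to its min, max and mean. */
static struct spread summarize(const uint32_t *hits, size_t nlabels)
{
	struct spread s = { UINT32_MAX, 0, 0.0 };
	uint64_t total = 0;

	assert(nlabels > 0);
	for (size_t i = 0; i < nlabels; i++) {
		if (hits[i] < s.min)
			s.min = hits[i];
		if (hits[i] > s.max)
			s.max = hits[i];
		total += hits[i];
	}
	s.mean = (double)total / (double)nlabels;
	return s;
}
```

Comparing the min and max from summarize() against pair_count(...) / 2^20 gives a quick feel for how far a candidate hash is from a flat distribution.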
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-18 14:16 ` Tom Talpey
@ 2020-02-18 17:41 ` Tom Talpey
  2020-02-19 1:51 ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-18 17:41 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/18/2020 9:16 AM, Tom Talpey wrote:
> [...]
>
> This hash is intended to provide entropy across the entire port
> range and we should evaluate it as such. At a minimum, the source
> port can vary much more widely, from Alex's original message it's
> 0xC000 - 0xFFFF.
>
> [...]
>
> I'm not certain what the range of the destination port might be, but
> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
>
> Any chance you could scale up your test, to measure the original
> proposed hash across these broader ranges?
>
>>   u32 hash = DstPort * SrcPort;
>>   hash ^= (hash >> 16);
>>   hash ^= (hash >> 8);
>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;

I did an even quicker-and-dirtier test, with the attached. Both
the folding and non-folding methods display, to me, pretty much
the same behavior. And there's a fairly significant periodicity
with a doubling of the hash collision rate, every 8 or so buckets.
The "folding" version has higher spikes at these points than the
non-folding, in fact. As you mentioned, there are a few more "zero"
hashes, but that's expected, and not that different for both.

Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
are 800M possible permutations, and of course 1M hash buckets. So,
an 800:1 collision rate is expected. But the numbers range from
the mid-300's to several-1000's. That variance seems high to me.

I really think there needs to be a flatter spectrum, here. These
collisions can cause significant congestion effects at scale.
I suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's
going to take me a little time to find that.

> Folding hash
> bucket  hits
> 0       3840
> 1       407
> 2       798
> 3       426
> 4       1137
> 5       409
> 6       711
> 7       372
> 8       1595
> 9       349
> 10      751
> 11      385
> 12      1164
> 13      375
> 14      747
> 15      406
> 16      1952
> 17      382
> 18      766
> 19      390
> 20      1139
> 21      372
> 22      792
> 23      419
> 24      1543
> 25      393
> 26      777
> 27      403
> 28      1123
> 29      356
> 30      773
> 31      363
> 32      2340
> 33      397
> 34      785
> 35      393
> 36      1154
> 37      415
> 38      744

Versus...

> Non-folding hash
> bucket  hits
> 0       4469
> 1       480
> 2       684
> 3       567
> 4       990
> 5       465
> 6       697
> 7       650
> 8       1279
> 9       453
> 10      671
> 11      556
> 12      989
> 13      499
> 14      653
> 15      812
> 16      1603
> 17      478
> 18      694
> 19      559
> 20      1015
> 21      506
> 22      675
> 23      659
> 24      1317
> 25      476
> 26      644
> 27      555
> 28      953
> 29      475
> 30      738
> 31      927
> 32      2047
> 33      456
> 34      726
> 35      537
> 36      952
> 37      472
> 38      665

Tom.

[-- Attachment #2: hashtest.c --]

#include <stdint.h>
#include <stdio.h>

/* One hit counter per possible 20-bit flow label. */
static int data[1024 * 1024];

int main(int argc, char **argv)
{
	unsigned short src, dst;
	uint32_t hash;
	int i;

	(void)argv;

	/*
	 * Any command-line argument selects the folded hash; with no
	 * argument the plain masked product is counted.
	 */
	printf("%s hash\nbucket\thits\n", argc > 1 ? "Folding" : "Non-folding");

	for (src = 1; src < 0xBFFF; src++)
		for (dst = 0xC000; dst <= 0xFFFE; dst++) {
			/* Multiply as unsigned 32-bit: no signed overflow. */
			hash = (uint32_t)src * dst;
			if (argc > 1) {
				hash ^= hash >> 16;
				hash ^= hash >> 8;
			}
			hash &= 0xFFFFF;
			data[hash]++;
		}

	for (i = 0; i < 1024 * 1024; i++)
		printf("%d\t%d\n", i, data[i]);
	return 0;
}

^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-18 17:41 ` Tom Talpey
@ 2020-02-19 1:51 ` Mark Zhang
  2020-02-19 2:01 ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-19 1:51 UTC (permalink / raw)
  To: Tom Talpey, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 1:41 AM, Tom Talpey wrote:
> On 2/18/2020 9:16 AM, Tom Talpey wrote:
>> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>>> [...]
>>>
>>> Another finding is, when cma_dport is multiple of 0x200 (i.e., 0x600,
>>> 0x800, ... 0xFE00), the hash distribution is tens of times worse then
>>> others.
For examples when dport is 18431 and 18432: >>> >>> cm source port range [32768, 65534], dest port 18431: >>> Hash with folding: >>> flow_label: max 2 min 0 total_access 32766 >>> udp_sport: max 8 min 0 total_access 50410 >>> Hash without folding: >>> flow_label: max 1 min 0 total_access 32766 >>> udp_sport: max 4 min 0 total_access 48126 >>> >>> cm source port range [32768, 65534], dest port 18432(0x4800): >>> Hash with folding: >>> flow_label: max 133 min 0 total_access 1072938 >>> >>> udp_sport: max 203 min 0 total_access 2126644 >>> >>> Hash without folding: >>> flow_label: max 64 min 0 total_access 1048450 >>> >>> udp_sport: max 1024 min 0 total_access 16775170 >> >> Good data! It certainly indicates an issue with the simple >> binary modulus for treuncating 32->20 bits. But the extremely >> narrow testing range limits the conclusions considerably: >> >> >> I tested only with cma_dport from 18515 (ib_write_bw default) to >> >> 18524. I can do more tests if required, for example use multiple >> >> cma_dport in one statistic. >> >> This hash is intended to provide entropy across the entire port >> range and we should evaluate it as such. At a minimum, the source >> port can vary much more widely, from Alex's original message it's >> 0xC000 - 0xFFFF. >> >>> UDP source port selection must adhere IANA port allocation ranges. >>> Thus we will >>> be using IANA recommendation for Ephemeral port range of: >>> 49152-65535, or in >>> hex: 0xC000-0xFFFF. >> >> I'm not certain what the range of the destination port might be, but >> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF. >> >> Any chance you could scale up your test, to measure the original >> proposed hash across these broader ranges? >> >>> u32 hash = DstPort * SrcPort; >>> hash ^= (hash >> 16); >>> hash ^= (hash >> 8); >>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK; > > I did an even quicker-and-dirtier test, with the attached. 
Both > the folding and non-folding methods display, to me, pretty much > the same behavior. And there's a fairly significant periodicity > with a doubling of the hash collision rate, every 8 or so buckets. > > The "folding" version has higher spikes at these points than the > non-folding, in fact. As you mentioned, there are a few more "zero" > hashes, but that's expected, and not that different for both. > > Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there > are 800M possible permutations, and of course 1M hash buckets. So, > an 800:1 collision rate is expected. But the numbers range from > the mid-300's to several-1000's. That variance seems high to me. > > I really think there needs to be a flatter spectrum, here. These > collisions can cause significant congestion effects at scale. I > suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going > to take me a little time to find that. > I did tests with range cma_sport [0xC000, 0xFFFF] and cma_dport [1025, 0xFFFF] (but each test with one dport), and found: 1. The folding and non-folding results are similar; 2. When dport is multiple of 0x200 the result is very bad. I also tested with your hashtest.c, there are much more "zero" hashes when sport or dport is multiple of 0x200. For the hash one of the original goal is symmetry, i.e.: f(sport, dport) = f(dport, sport) If that's not important I feel "sport * 31 + dport" [1] has a better result. [1] https://www.strchr.com/hash_functions > > Tom. ^ permalink raw reply [flat|nested] 24+ messages in thread
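The comparison described in this message can be reproduced with a small harness. The following is a minimal sketch in C, assuming three candidates: the folded hash from the RFC, the same product with masking only ("without folding"), and the `sport * 31 + dport` alternative masked to 20 bits. The function names, the slot table, and the `total_access` helper are illustrative; they are not taken from Mark's test program or from hashtest.c.

```c
#include <stdint.h>
#include <string.h>

#define FLOWLABEL_MASK 0x000FFFFFu          /* IB_GRH_FLOWLABEL_MASK in the RFC */
#define NUM_SLOTS      (FLOWLABEL_MASK + 1u) /* 2^20 possible flow labels */

/* Hash as proposed in the RFC: multiply, fold twice, mask to 20 bits. */
static uint32_t hash_fold(uint16_t sport, uint16_t dport)
{
    uint32_t hash = (uint32_t)dport * sport;
    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & FLOWLABEL_MASK;
}

/* "Without folding": the same product, only masked. */
static uint32_t hash_nofold(uint16_t sport, uint16_t dport)
{
    return ((uint32_t)dport * sport) & FLOWLABEL_MASK;
}

/* Alternative raised in the thread; note that it is not symmetric. */
static uint32_t hash_mul31(uint16_t sport, uint16_t dport)
{
    return ((uint32_t)sport * 31u + dport) & FLOWLABEL_MASK;
}

typedef uint32_t (*hash_fn)(uint16_t sport, uint16_t dport);

/* One counter per possible flow label (4 MiB, kept out of the stack). */
static uint32_t slots[NUM_SLOTS];

/*
 * total_access as defined in the quoted results: hash every source port in
 * [sport_lo, sport_hi] against one destination port, then sum n*(n+1)/2
 * over all slots. The minimum possible value equals the number of ports.
 */
static uint64_t total_access(hash_fn fn, uint16_t dport,
                             uint32_t sport_lo, uint32_t sport_hi)
{
    uint64_t total = 0;

    memset(slots, 0, sizeof(slots));
    for (uint32_t s = sport_lo; s <= sport_hi; s++)
        slots[fn((uint16_t)s, dport)]++;
    for (uint32_t i = 0; i < NUM_SLOTS; i++)
        total += (uint64_t)slots[i] * (slots[i] + 1) / 2;
    return total;
}
```

For example, `total_access(hash_mul31, 18515, 0xC000, 0xFFFF)` evaluates to 16384, one access per source port: 31 is odd, so for a fixed destination port no two of the 16384 source ports can collide modulo 2^20. `hash_mul31(1, 2)` and `hash_mul31(2, 1)` give 33 and 63, which is the loss of symmetry mentioned above.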
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  1:51 ` Mark Zhang
@ 2020-02-19  2:01 ` Tom Talpey
  2020-02-19  2:06 ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-19 2:01 UTC
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/18/2020 8:51 PM, Mark Zhang wrote:
> [...]
>>>
>>>> u32 hash = DstPort * SrcPort;
>>>> hash ^= (hash >> 16);
>>>> hash ^= (hash >> 8);
>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>
>> I did an even quicker-and-dirtier test, with the attached. Both
>> the folding and non-folding methods display, to me, pretty much
>> the same behavior. And there's a fairly significant periodicity
>> with a doubling of the hash collision rate, every 8 or so buckets.
>>
>> The "folding" version has higher spikes at these points than the
>> non-folding, in fact. As you mentioned, there are a few more "zero"
>> hashes, but that's expected, and not that different for both.
>>
>> Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
>> are 800M possible permutations, and of course 1M hash buckets. So,
>> an 800:1 collision rate is expected. But the numbers range from
>> the mid-300's to several-1000's. That variance seems high to me.
>>
>> I really think there needs to be a flatter spectrum, here. These
>> collisions can cause significant congestion effects at scale. I
>> suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
>> to take me a little time to find that.
>>
>
> I did tests with range cma_sport [0xC000, 0xFFFF] and cma_dport [1025,
> 0xFFFF] (but each test with one dport), and found:
>
> 1. The folding and non-folding results are similar;
> 2. When dport is a multiple of 0x200 the result is very bad. I also tested
>    with your hashtest.c; there are many more "zero" hashes when sport or
>    dport is a multiple of 0x200.
>
> For the hash, one of the original goals is symmetry, i.e.:
>    f(sport, dport) = f(dport, sport)

I'm very curious why this is a requirement. The hash is used to map
to a packet queue, which enforces ordering as well as providing a
congestion throttle point. These queues are one-way, and therefore
the same value has no effect when used symmetrically - it only works
one-way, the reverse flow is completely independent.

Am I missing something?

> If that's not important, I feel "sport * 31 + dport" [1] gives a better result.
>
> [1] https://www.strchr.com/hash_functions

Well, that'd be simple!

Tom.
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  2:01 ` Tom Talpey
@ 2020-02-19  2:06 ` Mark Zhang
  2020-02-19 13:06 ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-19 2:06 UTC
  To: Tom Talpey, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
      Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 10:01 AM, Tom Talpey wrote:
> [...]
>>>>
>>>> Any chance you could scale up your test, to measure the original
>>>> proposed hash across these broader ranges?
>>>>
>>>>> u32 hash = DstPort * SrcPort;
>>>>> hash ^= (hash >> 16);
>>>>> hash ^= (hash >> 8);
>>>>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>> I did an even quicker-and-dirtier test, with the attached. Both
>>> the folding and non-folding methods display, to me, pretty much
>>> the same behavior. And there's a fairly significant periodicity
>>> with a doubling of the hash collision rate, every 8 or so buckets.
>>>
>>> The "folding" version has higher spikes at these points than the
>>> non-folding, in fact. As you mentioned, there are a few more "zero"
>>> hashes, but that's expected, and not that different for both.
>>>
>>> Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
>>> are 800M possible permutations, and of course 1M hash buckets. So,
>>> an 800:1 collision rate is expected. But the numbers range from
>>> the mid-300's to several-1000's. That variance seems high to me.
>>>
>>> I really think there needs to be a flatter spectrum, here. These
>>> collisions can cause significant congestion effects at scale. I
>>> suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
>>> to take me a little time to find that.
>>>
>>
>> I did tests with range cma_sport [0xC000, 0xFFFF] and cma_dport [1025,
>> 0xFFFF] (but each test with one dport), and found:
>>
>> 1. The folding and non-folding results are similar;
>> 2. When dport is a multiple of 0x200 the result is very bad. I also tested
>>    with your hashtest.c; there are many more "zero" hashes when sport or
>>    dport is a multiple of 0x200.
>>
>> For the hash, one of the original goals is symmetry, i.e.:
>>    f(sport, dport) = f(dport, sport)
>
> I'm very curious why this is a requirement. The hash is used to map
> to a packet queue, which enforces ordering as well as providing a
> congestion throttle point. These queues are one-way, and therefore
> the same value has no effect when used symmetrically - it only works
> one-way, the reverse flow is completely independent.
>
> Am I missing something?
>

The symmetry is important when calculating flow_label with DstQPn/SrcQPn
for non-RDMA CM Service ID (check the first mail), so that the server
and client will have the same flow_label and udp_sport. But it looks like
it is not important in this case.

>> If that's not important, I feel "sport * 31 + dport" [1] gives a better
>> result.
>>
>> [1] https://www.strchr.com/hash_functions
>
> Well, that'd be simple!
>
> Tom.
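The symmetry argument for the QPN case can be illustrated with a short sketch. The first mail only states that the two 24-bit QPNs are multiplied into a 6-byte value and that the UDP source port is derived from the flow label within 0xC000-0xFFFF. The fold steps and the flow_label to udp_sport mapping below are assumptions chosen for illustration, and the names `qpn_flow_label` and `flow_label_to_udp_sport` are hypothetical.

```c
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu /* 20-bit flow label */
#define UDP_SPORT_MIN  0xC000u     /* start of the IANA ephemeral range */

/*
 * Symmetric 20-bit label from two 24-bit QPNs. The multiplication is
 * commutative, so swapping source and destination gives the same label.
 * The two XOR folds are an assumed way to mix the upper product bits in.
 */
static uint32_t qpn_flow_label(uint32_t src_qpn, uint32_t dst_qpn)
{
    uint64_t v = (uint64_t)(src_qpn & 0xFFFFFFu) * (dst_qpn & 0xFFFFFFu);

    v ^= v >> 20;
    v ^= v >> 40;
    return (uint32_t)(v & FLOWLABEL_MASK);
}

/*
 * Assumed mapping of a 20-bit label onto 0xC000-0xFFFF: fold the top
 * 6 bits into the low 14 bits, then set the two highest port bits.
 */
static uint16_t flow_label_to_udp_sport(uint32_t flow_label)
{
    uint32_t low  = flow_label & 0x03FFFu;
    uint32_t high = (flow_label & 0xFC000u) >> 14;

    return (uint16_t)((low ^ high) | UDP_SPORT_MIN);
}
```

With these definitions both ends of a connection compute the same pair of values regardless of which side is "source", which is the property Mark describes for the non-CM case.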
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  2:06 ` Mark Zhang
@ 2020-02-19 13:06 ` Jason Gunthorpe
  2020-02-19 17:41 ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-19 13:06 UTC
  To: Mark Zhang
  Cc: Tom Talpey, Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha,
      Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:

> The symmetry is important when calculating flow_label with DstQPn/SrcQPn
> for non-RDMA CM Service ID (check the first mail), so that the server
> and client will have the same flow_label and udp_sport. But it looks like
> it is not important in this case.

If the application needs a certain flow label it should not rely on
auto-generation, IMHO.

I expect most networks will not be reversible anyhow, even with the
same flow label?

Jason
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 13:06 ` Jason Gunthorpe
@ 2020-02-19 17:41 ` Tom Talpey
  2020-02-19 17:55 ` Jason Gunthorpe
  2020-02-20  1:04 ` Mark Zhang
  0 siblings, 2 replies; 24+ messages in thread
From: Tom Talpey @ 2020-02-19 17:41 UTC
To: Jason Gunthorpe, Mark Zhang
Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>
>> The symmetry is important when calculate flow_label with DstQPn/SrcQPn
>> for non-RDMA CM Service ID (check the first mail), so that the server
>> and client will have same flow_label and udp_sport. But looks like it is
>> not important in this case.
>
> If the application needs a certain flow label it should not rely on
> auto-generation, IMHO.
>
> I expect most networks will not be reversible anyhow, even with the
> same flow label?

These are network flow labels, not under application control. If they
are under application control, that's a security issue.

But I agree, if the symmetric behavior is not needed, it should be
ignored and a better (more uniformly distributed) hash should be chosen.

I definitely like the simplicity and perfect flatness of the newly
proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
doesn't it? (31 * 0xffff == 0x1f0000)

Tom.
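The overflow question can be worked through with exact arithmetic. The product 31 * 0xffff is 0x1EFFE1 (0x1f0000 is 31 * 0x10000, a close upper bound), and adding the largest 16-bit dst gives 0x1FFFE0, which needs 21 bits, one more than the 20-bit flow label holds. A small sketch that checks this; the helper names mul31_raw and bit_width are hypothetical:

```c
#include <stdint.h>

/* Unmasked value of (src * 31) + dst for 16-bit port inputs. */
static uint32_t mul31_raw(uint16_t src, uint16_t dst)
{
    return (uint32_t)src * 31u + dst;
}

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned bit_width(uint32_t v)
{
    unsigned n = 0;

    while (v) {
        n++;
        v >>= 1;
    }
    return n;
}
```

With these helpers, bit_width(mul31_raw(0xFFFF, 0xFFFF)) is 21, so masking with 0xFFFFF discards the top bit for the largest inputs.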
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 17:41 ` Tom Talpey
@ 2020-02-19 17:55 ` Jason Gunthorpe
  2020-02-20  1:04 ` Mark Zhang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-19 17:55 UTC
To: Tom Talpey
Cc: Mark Zhang, Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On Wed, Feb 19, 2020 at 12:41:53PM -0500, Tom Talpey wrote:
> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
> > On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
> > > The symmetry is important when calculate flow_label with DstQPn/SrcQPn
> > > for non-RDMA CM Service ID (check the first mail), so that the server
> > > and client will have same flow_label and udp_sport. But looks like it is
> > > not important in this case.
> >
> > If the application needs a certain flow label it should not rely on
> > auto-generation, IMHO.
> >
> > I expect most networks will not be reversible anyhow, even with the
> > same flow label?
>
> These are network flow labels, not under application control. If they
> are under application control, that's a security issue.

Eh? ipv6 puts them under application control.

Jason
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 17:41 ` Tom Talpey
  2020-02-19 17:55 ` Jason Gunthorpe
@ 2020-02-20  1:04 ` Mark Zhang
  2020-02-21 14:47 ` Tom Talpey
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-20 1:04 UTC
To: Tom Talpey, Jason Gunthorpe
Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/20/2020 1:41 AM, Tom Talpey wrote:
> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
>> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>>> The symmetry is important when calculate flow_label with DstQPn/SrcQPn
>>> for non-RDMA CM Service ID (check the first mail), so that the server
>>> and client will have same flow_label and udp_sport. But looks like it is
>>> not important in this case.
>>
>> If the application needs a certain flow label it should not rely on
>> auto-generation, IMHO.
>>
>> I expect most networks will not be reversible anyhow, even with the
>> same flow label?
>
> These are network flow labels, not under application control. If they
> are under application control, that's a security issue.
>

As Jason said, the application is able to control it in IPv6. Besides,
the application is also able to control it for non-RDMA CM Service ID
in IPv4.

Hi Jason, the same flow label gets the same UDP source port. With the
same UDP source port (along with the same sIP/dIP/sPort), are networks
reversible?

> But I agree, if the symmetric behavior is not needed, it should be
> ignored and a better (more uniformly distributed) hash should be chosen.
>
> I definitely like the simplicity and perfect flatness of the newly
> proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
> doesn't it? (31 * 0xffff == 0x1f0000)
>

I think overflow doesn't matter? We have overflow anyway if a
multiplicative hash is used.

> Tom.
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-20  1:04 ` Mark Zhang
@ 2020-02-21 14:47 ` Tom Talpey
  2020-02-25 13:20 ` Alex Rosenbaum
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-21 14:47 UTC
To: Mark Zhang, Jason Gunthorpe
Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 8:04 PM, Mark Zhang wrote:
> On 2/20/2020 1:41 AM, Tom Talpey wrote:
>> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
>>> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>>>> The symmetry is important when calculate flow_label with DstQPn/SrcQPn
>>>> for non-RDMA CM Service ID (check the first mail), so that the server
>>>> and client will have same flow_label and udp_sport. But looks like it is
>>>> not important in this case.
>>>
>>> If the application needs a certain flow label it should not rely on
>>> auto-generation, IMHO.
>>>
>>> I expect most networks will not be reversible anyhow, even with the
>>> same flow label?
>>
>> These are network flow labels, not under application control. If they
>> are under application control, that's a security issue.
>>
>
> As Jason said application is able to control it in ipv6. Besides
> application is also able to control it for non-RDMA CM Service ID in ipv4.

Ok, well I guess that's a separate issue, let's not rathole on
it here then.

> Hi Jason, same flow label get same UDP source port, with same UDP source
> port (along with same sIP/dIP/sPort), are networks reversible?
>
>> But I agree, if the symmetric behavior is not needed, it should be
>> ignored and a better (more uniformly distributed) hash should be chosen.
>>
>> I definitely like the simplicity and perfect flatness of the newly
>> proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
>> doesn't it? (31 * 0xffff == 0x1f0000)
> >
> I think overflow doesn't matter?
> We have overflow anyway if multiplicative is used.

Hmm, it does seem to matter because dropping bits tilts the
distribution curve. Plugging ((src * 31) + dst) & 0xFFFFF into
my little test shows some odd behaviors. It starts out flat,
then the collisions start to rise around 49000, leveling out
at 65000 to a value roughly double the initial one (528 -> 1056).
It sits there until 525700, where it falls back to the start
value (528). I don't think this is optimal :-)

Tom.
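The kind of measurement described in this message can be sketched as a per-bucket histogram over all (src, dst) port pairs. This is not the hashtest.c referred to in the thread, which is not reproduced here; every name in the sketch (hash_fn, hash_mul31_masked, bucket_histogram, hist_summary, summarize) is hypothetical, and the port ranges are parameters rather than the ones any particular test used.

```c
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LABEL_BITS 20
#define NUM_BUCKETS     (1u << FLOW_LABEL_BITS)

typedef uint32_t (*hash_fn)(uint16_t src, uint16_t dst);

/* ((src * 31) + dst) & 0xFFFFF, the expression plugged into the test. */
static uint32_t hash_mul31_masked(uint16_t src, uint16_t dst)
{
    return ((uint32_t)src * 31u + dst) & (NUM_BUCKETS - 1);
}

/* Count how many (src, dst) pairs land in each of the 2^20 buckets.
 * Ranges are inclusive and must fit in 16 bits. Returns a calloc'd
 * array of NUM_BUCKETS counters (caller frees), or NULL on failure. */
static uint32_t *bucket_histogram(hash_fn fn,
                                  uint32_t src_lo, uint32_t src_hi,
                                  uint32_t dst_lo, uint32_t dst_hi)
{
    uint32_t *buckets = calloc(NUM_BUCKETS, sizeof(*buckets));

    if (!buckets)
        return NULL;
    for (uint32_t s = src_lo; s <= src_hi; s++)
        for (uint32_t d = dst_lo; d <= dst_hi; d++)
            buckets[fn((uint16_t)s, (uint16_t)d)]++;
    return buckets;
}

struct hist_summary {
    uint32_t min;    /* smallest bucket count */
    uint32_t max;    /* largest bucket count */
    uint32_t empty;  /* number of buckets never hit */
    uint64_t total;  /* total number of pairs counted */
};

/* Reduce a histogram to min/max/empty/total for a quick flatness check. */
static struct hist_summary summarize(const uint32_t *buckets)
{
    struct hist_summary s = { UINT32_MAX, 0, 0, 0 };

    for (uint32_t i = 0; i < NUM_BUCKETS; i++) {
        uint32_t c = buckets[i];

        if (c < s.min)
            s.min = c;
        if (c > s.max)
            s.max = c;
        if (c == 0)
            s.empty++;
        s.total += c;
    }
    return s;
}
```

Calling bucket_histogram(hash_mul31_masked, 0xC000, 0xFFFF, 1, 0xBFFF) covers the C000-FFFF and 1-BFFF ranges mentioned earlier in the thread, which is roughly 800M hash evaluations. A flat hash would show min and max close to the mean of total / NUM_BUCKETS and few empty buckets.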
* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-21 14:47 ` Tom Talpey
@ 2020-02-25 13:20 ` Alex Rosenbaum
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Rosenbaum @ 2020-02-25 13:20 UTC
To: Tom Talpey
Cc: Mark Zhang, Jason Gunthorpe, RDMA mailing list, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

Mark and I were playing with your test and plotting the results.
I'm sharing the PNGs on a temporary GitHub repo here:
https://github.com/rosenbaumalex/hashtest/
[I wasn't sure of a better place to share them]

The README.md explains the port range we used, the 3 hashes used, and
has a line about the results.
In general, the higher the 'noise', the worse the distribution is.

It seems like Mark's hash suggestion (src*31 + dst) works best,
then the folding one, and last the non-folding one.

I am trying to catch a few switch-related hash experts to get
additional feedback.

Alex

On Fri, Feb 21, 2020 at 4:47 PM Tom Talpey <tom@talpey.com> wrote:
>
> On 2/19/2020 8:04 PM, Mark Zhang wrote:
> > On 2/20/2020 1:41 AM, Tom Talpey wrote:
> >> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
> >>> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
> >>>> The symmetry is important when calculate flow_label with DstQPn/SrcQPn
> >>>> for non-RDMA CM Service ID (check the first mail), so that the server
> >>>> and client will have same flow_label and udp_sport. But looks like it is
> >>>> not important in this case.
> >>>
> >>> If the application needs a certain flow label it should not rely on
> >>> auto-generation, IMHO.
> >>>
> >>> I expect most networks will not be reversible anyhow, even with the
> >>> same flow label?
> >>
> >> These are network flow labels, not under application control. If they
> >> are under application control, that's a security issue.
> >>
> >
> > As Jason said application is able to control it in ipv6.
> > Besides application is also able to control it for non-RDMA CM
> > Service ID in ipv4.
>
> Ok, well I guess that's a separate issue, let's not rathole on
> it here then.
>
> > Hi Jason, same flow label get same UDP source port, with same UDP source
> > port (along with same sIP/dIP/sPort), are networks reversible?
> >
> >> But I agree, if the symmetric behavior is not needed, it should be
> >> ignored and a better (more uniformly distributed) hash should be chosen.
> >>
> >> I definitely like the simplicity and perfect flatness of the newly
> >> proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
> >> doesn't it? (31 * 0xffff == 0x1f0000)
> > >
> > I think overflow doesn't matter? We have overflow anyway if
> > multiplicative is used.
>
> Hmm, it does seem to matter because dropping bits tilts the
> distribution curve. Plugging ((src * 31) + dst) & 0xFFFFF into
> my little test shows some odd behaviors. It starts out flat,
> then the collisions start to rise around 49000, leveling out
> at 65000 to a value roughly double the initial one (528 -> 1056).
> It sits there until 525700, where it falls back to the start
> value (528). I don't think this is optimal :-)
>
> Tom.
end of thread, other threads:[~2020-02-25 13:21 UTC | newest]

Thread overview: 24+ messages
2020-01-08 14:26 [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port Alex Rosenbaum
2020-01-15  9:48 ` Mark Zhang
2020-02-06 14:18 ` Tom Talpey
2020-02-06 14:35 ` Jason Gunthorpe
2020-02-06 14:39 ` Alex Rosenbaum
2020-02-06 15:19 ` Tom Talpey
2020-02-08  9:58 ` Alex Rosenbaum
2020-02-12 15:47 ` Tom Talpey
2020-02-13 11:03 ` Alex Rosenbaum
2020-02-13 15:26 ` Tom Talpey
2020-02-13 15:41 ` Jason Gunthorpe
2020-02-14 14:23 ` Mark Zhang
2020-02-15  6:27 ` Mark Zhang
2020-02-18 14:16 ` Tom Talpey
2020-02-18 17:41 ` Tom Talpey
2020-02-19  1:51 ` Mark Zhang
2020-02-19  2:01 ` Tom Talpey
2020-02-19  2:06 ` Mark Zhang
2020-02-19 13:06 ` Jason Gunthorpe
2020-02-19 17:41 ` Tom Talpey
2020-02-19 17:55 ` Jason Gunthorpe
2020-02-20  1:04 ` Mark Zhang
2020-02-21 14:47 ` Tom Talpey
2020-02-25 13:20 ` Alex Rosenbaum