Linux-RDMA Archive on lore.kernel.org
* [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
@ 2020-01-08 14:26 Alex Rosenbaum
  2020-01-15  9:48 ` Mark Zhang
  2020-02-06 14:18 ` Tom Talpey
  0 siblings, 2 replies; 24+ messages in thread
From: Alex Rosenbaum @ 2020-01-08 14:26 UTC
  To: RDMA mailing list
  Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox,
	Maor Gottlieb, Leon Romanovsky, Mark Zhang

A combination of the flow_label field in the IPv6 header and the UDP source
port field in RoCE v2.0 is used to identify a group of packets that must be
delivered in order by the network, end-to-end.
These fields provide entropy for network routers (ECMP), load balancers and
802.3ad link aggregation switches that are not aware of the RoCE IB headers.

The flow_label field holds a 20-bit hash value. CM-based connections will use
a hash function based on the service type (QP type) and Service ID (SID).
Where CM services are not used, the 20-bit hash will be computed from the
source and destination QPN values.
Drivers will derive the RoCE v2.0 UDP src_port from the resulting flow_label.

UDP source port selection must adhere to the IANA port allocation ranges, so
we will use the IANA-recommended ephemeral port range, 49152-65535 (hex
0xC000-0xFFFF).

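The calculations below are designed to produce a symmetric hash: the result
does not change if the source and destination inputs are swapped, because the
two are combined by multiplication, which is commutative. This allows network
elements that use symmetric hash calculation to be supported.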

Hash Calculation for RDMA IP CM Service
=======================================
For RDMA IP CM services, based on QP1 iMAD usage and on connected RC QPs using
the RDMA IP CM Service ID, the flow label will be calculated from the IBTA CM
REQ private data and the Service ID.

The flow label hash function is defined as follows.
Extract the following fields from the CM IP REQ:
  CM_REQ.ServiceID.DstPort [2 Bytes]
  CM_REQ.PrivateData.SrcPort [2 Bytes]
  u32 hash = DstPort * SrcPort;
  hash ^= (hash >> 16);
  hash ^= (hash >> 8);
  AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;

  #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
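The CM-based formula above translates directly into C. A minimal sketch,
assuming the two CM service ports have already been extracted from the REQ as
host-order 16-bit integers; the function name `cm_flow_label` is illustrative
only, not an existing kernel or rdma-core symbol:

```c
#include <stdint.h>

/* 20-bit IPv6 flow label mask, as defined in the proposal. */
#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/*
 * Hypothetical helper: flow label for an RDMA IP CM connection.
 * src_port/dst_port are the CM service ports from the CM REQ,
 * not the RoCE UDP ports.
 */
static uint32_t cm_flow_label(uint16_t src_port, uint16_t dst_port)
{
	/* Widen first: 16 x 16 bit product needs the full 32 bits. */
	uint32_t hash = (uint32_t)dst_port * (uint32_t)src_port;

	hash ^= hash >> 16;	/* fold the upper half into the lower half */
	hash ^= hash >> 8;	/* second fold, as specified in the proposal */
	return hash & IB_GRH_FLOWLABEL_MASK;
}
```

The explicit widening to `uint32_t` matters in C: two `uint16_t` operands are
otherwise promoted to `int`, and 65535 * 65535 overflows a 32-bit signed
`int`. Since the ports are combined by multiplication,
`cm_flow_label(a, b) == cm_flow_label(b, a)`, which is the symmetry property
the proposal relies on. For example, ports 1024 and 2048 give the product
0x200000 and a flow label of 0x2020.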

The result of the above hash will be kept in the connection context of the
CM's route path record and used throughout the connection's lifetime for all
subsequent CM messages on both ends (including REP, REJ, DREQ, DREP, etc.).
As the connection is established, the corresponding connected RC QPs on both
ends will update their context with the calculated RDMA IP CM Service based
flow_label and UDP src_port values, during the Connect phase on the active
side and the Accept phase on the passive side.

The CM will provide the calculated 20-bit flow_label hash in the
'uint32_t flow_label' field of 'struct ibv_global_route' (the 'grh' member
of 'struct ibv_ah_attr').
The CM passes 'struct ibv_ah_attr' to the provider library when modifying
the state of a connected QP (e.g. RC) by calling 'ibv_modify_qp(qp, &attr,
attr_mask | IBV_QP_AV)' with the address vector in 'attr.ah_attr', or when
creating an AH for a datagram QP (e.g. UD) by calling
'ibv_create_ah(pd, &ah_attr)'.

Hash Calculation for non-RDMA CM Service ID
===========================================
For non-CM QPs, the application can set the flow_label value in
'struct ibv_ah_attr' when modifying a connected QP (e.g. RC) or creating
an AH for a datagram QP (e.g. UD).

If the provided flow_label value is zero (i.e. not set by the application,
as in legacy cases), verbs providers should use the 24-bit source and
destination QPNs (src.QP and dst.QP) as inputs to the flow_label calculation.
Since QPNs are 3 bytes wide, their product fits in 6 bytes. The flow_label
value is defined as:
  DstQPn [3 Bytes]
  SrcQPn [3 Bytes]
  u64 hash = DstQPn * SrcQPn;
  hash ^= (hash >> 20);
  hash ^= (hash >> 40);
  AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
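The QPN-based fallback, including the rule that a non-zero
application-supplied label takes precedence, might look like this in C. This
is a sketch under the assumption that QPNs arrive as 24-bit values in a
`uint32_t`; `qpn_flow_label`, `effective_flow_label` and `QPN_MASK_24` are
hypothetical names, not existing provider APIs:

```c
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu
#define QPN_MASK_24           0x00FFFFFFu /* hypothetical: QPNs are 24 bits */

/* Hypothetical helper: 20-bit flow label from the two QPNs. */
static uint32_t qpn_flow_label(uint32_t src_qpn, uint32_t dst_qpn)
{
	/* 24 x 24 bit product needs up to 48 bits, so compute in 64 bits. */
	uint64_t hash = (uint64_t)(dst_qpn & QPN_MASK_24) *
			(uint64_t)(src_qpn & QPN_MASK_24);

	hash ^= hash >> 20;	/* fold bits 20..39 into bits 0..19 */
	hash ^= hash >> 40;	/* fold bits 40..47 into bits 0..7 */
	return (uint32_t)(hash & IB_GRH_FLOWLABEL_MASK);
}

/* Hypothetical helper: a non-zero application label wins; zero means unset. */
static uint32_t effective_flow_label(uint32_t app_flow_label,
				     uint32_t src_qpn, uint32_t dst_qpn)
{
	if (app_flow_label != 0)
		return app_flow_label & IB_GRH_FLOWLABEL_MASK;
	return qpn_flow_label(src_qpn, dst_qpn);
}
```

For instance, QPNs 0x100000 and 0x000001 multiply to 0x100000, which has no
bits set in the low 20 positions; the first fold brings bit 20 down, giving a
flow label of 1 rather than 0.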

Hash Calculation for UDP src_port
=================================
Providers supporting RoCEv2 will use the 'flow_label' value as input to
calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
AH context.

UDP src_port calculation from the flow label, folded into the 14-bit
ephemeral port range recommended by IANA:
  u32 fl      = AH_ATTR.GRH.flow_label; [20 bits]
  u32 fl_low  = fl & 0x03FFF;
  u32 fl_high = fl & 0xFC000;
  u16 udp_sport = fl_low XOR (fl_high >> 14);
  RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT;

  #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000
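The src_port derivation can likewise be sketched in C. It folds the top 6
bits of the 20-bit label into the low 14 bits and then forces the two top
bits of the port on, so every result lands in 0xC000-0xFFFF;
`roce_udp_sport` is a hypothetical name used only for illustration:

```c
#include <stdint.h>

#define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000u

/* Hypothetical helper: RoCEv2 UDP source port from a 20-bit flow label. */
static uint16_t roce_udp_sport(uint32_t flow_label)
{
	uint32_t fl      = flow_label & 0xFFFFFu;	/* 20-bit label */
	uint32_t fl_low  = fl & 0x03FFFu;		/* bits 0..13 */
	uint32_t fl_high = fl & 0xFC000u;		/* bits 14..19 */
	uint32_t sport   = fl_low ^ (fl_high >> 14);	/* 14-bit result */

	/* Setting bits 14 and 15 places the port in 49152-65535. */
	return (uint16_t)(sport | IB_ROCE_UDP_ENCAP_VALID_PORT);
}
```

For example, a label of 0xFFFFF gives 0x3FFF ^ 0x3F = 0x3FC0, so the source
port is 0xFFC0 (65472); a label of 0 maps to 0xC000 (49152).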

This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]

[1] https://www.spinics.net/lists/linux-rdma/msg73735.html

Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-01-08 14:26 [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port Alex Rosenbaum
@ 2020-01-15  9:48 ` Mark Zhang
  2020-02-06 14:18 ` Tom Talpey
  1 sibling, 0 replies; 24+ messages in thread
From: Mark Zhang @ 2020-01-15  9:48 UTC
  To: Alex Rosenbaum, RDMA mailing list
  Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex Rosenbaum,
	Maor Gottlieb, Leon Romanovsky

On 1/8/2020 10:26 PM, Alex Rosenbaum wrote:
> Hash Calculation for non-RDMA CM Service ID
> ===========================================
> For non-CM QPs, the application can set the flow_label value in
> 'struct ibv_ah_attr' when modifying a connected QP (e.g. RC) or creating
> an AH for a datagram QP (e.g. UD).
> 

Hi Alex, when creating an AH for the datagram QP, I think we don't have 
the src.QP and dst.QP, so we can't set the flow_label here?





* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-01-08 14:26 [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port Alex Rosenbaum
  2020-01-15  9:48 ` Mark Zhang
@ 2020-02-06 14:18 ` Tom Talpey
  2020-02-06 14:35   ` Jason Gunthorpe
  2020-02-06 14:39   ` Alex Rosenbaum
  1 sibling, 2 replies; 24+ messages in thread
From: Tom Talpey @ 2020-02-06 14:18 UTC
  To: Alex Rosenbaum, RDMA mailing list
  Cc: Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas, Alex @ Mellanox,
	Maor Gottlieb, Leon Romanovsky, Mark Zhang

On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> The flow label hash function is defined as follows.
> Extract the following fields from the CM IP REQ:
>    CM_REQ.ServiceID.DstPort [2 Bytes]
>    CM_REQ.PrivateData.SrcPort [2 Bytes]
>    u32 hash = DstPort * SrcPort;
>    hash ^= (hash >> 16);
>    hash ^= (hash >> 8);
>    AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> 
>    #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF

Sorry it took me a while to respond to this, and thanks for looking
into it since my comments on the previous proposal. I have a concern
with an aspect of this one.

The RoCEv2 destination port is a fixed value, 4791. Therefore the
term

	u32 hash = DstPort * SrcPort;

adds no entropy beyond the value of SrcPort.

In turn, the subsequent

	hash ^= (hash >> 16);
	hash ^= (hash >> 8);

are re-mashing the bits with one another, again, adding no entropy.

Can you describe how, mathematically, this differs from simply
using the SrcPort field, and if it does, how it contributes to the entropy
differentiation of the incoming streams?

Tom.



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-06 14:18 ` Tom Talpey
@ 2020-02-06 14:35   ` Jason Gunthorpe
  2020-02-06 14:39   ` Alex Rosenbaum
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-06 14:35 UTC
  To: Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On Thu, Feb 06, 2020 at 09:18:38AM -0500, Tom Talpey wrote:
> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> > The flow label hash function is defined as follows.
> > Extract the following fields from the CM IP REQ:
> >    CM_REQ.ServiceID.DstPort [2 Bytes]
> >    CM_REQ.PrivateData.SrcPort [2 Bytes]
> >    u32 hash = DstPort * SrcPort;
> >    hash ^= (hash >> 16);
> >    hash ^= (hash >> 8);
> >    AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> > 
> >    #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
> 
> Sorry it took me a while to respond to this, and thanks for looking
> into it since my comments on the previous proposal. I have a concern
> with an aspect of this one.
> 
> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> term

I read the above as using the destination port of the IP contained
within the CM REQ, not as the destination port of the RoCE UDP header?

So it can be different.

Jason


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-06 14:18 ` Tom Talpey
  2020-02-06 14:35   ` Jason Gunthorpe
@ 2020-02-06 14:39   ` Alex Rosenbaum
  2020-02-06 15:19     ` Tom Talpey
  1 sibling, 1 reply; 24+ messages in thread
From: Alex Rosenbaum @ 2020-02-06 14:39 UTC
  To: Tom Talpey
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>
> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> > The flow label hash function is defined as follows.
> > Extract the following fields from the CM IP REQ:
> >    CM_REQ.ServiceID.DstPort [2 Bytes]
> >    CM_REQ.PrivateData.SrcPort [2 Bytes]
> >    u32 hash = DstPort * SrcPort;
> >    hash ^= (hash >> 16);
> >    hash ^= (hash >> 8);
> >    AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> >
> >    #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
>
> Sorry it took me a while to respond to this, and thanks for looking
> into it since my comments on the previous proposal. I have a concern
> with an aspect of this one.
>
> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> term
>
>         u32 hash = DstPort * SrcPort;
>
> adds no entropy beyond the value of SrcPort.
>

We're talking about the CM service ports, taken from
rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, timeout_ms);
these are in the CM-level port space, not the RoCE UDP L4 ports.
We want to use both, because different client and server instances on
the same nodes will use different CM ports and will hopefully generate
different hash results for multiple flows between these two servers.



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-06 14:39   ` Alex Rosenbaum
@ 2020-02-06 15:19     ` Tom Talpey
  2020-02-08  9:58       ` Alex Rosenbaum
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-06 15:19 UTC
  To: Alex Rosenbaum
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>> The flow label hash function is defined as follows.
>>> Extract the following fields from the CM IP REQ:
>>>     CM_REQ.ServiceID.DstPort [2 Bytes]
>>>     CM_REQ.PrivateData.SrcPort [2 Bytes]
>>>     u32 hash = DstPort * SrcPort;
>>>     hash ^= (hash >> 16);
>>>     hash ^= (hash >> 8);
>>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>>     #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
>>
>> Sorry it took me a while to respond to this, and thanks for looking
>> into it since my comments on the previous proposal. I have a concern
>> with an aspect of this one.
>>
>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>> term
>>
>>          u32 hash = DstPort * SrcPort;
>>
>> adds no entropy beyond the value of SrcPort.
>>
> 
> We're talking about the CM service ports, taken from
> rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, timeout_ms);
> these are in the CM-level port space, not the RoCE UDP L4 ports.
> We want to use both, because different client and server instances on
> the same nodes will use different CM ports and will hopefully generate
> different hash results for multiple flows between these two servers.

Aha, ok I guess I missed that, and ok.

>> In turn, the subsequent
>>
>>          hash ^= (hash >> 16);
>>          hash ^= (hash >> 8);
>>
>> are re-mashing the bits with one another, again, adding no entropy.

I still wonder about this one. It's attempting to reduce the 32-bit
product to 20 bits, but a second xor with the "middle" 16 bits seems
really strange. Mathematically, wouldn't it be better to just take
the modulus of 2^20? If not, are you expecting some behavior in the
hash values that makes the double-xor approach better (in which case
it should be called out)?

Tom.
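The difference between plain masking (mod 2^20) and the proposed double
xor-fold can be made concrete with a small C sketch. Masking drops every bit
of the 32-bit product above bit 19, while the fold mixes those bits back into
the low 20 bits. The helper names below are illustrative only:

```c
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu

/* Hypothetical helper: keep only the low 20 bits (same as product % 2^20). */
static uint32_t reduce_mod(uint32_t product)
{
	return product & FLOWLABEL_MASK;
}

/* Hypothetical helper: the two xor-folds from the proposal, then mask. */
static uint32_t reduce_fold(uint32_t product)
{
	product ^= product >> 16;
	product ^= product >> 8;
	return product & FLOWLABEL_MASK;
}
```

For CM ports 1024 and 2048 the product is 0x200000: masking yields 0, whereas
the fold yields 0x2020. This only shows that the fold preserves high-order
bits that masking discards; it does not by itself show that this particular
fold is the best choice.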



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-06 15:19     ` Tom Talpey
@ 2020-02-08  9:58       ` Alex Rosenbaum
  2020-02-12 15:47         ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Rosenbaum @ 2020-02-08  9:58 UTC (permalink / raw)
  To: Tom Talpey
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote:
>
> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> > On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
> >>
> >> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> >>> A combination of the flow_label field in the IPv6 header and UDP source port
> >>> field in RoCE v2.0 is used to identify a group of packets that must be
> >>> delivered in order by the network, end-to-end.
> >>> These fields are used to create entropy for network routers (ECMP), load
> >>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
> >>> headers.
> >>>
> >>> The flow_label field is defined by a 20-bit hash value. CM-based connections
> >>> will use a hash function based on the service type (QP Type) and
> >>> Service ID (SID). Where CM services are not used, the 20-bit hash will be
> >>> computed from the source and destination QPN values.
> >>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
> >>>
> >>> UDP source port selection must adhere to IANA port allocation ranges. Thus we
> >>> will use the IANA-recommended ephemeral port range 49152-65535 (hex:
> >>> 0xC000-0xFFFF).
> >>>
> >>> The calculations below are designed to produce a symmetric hash result, so
> >>> that network elements performing symmetric hash calculations are supported.
> >>>
> >>> Hash Calculation for RDMA IP CM Service
> >>> =======================================
> >>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
> >>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
> >>> REQ private data info and Service ID.
> >>>
> >>> The flow label hash function is defined as follows.
> >>> Extract the following fields from the CM IP REQ:
> >>>     CM_REQ.ServiceID.DstPort [2 Bytes]
> >>>     CM_REQ.PrivateData.SrcPort [2 Bytes]
> >>>     u32 hash = DstPort * SrcPort;
> >>>     hash ^= (hash >> 16);
> >>>     hash ^= (hash >> 8);
> >>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> >>>
> >>>     #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
> >>
> >> Sorry it took me a while to respond to this, and thanks for looking
> >> into it since my comments on the previous proposal. I have a concern
> >> with an aspect of this one.
> >>
> >> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> >> term
> >>
> >>          u32 hash = DstPort * SrcPort;
> >>
> >> adds no entropy beyond the value of SrcPort.
> >>
> >
> > We're talking about the CM service ports, taken from
> > rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
> > these are in the CM-level port space, not the RoCE UDP L4 ports.
> > We want to use both, because different client and server instances on
> > the same nodes will use different CM ports and will hopefully
> > generate different hash results for multiple flows between these two
> > servers.
>
> Aha, ok I guess I missed that, and ok.
>
> >> In turn, the subsequent
> >>
> >>          hash ^= (hash >> 16);
> >>          hash ^= (hash >> 8);
> >>
> >> are re-mashing the bits with one another, again, adding no entropy.
>
> I still wonder about this one. It's attempting to reduce the 32-bit
> product to 20 bits, but a second xor with the "middle" 16 bits seems
> really strange. Mathematically, wouldn't it be better to just take
> the modulus of 2^20? If not, are you expecting some behavior in the
> hash values that makes the double-xor approach better (in which case
> it should be called out)?
>
> Tom.

The function is designed to produce a symmetric hash, so both the
active and passive sides can reconstruct the same flow label. That's
why we multiply the two CM port values (16 bit * 16 bit). The result
is a 32-bit value, and we don't want to lose any of the MSBs by
modulus or masking, so we need a folding function from 32 bits to the
20-bit flow label.

The specific bit shifts are taken from the bonding driver:
https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
This proved very effective at spreading the flow label in our internal
testing. Other alternatives can be suggested, as long as they take all
bits into account in the 32->20 bit conversion.
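To make the symmetry argument concrete, here is a minimal C sketch of the CM-port flow label from the RFC (multiply, then the two bonding-style shift/XOR folds); cm_port_flow_label is a hypothetical name. One detail worth calling out for a C implementation: with uint16_t ports, `DstPort * SrcPort` is evaluated in signed int after integer promotion, and 0xFFFF * 0xFFFF overflows a 32-bit int, so the sketch widens one operand to uint32_t first.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Hypothetical helper: 20-bit flow label from the two 16-bit CM ports. */
static uint32_t cm_port_flow_label(uint16_t src_port, uint16_t dst_port)
{
	/* Widen before multiplying to avoid signed int overflow. The product
	 * is commutative, so the active and passive sides agree. */
	uint32_t hash = (uint32_t)dst_port * src_port;

	/* Fold the upper product bits into the low 20 bits. */
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash & IB_GRH_FLOWLABEL_MASK;
}
```

Swapping the arguments never changes the result, which is the property that lets both ends compute the label independently.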

Alex

>
> >> Can you describe how, mathematically, this is not different from simply
> >> using the SrcPort field, and if so, how it contributes to the entropy
> >> differentiation of the incoming streams?
> >>
> >> Tom.
> >>
> >>> The result of the above hash will be kept in the CM's route path record
> >>> connection context and will be used throughout the connection's lifetime for
> >>> all subsequent CM messages on both ends (including REP, REJ, DREQ, DREP, ...).
> >>> Once the connection is established, the corresponding connected RC QPs on
> >>> both ends will update their context with the calculated RDMA IP CM
> >>> Service-based flow_label and UDP src_port values, in the Connect phase on the
> >>> active side and in the Accept phase on the passive side.
> >>>
> >>> CM will provide the calculated 20-bit flow_label hash result in the
> >>> 'uint32_t flow_label' field of 'struct ibv_global_route' in
> >>> 'struct ibv_ah_attr'.
> >>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when
> >>> modifying a connected QP's (e.g., RC) state by calling 'ibv_modify_qp(qp,
> >>> ah_attr, attr_mask |= IBV_QP_AV)', or when creating an AH for datagram
> >>> QPs (e.g., UD) by calling ibv_create_ah(pd, ah_attr).
> >>>
> >>> Hash Calculation for non-RDMA CM Service ID
> >>> ===========================================
> >>> For non-CM QPs, the application can set the flow_label value in
> >>> 'struct ibv_ah_attr' when modifying connected QPs (e.g., RC) or creating
> >>> an AH for datagram QPs (e.g., UD).
> >>>
> >>> If the provided flow_label value is zero (i.e., not set by the application,
> >>> as in legacy cases), verbs providers should use src.QP[24bit] and
> >>> dst.QP[24bit] as the inputs to the flow_label calculation.
> >>> As QPNs are 3 bytes wide, their product is a 6-byte value. We define
> >>> the flow_label value as:
> >>>     DstQPn [3 Bytes]
> >>>     SrcQPn [3 Bytes]
> >>>     u64 hash = DstQPn * SrcQPn;
> >>>     hash ^= (hash >> 20);
> >>>     hash ^= (hash >> 40);
> >>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> >>>
> >>> Hash Calculation for UDP src_port
> >>> =================================
> >>> Providers supporting RoCEv2 will use the 'flow_label' value as input to
> >>> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
> >>> AH context.
> >>>
> >>> UDP src_port calculation from the flow label
> >>> [taking into account the 14-bit UDP port range of the IANA recommendation]:
> >>>     u32 fl = AH_ATTR.GRH.flow_label; [20 bits]
> >>>     u32 fl_low  = fl & 0x03FFF;
> >>>     u32 fl_high = fl & 0xFC000;
> >>>     u16 udp_sport = fl_low XOR (fl_high >> 14);
> >>>     RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT
> >>>
> >>>     #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000
> >>>
> >>> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]
> >>>
> >>> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html
> >>>
> >>> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>
> >>>
> >>>
> >
> >

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-08  9:58       ` Alex Rosenbaum
@ 2020-02-12 15:47         ` Tom Talpey
  2020-02-13 11:03           ` Alex Rosenbaum
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-12 15:47 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On 2/8/2020 4:58 AM, Alex Rosenbaum wrote:
> On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
>>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>>>> [snip]
>>>>
>>>> Sorry it took me a while to respond to this, and thanks for looking
>>>> into it since my comments on the previous proposal. I have a concern
>>>> with an aspect of this one.
>>>>
>>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>>>> term
>>>>
>>>>           u32 hash = DstPort * SrcPort;
>>>>
>>>> adds no entropy beyond the value of SrcPort.
>>>>
>>>
>>> We're talking about the CM service ports, taken from
>>> rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
>>> these are in the CM-level port space, not the RoCE UDP L4 ports.
>>> We want to use both, because different client and server instances on
>>> the same nodes will use different CM ports and will hopefully
>>> generate different hash results for multiple flows between these two
>>> servers.
>>
>> Aha, ok I guess I missed that, and ok.
>>
>>>> In turn, the subsequent
>>>>
>>>>           hash ^= (hash >> 16);
>>>>           hash ^= (hash >> 8);
>>>>
>>>> are re-mashing the bits with one another, again, adding no entropy.
>>
>> I still wonder about this one. It's attempting to reduce the 32-bit
>> product to 20 bits, but a second xor with the "middle" 16 bits seems
>> really strange. Mathematically, wouldn't it be better to just take
>> the modulus of 2^20? If not, are you expecting some behavior in the
>> hash values that makes the double-xor approach better (in which case
>> it should be called out)?
>>
>> Tom.
> 
> The function is designed to produce a symmetric hash, so both the
> active and passive sides can reconstruct the same flow label. That's
> why we multiply the two CM port values (16 bit * 16 bit). The result
> is a 32-bit value, and we don't want to lose any of the MSBs by
> modulus or masking, so we need a folding function from 32 bits to the
> 20-bit flow label.
> 
> The specific bit shifts are taken from the bonding driver:
> https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
> This proved very effective at spreading the flow label in our internal
> testing. Other alternatives can be suggested, as long as they take all
> bits into account in the 32->20 bit conversion.

I'm ok with it, but I still don't fully understand why the folding
is necessary. The multiplication is the important part, and it is
the operation that combines the two entropic inputs. The folding just
flips bits from what's basically the same entropy source.

IOW, I think that

	u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK;

would produce a completely equal benefit, mathematically.

Tom.

> Alex
> 
>>
>>>> Can you describe how, mathematically, this is not different from simply
>>>> using the SrcPort field, and if so, how it contributes to the entropy
>>>> differentiation of the incoming streams?
>>>>
>>>> Tom.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-12 15:47         ` Tom Talpey
@ 2020-02-13 11:03           ` Alex Rosenbaum
  2020-02-13 15:26             ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Alex Rosenbaum @ 2020-02-13 11:03 UTC (permalink / raw)
  To: Tom Talpey
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On Wed, Feb 12, 2020 at 5:47 PM Tom Talpey <tom@talpey.com> wrote:
>
> On 2/8/2020 4:58 AM, Alex Rosenbaum wrote:
> > On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote:
> >>
> >> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> >>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
> >>>>
> >>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
> >>>>> [snip]
> >>>>
> >>>> Sorry it took me a while to respond to this, and thanks for looking
> >>>> into it since my comments on the previous proposal. I have a concern
> >>>> with an aspect of this one.
> >>>>
> >>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
> >>>> term
> >>>>
> >>>>           u32 hash = DstPort * SrcPort;
> >>>>
> >>>> adds no entropy beyond the value of SrcPort.
> >>>>
> >>>
> >>> We're talking about the CM service ports, taken from
> >>> rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
> >>> these are in the CM-level port space, not the RoCE UDP L4 ports.
> >>> We want to use both, because different client and server instances on
> >>> the same nodes will use different CM ports and will hopefully
> >>> generate different hash results for multiple flows between these two
> >>> servers.
> >>
> >> Aha, ok I guess I missed that, and ok.
> >>
> >>>> In turn, the subsequent
> >>>>
> >>>>           hash ^= (hash >> 16);
> >>>>           hash ^= (hash >> 8);
> >>>>
> >>>> are re-mashing the bits with one another, again, adding no entropy.
> >>
> >> I still wonder about this one. It's attempting to reduce the 32-bit
> >> product to 20 bits, but a second xor with the "middle" 16 bits seems
> >> really strange. Mathematically, wouldn't it be better to just take
> >> the modulus of 2^20? If not, are you expecting some behavior in the
> >> hash values that makes the double-xor approach better (in which case
> >> it should be called out)?
> >>
> >> Tom.
> >
> > The function is designed to produce a symmetric hash, so both the
> > active and passive sides can reconstruct the same flow label. That's
> > why we multiply the two CM port values (16 bit * 16 bit). The result
> > is a 32-bit value, and we don't want to lose any of the MSBs by
> > modulus or masking, so we need a folding function from 32 bits to the
> > 20-bit flow label.
> >
> > The specific bit shifts are taken from the bonding driver:
> > https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
> > This proved very effective at spreading the flow label in our internal
> > testing. Other alternatives can be suggested, as long as they take all
> > bits into account in the 32->20 bit conversion.
>
> I'm ok with it, but I still don't fully understand why the folding
> is necessary. The multiplication is the important part, and it is
> the operation that combines the two entropic inputs. The folding just
> flips bits from what's basically the same entropy source.
>
> IOW, I think that
>
>         u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK;
>
> would produce a completely equal benefit, mathematically.
> Tom.
>

If both src & dst ports are in the high value range, you lose those
hash bits in the masking.
If the src & dst ports are both 0xE000, your masked hash equals 0. You'll
get the same hash if both ports equal 0xF000.

The idea of the bit shifts is to take the MSB hash bits (those left of
the 0xFFFFF mask) and fold them into the LSBs in some way.
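Alex's example can be checked with a small C sketch; both helper names are hypothetical. With both ports at 0xE000 the product is 0xC4000000, and at 0xF000 it is 0xE1000000: neither has any bit set below bit 24, so masking yields 0 in both cases, while the fold maps them to different labels.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* Tom's suggestion: keep only the low 20 bits of the product. */
static uint32_t cm_label_masked(uint16_t sport, uint16_t dport)
{
	return ((uint32_t)dport * sport) & IB_GRH_FLOWLABEL_MASK;
}

/* RFC v2: fold the upper product bits back into the low 20 bits. */
static uint32_t cm_label_folded(uint16_t sport, uint16_t dport)
{
	uint32_t hash = (uint32_t)dport * sport;

	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash & IB_GRH_FLOWLABEL_MASK;
}
```

Here cm_label_masked() returns 0 for both port pairs, while cm_label_folded() returns 0x4C4C4 for 0xE000/0xE000 and 0x1E1E1 for 0xF000/0xF000. This shows one specific collision avoided; it says nothing by itself about the overall distribution, which is the point raised in the following replies.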

Alex

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-13 11:03           ` Alex Rosenbaum
@ 2020-02-13 15:26             ` Tom Talpey
  2020-02-13 15:41               ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-13 15:26 UTC (permalink / raw)
  To: Alex Rosenbaum
  Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha,
	Yishai Hadas, Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky,
	Mark Zhang

On 2/13/2020 6:03 AM, Alex Rosenbaum wrote:
> On Wed, Feb 12, 2020 at 5:47 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 2/8/2020 4:58 AM, Alex Rosenbaum wrote:
>>> On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey <tom@talpey.com> wrote:
>>>>
>>>> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
>>>>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>>>>>
>>>>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>>>>>> [snip]
>>>>>>
>>>>>> Sorry it took me a while to respond to this, and thanks for looking
>>>>>> into it since my comments on the previous proposal. I have a concern
>>>>>> with an aspect of this one.
>>>>>>
>>>>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>>>>>> term
>>>>>>
>>>>>>            u32 hash = DstPort * SrcPort;
>>>>>>
>>>>>> adds no entropy beyond the value of SrcPort.
>>>>>>
>>>>>
>>>>> We're talking about the CM service ports, taken from
>>>>> rdma_resolve_addr(cma_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
>>>>> these are in the CM-level port space, not the RoCE UDP L4 ports.
>>>>> We want to use both, because different client and server instances on
>>>>> the same nodes will use different CM ports and will hopefully
>>>>> generate different hash results for multiple flows between these two
>>>>> servers.
>>>>
>>>> Aha, ok I guess I missed that, and ok.
>>>>
>>>>>> In turn, the subsequent
>>>>>>
>>>>>>            hash ^= (hash >> 16);
>>>>>>            hash ^= (hash >> 8);
>>>>>>
>>>>>> are re-mashing the bits with one another, again, adding no entropy.
>>>>
>>>> I still wonder about this one. It's attempting to reduce the 32-bit
>>>> product to 20 bits, but a second xor with the "middle" 16 bits seems
>>>> really strange. Mathematically, wouldn't it be better to just take
>>>> the modulus of 2^20? If not, are you expecting some behavior in the
>>>> hash values that makes the double-xor approach better (in which case
>>>> it should be called out)?
>>>>
>>>> Tom.
>>>
>>> The function is designed to produce a symmetric hash, so both the
>>> active and passive sides can reconstruct the same flow label. That's
>>> why we multiply the two CM port values (16 bit * 16 bit). The result
>>> is a 32-bit value, and we don't want to lose any of the MSBs by
>>> modulus or masking, so we need a folding function from 32 bits to the
>>> 20-bit flow label.
>>>
>>> The specific bit shifts are taken from the bonding driver:
>>> https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
>>> This proved very effective at spreading the flow label in our internal
>>> testing. Other alternatives can be suggested, as long as they take all
>>> bits into account in the 32->20 bit conversion.
>>
>> I'm ok with it, but I still don't fully understand why the folding
>> is necessary. The multiplication is the important part, and it is
>> the operation that combines the two entropic inputs. The folding just
>> flips bits from what's basically the same entropy source.
>>
>> IOW, I think that
>>
>>          u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK;
>>
>> would produce a completely equal benefit, mathematically.
>> Tom.
>>
> 
> If both src & dst ports are in the high value range, you lose those
> hash bits in the masking.
> If the src & dst ports are both 0xE000, your masked hash equals 0. You'll
> get the same hash if both ports equal 0xF000.

Sure, but this is because it's a 20-bit hash of a 32-bit object. There
will always be collisions, this is just one example. My concern is the
statistical spread of the results. I argue it's not changed by the
proposed bit-folding, possibly even damaged.

> The idea of the bit shifts is to take the MSB hash bits (those left of
> the 0xFFFFF mask) and fold them into the LSBs in some way.

I get that, but it's only folding the "one" bits, and it's doing so in
a rather primitive way. For example, the ">> 8" term is folding the
high 4 of 20 bits twice - once in the >> 16 and again in the >> 8.
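Tom's observation can be made precise. The fold uses only shifts and XOR, so it is linear over GF(2): label bit i equals h[i] ^ h[i+8] ^ h[i+16] ^ h[i+24] of the 32-bit product h (out-of-range bits count as zero). The C sketch below (hypothetical helper names) returns, for a single product bit j, the mask of label bits it flips. One consequence: product bits 16 and 24 flip exactly the same label bits (0x10101), so changing both at once leaves the label unchanged; the same holds for every pair j, j+8 with j in 16..23.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/* The RFC v2 fold of a 32-bit product down to 20 bits. */
static uint32_t fold32_to_20(uint32_t hash)
{
	hash ^= hash >> 16;
	hash ^= hash >> 8;
	return hash & IB_GRH_FLOWLABEL_MASK;
}

/* Because the fold is linear over GF(2), fold(a ^ b) == fold(a) ^ fold(b),
 * and folding a single set bit j yields exactly the label bits it flips. */
static uint32_t fold_influence(unsigned int j)
{
	assert(j < 32);
	return fold32_to_20(UINT32_C(1) << j);
}
```

Looping j over 0..31 and printing fold_influence(j) gives the full contribution map of the fold.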

This value is only computed once, at QP creation, correct? Why not
compute a CRC-20, for example?

Tom.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-13 15:26             ` Tom Talpey
@ 2020-02-13 15:41               ` Jason Gunthorpe
  2020-02-14 14:23                 ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-13 15:41 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky, Mark Zhang

On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:

> > If both src & dst ports are in the high value range, you lose those
> > hash bits in the masking.
> > If the src & dst ports are both 0xE000, your masked hash equals 0. You'll
> > get the same hash if both ports equal 0xF000.
> 
> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
> will always be collisions, this is just one example. My concern is the
> statistical spread of the results. I argue it's not changed by the
> proposed bit-folding, possibly even damaged.

I've always thought that 'folding' by modulo results in an abnormal
statistical distribution.

The point here is not collisions but to have a hash distribution which
is generally uniform for the input space.

Alex, it would be good to make a quick program to measure the
uniformity of the distribution.

Jason

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-13 15:41               ` Jason Gunthorpe
@ 2020-02-14 14:23                 ` Mark Zhang
  2020-02-15  6:27                   ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-14 14:23 UTC (permalink / raw)
  To: Jason Gunthorpe, Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
> 
>>> If both src & dst ports are in the high value range, you lose those
>>> hash bits in the masking.
>>> If the src & dst ports are both 0xE000, your masked hash equals 0. You'll
>>> get the same hash if both ports equal 0xF000.
>>
>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>> will always be collisions, this is just one example. My concern is the
>> statistical spread of the results. I argue it's not changed by the
>> proposed bit-folding, possibly even damaged.
> 
> I've always thought that 'folding' by modulo results in an abnormal
> statistical distribution.
> 
> The point here is not collisions but to have a hash distribution which
> is generally uniform for the input space.
> 
> Alex, it would be good to make a quick program to measure the
> uniformity of the distribution.
> 

Hi,

I did some tests with a quick program (hopefully it's not buggy...); it
seems the hash without "folding" has a better distribution than the hash
with folding. The "hash quality" is reflected by "total_access" [1] below.

I tested only with cma_dport values from 18515 (the ib_write_bw default)
to 18524. I can run more tests if required, for example using multiple
cma_dport values in one statistic.


[1] https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays

$ ./a

max: for slot x holding tb[x] items, 'max = max(tb[x])'; lower is better.
min: for slot x holding tb[x] items, 'min = min(tb[x])'; likely always 0.
total_access: the sum of all 'accesses' (for each slot, accesses = n*(n+1)/2); lower is better.
n[X]: how many slots have X items.

cm source port range [32768, 65534], dest port 18515:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] = 126
     udp_sport: max 10 min 0 total_access 51740  n[1] = 4420  n[2] = 4670  n[3] = 3112  n[4] = 1433  n[5] = 535   n[6] = 163   n[7] = 31    n[8] = 5     n[9] = 2     n[10] = 1
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 4 min 0 total_access 48618   n[1] = 532   n[2] = 7926  n[3] = 530   n[4] = 3698


cm source port range [32768, 65534], dest port 18516:
Hash with folding:
     flow_label: max 3 min 0 total_access 32774  n[1] = 31214 n[2] = 770   n[3] = 4
     udp_sport: max 8 min 0 total_access 50808   n[1] = 4406  n[2] = 4873  n[3] = 3157  n[4] = 1413  n[5] = 509   n[6] = 129   n[7] = 20    n[8] = 4
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 16382


cm source port range [32768, 65534], dest port 18517:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766  n[1] = 32250 n[2] = 258
     udp_sport: max 10 min 0 total_access 54916  n[1] = 4536  n[2] = 4170  n[3] = 2817  n[4] = 1445  n[5] = 622   n[6] = 275   n[7] = 94    n[8] = 22    n[9] = 5     n[10] = 2
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 3 min 1 total_access 38402   n[1] = 2820  n[2] = 10746 n[3] = 2818


cm source port range [32768, 65534], dest port 18518:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766  n[1] = 32066 n[2] = 350
     udp_sport: max 8 min 0 total_access 50018   n[1] = 4435  n[2] = 4970  n[3] = 3294  n[4] = 1376  n[5] = 465   n[6] = 92    n[7] = 16    n[8] = 2
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 16382


cm source port range [32768, 65534], dest port 18519:
Hash with folding:
     flow_label: max 3 min 0 total_access 32774  n[1] = 31816 n[2] = 469   n[3] = 4
     udp_sport: max 8 min 0 total_access 51462   n[1] = 4414  n[2] = 4734  n[3] = 3088  n[4] = 1466  n[5] = 508   n[6] = 160   n[7] = 32    n[8] = 4
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 4 min 0 total_access 45490   n[1] = 3662  n[2] = 6360  n[3] = 3660  n[4] = 1351


cm source port range [32768, 65534], dest port 18520:
Hash with folding:
     flow_label: max 6 min 0 total_access 34618  n[1] = 20349 n[2] = 5027  n[3] = 550   n[4] = 164   n[5] = 9     n[6] = 2
     udp_sport: max 13 min 0 total_access 82542  n[1] = 549   n[2] = 1167  n[3] = 1635  n[4] = 1706  n[5] = 1341  n[6] = 836   n[7] = 483   n[8] = 223   n[9] = 87    n[10] = 27
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 4 min 0 total_access 65530   n[3] = 2     n[4] = 8190


cm source port range [32768, 65534], dest port 18521:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766  n[1] = 31924 n[2] = 421
     udp_sport: max 9 min 0 total_access 51864   n[1] = 4505  n[2] = 4645  n[3] = 3038  n[4] = 1464  n[5] = 542   n[6] = 154   n[7] = 43    n[8] = 6     n[9] = 2
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 3 min 1 total_access 32810   n[1] = 24    n[2] = 16338 n[3] = 22


cm source port range [32768, 65534], dest port 18522:
Hash with folding:
     flow_label: max 3 min 0 total_access 32768  n[1] = 32197 n[2] = 283   n[3] = 1
     udp_sport: max 9 min 0 total_access 50850   n[1] = 4561  n[2] = 4756  n[3] = 3187  n[4] = 1452  n[5] = 453   n[6] = 137   n[7] = 29    n[8] = 2     n[9] = 2
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 16382


cm source port range [32768, 65534], dest port 18523:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] = 126
     udp_sport: max 8 min 0 total_access 52208   n[1] = 4426  n[2] = 4609  n[3] = 3069  n[4] = 1435  n[5] = 533   n[6] = 180   n[7] = 50    n[8] = 10
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 4 min 0 total_access 46062   n[1] = 3096  n[2] = 6640  n[3] = 3094  n[4] = 1777


cm source port range [32768, 65534], dest port 18524:
Hash with folding:
     flow_label: max 3 min 0 total_access 32774  n[1] = 31362 n[2] = 696   n[3] = 4
     udp_sport: max 8 min 0 total_access 49490   n[1] = 4440  n[2] = 5148  n[3] = 3240  n[4] = 1413  n[5] = 394   n[6] = 97    n[7] = 14    n[8] = 1
Hash without folding:
     flow_label: max 1 min 0 total_access 32766  n[1] = 32766
     udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 16382
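The statistics described in the legend above can be computed from a table of per-slot counts roughly as follows. This is a minimal C sketch written from the legend's definitions (total_access as the sum of n*(n+1)/2 per slot), not the program that produced these numbers; struct dist_stats and compute_stats() are hypothetical names. Applying that formula to the first flow_label line (32514 slots with one item, 126 with two) gives 32514 + 126*3 = 32892 rather than the printed 32766, so the original program may compute total_access differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Summary statistics for a table of per-slot item counts. */
struct dist_stats {
	uint32_t max;          /* fullest slot; lower is better */
	uint32_t min;          /* emptiest slot; usually 0 */
	uint64_t total_access; /* sum of n*(n+1)/2 over slots; lower is better */
};

/*
 * counts[i] holds the number of items hashed to slot i. On return,
 * hist[x] (0 <= x <= max_x) is the number of slots holding exactly x
 * items; slots holding more than max_x items are not binned.
 */
struct dist_stats compute_stats(const uint32_t *counts, size_t nslots,
				uint32_t *hist, size_t max_x)
{
	struct dist_stats s = { 0, UINT32_MAX, 0 };

	assert(nslots > 0);
	for (size_t x = 0; x <= max_x; x++)
		hist[x] = 0;

	for (size_t i = 0; i < nslots; i++) {
		uint64_t n = counts[i];

		if (counts[i] > s.max)
			s.max = counts[i];
		if (counts[i] < s.min)
			s.min = counts[i];
		s.total_access += n * (n + 1) / 2;
		if (counts[i] <= max_x)
			hist[counts[i]]++;
	}
	return s;
}
```

For counts {0, 1, 2, 3, 1}, this yields max 3, min 0 and total_access 0 + 1 + 3 + 6 + 1 = 11.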



> Jason
> 



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-14 14:23                 ` Mark Zhang
@ 2020-02-15  6:27                   ` Mark Zhang
  2020-02-18 14:16                     ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-15  6:27 UTC (permalink / raw)
  To: Jason Gunthorpe, Tom Talpey
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/14/2020 10:23 PM, Mark Zhang wrote:
> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>
>>>> If both src & dst ports are in the high value range you lose those
>>>> hash bits in the masking.
>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>> get the same hash if both ports are equal 0xF000.
>>>
>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>>> will always be collisions, this is just one example. My concern is the
>>> statistical spread of the results. I argue it's not changed by the
>>> proposed bit-folding, possibly even damaged.
>>
>> I've always thought that 'folding' by modulo results in an abnormal
>> statistical distribution
>>
>> The point here is not collisions but to have a hash distribution which
>> is generally uniform for the input space.
>>
>> Alex, it would be good to make a quick program to measure the
>> uniformity of the distribution..
>>
> 
> Hi,
> 
> [...]
> 

Another finding is that when cma_dport is a multiple of 0x200 (e.g., 0x600,
0x800, ... 0xFE00), the hash distribution is tens of times worse than for
other ports. For example, when dport is 18431 and 18432:

cm source port range [32768, 65534], dest port 18431:
Hash with folding:
     flow_label: max 2 min 0 total_access 32766
     udp_sport:  max 8 min 0 total_access 50410
Hash without folding:
     flow_label: max 1 min 0 total_access 32766
     udp_sport:  max 4 min 0 total_access 48126

cm source port range [32768, 65534], dest port 18432(0x4800):
Hash with folding:
     flow_label: max 133 min 0 total_access 1072938
     udp_sport:  max 203 min 0 total_access 2126644
Hash without folding:
     flow_label: max 64 min 0   total_access 1048450
     udp_sport:  max 1024 min 0 total_access 16775170
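One plausible explanation for the 0x4800 result, at least for the hash without folding: 0x4800 = 9 * 2^11, so the product DstPort * SrcPort always has its low 11 bits clear. Masking to 20 bits therefore leaves only 9 varying bits, i.e. at most 512 distinct flow labels for roughly 32K source ports (about 64 per label, consistent with the flow_label max of 64 above). Any multiple of 0x200 clears at least 9 low bits, capping the label space at 2^11 = 2048. The hypothetical helper below counts distinct masked labels to check this reasoning; it does not model the folded variant.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define FLOW_LABEL_MASK 0xFFFFFu /* 20-bit IPv6 flow label */

/*
 * Count the distinct values of (dport * sport) & FLOW_LABEL_MASK for
 * sport in [sport_lo, sport_hi], i.e. the hash without folding.
 * Uses a static 1 MiB table, so it is not reentrant.
 */
uint32_t distinct_masked_labels(uint16_t dport, uint32_t sport_lo,
				uint32_t sport_hi)
{
	static uint8_t seen[FLOW_LABEL_MASK + 1];
	uint32_t distinct = 0;

	assert(sport_lo <= sport_hi && sport_hi <= 0xFFFF);
	memset(seen, 0, sizeof(seen));

	for (uint32_t sport = sport_lo; sport <= sport_hi; sport++) {
		/* Both operands are below 2^16, so the product fits in 32 bits. */
		uint32_t label = ((uint32_t)dport * sport) & FLOW_LABEL_MASK;

		if (!seen[label]) {
			seen[label] = 1;
			distinct++;
		}
	}
	return distinct;
}
```

Over source ports 32768-65534 this returns 32767 for dport 18431 (an odd multiplier is a bijection modulo 2^20) and 512 for dport 0x4800.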

> 
>> Jason
>>
> 



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-15  6:27                   ` Mark Zhang
@ 2020-02-18 14:16                     ` Tom Talpey
  2020-02-18 17:41                       ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-18 14:16 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

On 2/15/2020 1:27 AM, Mark Zhang wrote:
> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>
>>>>> If both src & dst ports are in the high value range you lose those
>>>>> hash bits in the masking.
>>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>>> get the same hash if both ports are equal 0xF000.
>>>>
>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>>>> will always be collisions, this is just one example. My concern is the
>>>> statistical spread of the results. I argue it's not changed by the
>>>> proposed bit-folding, possibly even damaged.
>>>
>>> I've always thought that 'folding' by modulo results in an abnormal
>>> statistical distribution
>>>
>>> The point here is not collisions but to have a hash distribution which
>>> is generally uniform for the input space.
>>>
>>> Alex, it would be good to make a quick program to measure the
>>> uniformity of the distribution..
>>>
>>
>> Hi,
>>
>> [...]
> 
> Another finding is that when cma_dport is a multiple of 0x200 (e.g., 0x600,
> 0x800, ... 0xFE00), the hash distribution is tens of times worse than for
> other ports. For example, when dport is 18431 and 18432:
> 
> cm source port range [32768, 65534], dest port 18431:
> Hash with folding:
>      flow_label: max 2 min 0 total_access 32766
>      udp_sport:  max 8 min 0 total_access 50410
> Hash without folding:
>      flow_label: max 1 min 0 total_access 32766
>      udp_sport:  max 4 min 0 total_access 48126
> 
> cm source port range [32768, 65534], dest port 18432(0x4800):
> Hash with folding:
>      flow_label: max 133 min 0 total_access 1072938
>      udp_sport:  max 203 min 0 total_access 2126644
> Hash without folding:
>      flow_label: max 64 min 0   total_access 1048450
>      udp_sport:  max 1024 min 0 total_access 16775170

Good data! It certainly indicates an issue with the simple
binary modulus for truncating 32->20 bits. But the extremely
narrow testing range limits the conclusions considerably:

 >> I tested only with cma_dport from 18515 (ib_write_bw default) to
 >> 18524. I can do more tests if required, for example use multiple
 >> cma_dport in one statistic.

This hash is intended to provide entropy across the entire port
range, and we should evaluate it as such. At a minimum, the source
port can vary much more widely: per Alex's original message, it is
0xC000 - 0xFFFF.

> UDP source port selection must adhere to IANA port allocation ranges. Thus we will
> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
> hex: 0xC000-0xFFFF.

I'm not certain what the range of the destination port might be, but
as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.

Any chance you could scale up your test, to measure the original
proposed hash across these broader ranges?

>   u32 hash = DstPort * SrcPort;
>   hash ^= (hash >> 16);
>   hash ^= (hash >> 8);
>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
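A harness for the broader sweep could look like the following minimal C sketch. It implements the quoted hash literally, treating AND as a bitwise & and assuming IB_GRH_FLOWLABEL_MASK is the 20-bit mask 0xFFFFF, with a flag to skip the folding steps for comparison. rfc_flow_label() and sweep_flow_labels() are hypothetical names, and total_access follows the n*(n+1)/2 definition used earlier in the thread.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FLOW_LABEL_MASK  0xFFFFFu /* assumed value of IB_GRH_FLOWLABEL_MASK */
#define FLOW_LABEL_SLOTS (FLOW_LABEL_MASK + 1)

/* The proposed CM hash; fold == 0 skips the XOR folding steps. */
static uint32_t rfc_flow_label(uint32_t dport, uint32_t sport, int fold)
{
	uint32_t hash = dport * sport; /* both ports < 2^16: no overflow */

	if (fold) {
		hash ^= hash >> 16;
		hash ^= hash >> 8;
	}
	return hash & FLOW_LABEL_MASK;
}

/*
 * Hash every (dport, sport) pair in the inclusive ranges into 2^20
 * buckets. Returns the fullest bucket's occupancy and stores
 * total_access (sum of n*(n+1)/2 over buckets) in *total_access.
 */
uint32_t sweep_flow_labels(uint32_t dport_lo, uint32_t dport_hi,
			   uint32_t sport_lo, uint32_t sport_hi, int fold,
			   uint64_t *total_access)
{
	uint32_t *counts;
	uint32_t max = 0;

	assert(dport_lo <= dport_hi && dport_hi <= 0xFFFF);
	assert(sport_lo <= sport_hi && sport_hi <= 0xFFFF);

	counts = calloc(FLOW_LABEL_SLOTS, sizeof(*counts));
	if (!counts)
		abort();

	for (uint32_t d = dport_lo; d <= dport_hi; d++)
		for (uint32_t s = sport_lo; s <= sport_hi; s++)
			counts[rfc_flow_label(d, s, fold)]++;

	*total_access = 0;
	for (uint32_t i = 0; i < FLOW_LABEL_SLOTS; i++) {
		uint64_t n = counts[i];

		*total_access += n * (n + 1) / 2;
		if (counts[i] > max)
			max = counts[i];
	}
	free(counts);
	return max;
}
```

Passing dport 0x1-0xBFFF and sport 0xC000-0xFFFF covers the ranges suggested above. That is about 8 x 10^8 hash evaluations (49151 * 16384), so a full run takes far longer than the single-dport tests in this thread. As a smaller check, the unfolded hash with dport 0x4800 over source ports 32768-65534 gives a fullest bucket of 64 with this harness.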

Tom.







* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-18 14:16                     ` Tom Talpey
@ 2020-02-18 17:41                       ` Tom Talpey
  2020-02-19  1:51                         ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-18 17:41 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex @ Mellanox, Maor Gottlieb, Leon Romanovsky

[-- Attachment #1: Type: text/plain, Size: 12014 bytes --]

On 2/18/2020 9:16 AM, Tom Talpey wrote:
> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>>
>>>>>> If both src & dst ports are in the high value range you lose those
>>>>>> hash bits in the masking.
>>>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>>>> get the same hash if both ports are equal 0xF000.
>>>>>
>>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>>>>> will always be collisions, this is just one example. My concern is the
>>>>> statistical spread of the results. I argue it's not changed by the
>>>>> proposed bit-folding, possibly even damaged.
>>>>
>>>> I've always thought that 'folding' by modulo results in an abnormal
>>>> statistical distribution
>>>>
>>>> The point here is not collisions but to have a hash distribution which
>>>> is generally uniform for the input space.
>>>>
>>>> Alex, it would be good to make a quick program to measure the
>>>> uniformity of the distribution..
>>>>
>>>
>>> Hi,
>>>
>>> [...]
>>
>> Another finding is that when cma_dport is a multiple of 0x200 (e.g., 0x600,
>> 0x800, ... 0xFE00), the hash distribution is tens of times worse than for
>> other ports. For example, when dport is 18431 and 18432:
>>
>> cm source port range [32768, 65534], dest port 18431:
>> Hash with folding:
>>      flow_label: max 2 min 0 total_access 32766
>>      udp_sport:  max 8 min 0 total_access 50410
>> Hash without folding:
>>      flow_label: max 1 min 0 total_access 32766
>>      udp_sport:  max 4 min 0 total_access 48126
>>
>> cm source port range [32768, 65534], dest port 18432(0x4800):
>> Hash with folding:
>>      flow_label: max 133 min 0 total_access 1072938
>>
>>      udp_sport:  max 203 min 0 total_access 2126644
>>
>> Hash without folding:
>>      flow_label: max 64 min 0   total_access 1048450
>>
>>      udp_sport:  max 1024 min 0 total_access 16775170
> 
> Good data! It certainly indicates an issue with the simple
> binary modulus for truncating 32->20 bits. But the extremely
> narrow testing range limits the conclusions considerably:
> 
>  >> I tested only with cma_dport from 18515 (ib_write_bw default) to
>  >> 18524. I can do more tests if required, for example use multiple
>  >> cma_dport in one statistic.
> 
> This hash is intended to provide entropy across the entire port
> range and we should evaluate it as such. At a minimum, the source
> port can vary much more widely, from Alex's original message it's
> 0xC000 - 0xFFFF.
> 
>> UDP source port selection must adhere to IANA port allocation ranges. 
>> Thus we will
>> be using IANA recommendation for Ephemeral port range of: 49152-65535, 
>> or in
>> hex: 0xC000-0xFFFF.
> 
> I'm not certain what the range of the destination port might be, but
> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
> 
> Any chance you could scale up your test, to measure the original
> proposed hash across these broader ranges?
> 
>>   u32 hash = DstPort * SrcPort;
>>   hash ^= (hash >> 16);
>>   hash ^= (hash >> 8);
>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
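The proposal's pseudocode can be written as plain C for experimentation. This is a sketch under stated assumptions: IB_GRH_FLOWLABEL_MASK is taken to be the 20-bit mask 0xFFFFF, the helper names are illustrative, and flow_label_to_udp_sport() is a hypothetical mapping (fold the top 6 label bits into the low 14, then OR in 0xC000 so the port stays inside 0xC000-0xFFFF), not a derivation specified in this thread.

```c
#include <assert.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0xFFFFFu   /* assumed value of IB_GRH_FLOWLABEL_MASK */
#define UDP_SPORT_MIN  0xC000u    /* start of the IANA ephemeral range */

/* Proposed RDMA IP CM flow label: 32-bit unsigned product of the two
 * ports, xor-folded ("folding") and truncated to 20 bits. */
static uint32_t proposed_flow_label(uint16_t dst_port, uint16_t src_port)
{
    uint32_t hash = (uint32_t)dst_port * src_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & FLOWLABEL_MASK;
}

/* Hypothetical flow label -> UDP source port mapping: xor the upper
 * 6 label bits into the lower 14 and place the result in 0xC000-0xFFFF. */
static uint16_t flow_label_to_udp_sport(uint32_t flow_label)
{
    uint32_t low = flow_label & 0x3FFFu;
    uint32_t high = (flow_label & 0xFC000u) >> 14;
    uint32_t sport = (low ^ high) | UDP_SPORT_MIN;

    assert(sport >= UDP_SPORT_MIN && sport <= 0xFFFFu);
    return (uint16_t)sport;
}
```

With these definitions, proposed_flow_label(0xE000, 0xE000) is 0x4C4C4 and proposed_flow_label(0xF000, 0xF000) is 0x1E1E1, whereas the unfolded product masked to 20 bits is 0 for both pairs, the case raised in the thread. Because multiplication commutes, swapping the two ports never changes the label.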

I did an even quicker-and-dirtier test, with the attached. Both
the folding and non-folding methods display, to me, pretty much
the same behavior. And there's a fairly significant periodicity
with a doubling of the hash collision rate, every 8 or so buckets.

The "folding" version has higher spikes at these points than the
non-folding, in fact. As you mentioned, there are a few more "zero"
hashes, but that's expected, and not that different for both.

Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
are 800M possible permutations, and of course 1M hash buckets. So,
an 800:1 collision rate is expected. But the numbers range from
the mid-300s to several thousand. That variance seems high to me.
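The expected load can be checked directly: it is the number of (Service ID port, source port) pairs divided by the 2^20 flow-label buckets. A minimal sketch, with the helper name expected_hits_per_bucket() invented for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Average number of (dst, src) pairs per bucket when every pair from the
 * two inclusive port ranges is hashed into 2^bits buckets. */
static double expected_hits_per_bucket(uint32_t dst_lo, uint32_t dst_hi,
                                       uint32_t src_lo, uint32_t src_hi,
                                       unsigned int bits)
{
    assert(dst_lo <= dst_hi && src_lo <= src_hi && bits < 32);

    uint64_t pairs = (uint64_t)(dst_hi - dst_lo + 1) * (src_hi - src_lo + 1);
    return (double)pairs / (double)(1u << bits);
}
```

For the ranges 0x1-0xBFFF and 0xC000-0xFFFF, expected_hits_per_bucket(0x1, 0xBFFF, 0xC000, 0xFFFF, 20) evaluates to 767.984375 (805,289,984 pairs over 1,048,576 buckets), in line with the rough 800:1 estimate.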

I really think there needs to be a flatter spectrum, here. These
collisions can cause significant congestion effects at scale. I
suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
to take me a little time to find that.
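One way to sketch the suggested CRC-20 is a plain bitwise CRC (MSB first, zero initial value, no final XOR) over the 32-bit key src << 16 | dst. No particular CRC-20 polynomial is named in the thread, so CRC20_POLY below is an arbitrary placeholder chosen only so the example runs; it is not a standardized CRC-20, and its distribution over the port ranges above has not been measured.

```c
#include <assert.h>
#include <stdint.h>

#define CRC20_WIDTH 20
#define CRC20_MASK  ((1u << CRC20_WIDTH) - 1)
/* Hypothetical placeholder generator polynomial (x^20 term implicit). */
#define CRC20_POLY  0x0C8E5u

static_assert(CRC20_POLY & 1u, "polynomial must include the x^0 term");

/* Bitwise MSB-first CRC-20 of a 32-bit key: init 0, no final XOR. */
static uint32_t crc20_u32(uint32_t key)
{
    uint32_t crc = 0;

    for (int bit = 31; bit >= 0; bit--) {
        uint32_t in = (key >> bit) & 1u;
        uint32_t feedback = ((crc >> (CRC20_WIDTH - 1)) & 1u) ^ in;

        crc = (crc << 1) & CRC20_MASK;
        if (feedback)
            crc ^= CRC20_POLY;
    }
    return crc;
}

/* Flow label from the key layout suggested in the thread: src << 16 | dst. */
static uint32_t crc20_flow_label(uint16_t src, uint16_t dst)
{
    return crc20_u32(((uint32_t)src << 16) | dst);
}
```

A CRC of this form is linear over GF(2), and with an odd polynomial any two keys that differ only in their low 20 bits map to different labels. Whether that translates into a flatter spectrum over the real port ranges would still need the same kind of measurement as the tables below.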


> Folding hash
> bucket  hits
> 0       3840
> 1       407
> 2       798
> 3       426
> 4       1137
> 5       409
> 6       711
> 7       372
> 8       1595
> 9       349
> 10      751
> 11      385
> 12      1164
> 13      375
> 14      747
> 15      406
> 16      1952
> 17      382
> 18      766
> 19      390
> 20      1139
> 21      372
> 22      792
> 23      419
> 24      1543
> 25      393
> 26      777
> 27      403
> 28      1123
> 29      356
> 30      773
> 31      363
> 32      2340
> 33      397
> 34      785
> 35      393
> 36      1154
> 37      415
> 38      744

Versus...

> Non-folding hash
> bucket  hits
> 0       4469
> 1       480
> 2       684
> 3       567
> 4       990
> 5       465
> 6       697
> 7       650
> 8       1279
> 9       453
> 10      671
> 11      556
> 12      989
> 13      499
> 14      653
> 15      812
> 16      1603
> 17      478
> 18      694
> 19      559
> 20      1015
> 21      506
> 22      675
> 23      659
> 24      1317
> 25      476
> 26      644
> 27      555
> 28      953
> 29      475
> 30      738
> 31      927
> 32      2047
> 33      456
> 34      726
> 35      537
> 36      952
> 37      472
> 38      665
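To reduce histograms like these to a few numbers, the max, min and total_access metrics from Mark's earlier test (total_access being the sum over buckets of n*(n+1)/2, lower is better) can be computed from the per-bucket counts. A minimal sketch; the struct and function names are invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct bucket_stats {
    uint32_t max;           /* largest bucket count (lower is better) */
    uint32_t min;           /* smallest bucket count */
    uint64_t total_access;  /* sum of n*(n+1)/2 over buckets (lower is better) */
};

/* Summarize a histogram of per-bucket hit counts, such as the one
 * hashtest.c accumulates in data[]. */
static struct bucket_stats summarize_buckets(const uint32_t *hits, size_t nbuckets)
{
    struct bucket_stats s = { 0, UINT32_MAX, 0 };

    assert(nbuckets > 0);
    for (size_t i = 0; i < nbuckets; i++) {
        uint64_t n = hits[i];

        if (hits[i] > s.max)
            s.max = hits[i];
        if (hits[i] < s.min)
            s.min = hits[i];
        s.total_access += n * (n + 1) / 2;
    }
    return s;
}
```

For a fixed number of hashed pairs, total_access is smallest when the counts are as even as possible, so it penalizes exactly the kind of spikes seen at buckets 0, 8, 16 and 32 above.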

Tom.

[-- Attachment #2: hashtest.c --]
[-- Type: text/plain, Size: 755 bytes --]

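#include <stdint.h>
#include <stdio.h>

int data[1024 * 1024];          /* one counter per 20-bit flow label */

int main(int argc, char **argv)
{
        /* Any command-line argument enables the xor-shift "folding" step. */
        int fold = argc > 1;
        uint32_t src, dst, hash;

        (void)argv;
        printf("%s hash\nbucket\thits\n", fold ? "Folding" : "Non-folding");
        for (src = 1; src < 0xBFFF; src++)
                for (dst = 0xC000; dst <= 0xFFFE; dst++) {
                        /* Unsigned 32-bit product, like the proposal's u32
                         * hash; two promoted unsigned shorts would multiply
                         * as signed int and could overflow. */
                        hash = src * dst;
                        if (fold) {
                                hash ^= hash >> 16;
                                hash ^= hash >> 8;
                        }
                        hash &= 0xFFFFF;        /* keep the low 20 bits */
                        data[hash]++;
                }
        int i;
        for (i = 0; i < 1024 * 1024; i++)
                printf("%d\t%d\n", i, data[i]);
        return 0;
}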


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-18 17:41                       ` Tom Talpey
@ 2020-02-19  1:51                         ` Mark Zhang
  2020-02-19  2:01                           ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-19  1:51 UTC
  To: Tom Talpey, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 1:41 AM, Tom Talpey wrote:
> On 2/18/2020 9:16 AM, Tom Talpey wrote:
>> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>>> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>>>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>>>
>>>>>>> If both src & dst ports are in the high value range you lose those
>>>>>>> hash bits in the masking.
>>>>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>>>>> get the same hash if both ports are equal 0xF000.
>>>>>>
>>>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. 
>>>>>> There
>>>>>> will always be collisions, this is just one example. My concern is 
>>>>>> the
>>>>>> statistical spread of the results. I argue it's not changed by the
>>>>>> proposed bit-folding, possibly even damaged.
>>>>>
>>>>> I've always thought that 'folding' by modulo results in an abnormal
>>>>> statistical distribution
>>>>>
>>>>> The point here is not collisions but to have a hash distribution which
>>>>> is generally uniform for the input space.
>>>>>
>>>>> Alex, it would be good to make a quick program to measure the
>>>>> uniformity of the distribution..
>>>>>
>>>>
>>>> Hi,
>>>>
>>>> I did some tests with a quick program (hope it's not buggy...), 
>>>> seems the hash without "folding" has a better distribution than hash 
>>>> with fold. The "hash quality" is reflected by the "total_access"[1] 
>>>> below.
>>>>
>>>> I tested only with cma_dport from 18515 (ib_write_bw default) to 
>>>> 18524. I can do more tests if required, for example use multiple 
>>>> cma_dport in one statistic.
>>>>
>>>>
>>>> [1] 
>>>> https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays 
>>>>
>>>>
>>>> $ ./a
>>>>
>>>> max: Say for slot x there are tb[x] items, then 'max = max(tb[x])'; 
>>>> Lower is better;
>>>> min: Say for slot x there are tb[x] items, then 'min = min(tb[x])'; 
>>>> Likely min is always 0
>>>> total_access: The sum of all 'accesses' (for each slot: 
>>>> accesses=n*(n+1)/2); Lower is better
>>>> n[X]: How many slots have X items
>>>>
>>>> cm source port range [32768, 65534], dest port 18515:
>>>> Hash with folding:
>>>>      flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] = 
>>>> 126
>>>>      udp_sport: max 10 min 0 total_access 51740  n[1] = 4420  n[2] = 
>>>> 4670  n[3] = 3112  n[4] = 1433  n[5] = 535   n[6] = 163   n[7] = 31 
>>>> n[8] = 5     n[9] = 2     n[10] = 1
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 4 min 0 total_access 48618   n[1] = 532   n[2] = 
>>>> 7926  n[3] = 530   n[4] = 3698
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18516:
>>>> Hash with folding:
>>>>      flow_label: max 3 min 0 total_access 32774  n[1] = 31214 n[2] = 
>>>> 770    n[3] = 4
>>>>      udp_sport: max 8 min 0 total_access 50808   n[1] = 4406  n[2] = 
>>>> 4873  n[3] = 3157  n[4] = 1413  n[5] = 509   n[6] = 129   n[7] = 20 
>>>> n[8] = 4
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 
>>>> 16382
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18517:
>>>> Hash with folding:
>>>>      flow_label: max 2 min 0 total_access 32766  n[1] = 32250 n[2] = 
>>>> 258
>>>>      udp_sport: max 10 min 0 total_access 54916  n[1] = 4536  n[2] = 
>>>> 4170  n[3] = 2817  n[4] = 1445  n[5] = 622   n[6] = 275   n[7] = 94 
>>>> n[8] = 22    n[9] = 5     n[10] = 2
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 3 min 1 total_access 38402   n[1] = 2820  n[2] = 
>>>> 10746 n[3] = 2818
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18518:
>>>> Hash with folding:
>>>>      flow_label: max 2 min 0 total_access 32766  n[1] = 32066 n[2] = 
>>>> 350
>>>>      udp_sport: max 8 min 0 total_access 50018   n[1] = 4435  n[2] = 
>>>> 4970  n[3] = 3294  n[4] = 1376  n[5] = 465   n[6] = 92    n[7] = 16 
>>>> n[8] = 2
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 
>>>> 16382
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18519:
>>>> Hash with folding:
>>>>      flow_label: max 3 min 0 total_access 32774  n[1] = 31816 n[2] = 
>>>> 469    n[3] = 4
>>>>      udp_sport: max 8 min 0 total_access 51462   n[1] = 4414  n[2] = 
>>>> 4734  n[3] = 3088  n[4] = 1466  n[5] = 508   n[6] = 160   n[7] = 32 
>>>> n[8] = 4
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 4 min 0 total_access 45490   n[1] = 3662  n[2] = 
>>>> 6360  n[3] = 3660  n[4] = 1351
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18520:
>>>> Hash with folding:
>>>>      flow_label: max 6 min 0 total_access 34618  n[1] = 20349 n[2] = 
>>>> 5027  n[3] = 550   n[4] = 164   n[5] = 9     n[6] = 2
>>>>      udp_sport: max 13 min 0 total_access 82542  n[1] = 549   n[2] = 
>>>> 1167  n[3] = 1635  n[4] = 1706  n[5] = 1341  n[6] = 836   n[7] = 483 
>>>> n[8] = 223   n[9] = 87    n[10] = 27
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 4 min 0 total_access 65530 n[3] = 2     n[4] = 8190
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18521:
>>>> Hash with folding:
>>>>      flow_label: max 2 min 0 total_access 32766  n[1] = 31924 n[2] = 
>>>> 421
>>>>      udp_sport: max 9 min 0 total_access 51864   n[1] = 4505  n[2] = 
>>>> 4645  n[3] = 3038  n[4] = 1464  n[5] = 542   n[6] = 154   n[7] = 43 
>>>> n[8] = 6     n[9] = 2
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 3 min 1 total_access 32810   n[1] = 24    n[2] = 
>>>> 16338 n[3] = 22
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18522:
>>>> Hash with folding:
>>>>      flow_label: max 3 min 0 total_access 32768  n[1] = 32197 n[2] = 
>>>> 283    n[3] = 1
>>>>      udp_sport: max 9 min 0 total_access 50850   n[1] = 4561  n[2] = 
>>>> 4756  n[3] = 3187  n[4] = 1452  n[5] = 453   n[6] = 137   n[7] = 29 
>>>> n[8] = 2     n[9] = 2
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 
>>>> 16382
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18523:
>>>> Hash with folding:
>>>>      flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] = 
>>>> 126
>>>>      udp_sport: max 8 min 0 total_access 52208   n[1] = 4426  n[2] = 
>>>> 4609  n[3] = 3069  n[4] = 1435  n[5] = 533   n[6] = 180   n[7] = 50 
>>>> n[8] = 10
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 4 min 0 total_access 46062   n[1] = 3096  n[2] = 
>>>> 6640  n[3] = 3094  n[4] = 1777
>>>>
>>>>
>>>> cm source port range [32768, 65534], dest port 18524:
>>>> Hash with folding:
>>>>      flow_label: max 3 min 0 total_access 32774  n[1] = 31362 n[2] = 
>>>> 696    n[3] = 4
>>>>      udp_sport: max 8 min 0 total_access 49490   n[1] = 4440  n[2] = 
>>>> 5148  n[3] = 3240  n[4] = 1413  n[5] = 394   n[6] = 97    n[7] = 14 
>>>> n[8] = 1
>>>> Hash without folding:
>>>>      flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>      udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] = 
>>>> 16382
>>>>
>>>>
>>>
>>> Another finding is, when cma_dport is a multiple of 0x200 (i.e., 0x600,
>>> 0x800, ... 0xFE00), the hash distribution is tens of times worse than
>>> others. For example, when dport is 18431 and 18432:
>>>
>>> cm source port range [32768, 65534], dest port 18431:
>>> Hash with folding:
>>>      flow_label: max 2 min 0 total_access 32766
>>>      udp_sport:  max 8 min 0 total_access 50410
>>> Hash without folding:
>>>      flow_label: max 1 min 0 total_access 32766
>>>      udp_sport:  max 4 min 0 total_access 48126
>>>
>>> cm source port range [32768, 65534], dest port 18432(0x4800):
>>> Hash with folding:
>>>      flow_label: max 133 min 0 total_access 1072938
>>>
>>>      udp_sport:  max 203 min 0 total_access 2126644
>>>
>>> Hash without folding:
>>>      flow_label: max 64 min 0   total_access 1048450
>>>
>>>      udp_sport:  max 1024 min 0 total_access 16775170
>>
>> Good data! It certainly indicates an issue with the simple
>> binary modulus for truncating 32->20 bits. But the extremely
>> narrow testing range limits the conclusions considerably:
>>
>>  >> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>  >> 18524. I can do more tests if required, for example use multiple
>>  >> cma_dport in one statistic.
>>
>> This hash is intended to provide entropy across the entire port
>> range and we should evaluate it as such. At a minimum, the source
>> port can vary much more widely, from Alex's original message it's
>> 0xC000 - 0xFFFF.
>>
>>> UDP source port selection must adhere to IANA port allocation ranges. 
>>> Thus we will
>>> be using IANA recommendation for Ephemeral port range of: 
>>> 49152-65535, or in
>>> hex: 0xC000-0xFFFF.
>>
>> I'm not certain what the range of the destination port might be, but
>> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
>>
>> Any chance you could scale up your test, to measure the original
>> proposed hash across these broader ranges?
>>
>>>   u32 hash = DstPort * SrcPort;
>>>   hash ^= (hash >> 16);
>>>   hash ^= (hash >> 8);
>>>   AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
> 
> I did an even quicker-and-dirtier test, with the attached. Both
> the folding and non-folding methods display, to me, pretty much
> the same behavior. And there's a fairly significant periodicity
> with a doubling of the hash collision rate, every 8 or so buckets.
> 
> The "folding" version has higher spikes at these points than the
> non-folding, in fact. As you mentioned, there are a few more "zero"
> hashes, but that's expected, and not that different for both.
> 
> Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
> are 800M possible permutations, and of course 1M hash buckets. So,
> an 800:1 collision rate is expected. But the numbers range from
> the mid-300s to several thousand. That variance seems high to me.
> 
> I really think there needs to be a flatter spectrum, here. These
> collisions can cause significant congestion effects at scale. I
> suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
> to take me a little time to find that.
> 

I ran tests with cma_sport in [0xC000, 0xFFFF] and cma_dport in [1025,
0xFFFF] (each test using a single dport), and found:

1. The folding and non-folding results are similar;
2. When dport is a multiple of 0x200, the result is very bad. I also tested
    with your hashtest.c; there are many more "zero" hashes when sport or
    dport is a multiple of 0x200.
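The 0x200 effect follows from the product itself: if dport has t trailing zero bits, every dport * sport also has at least t trailing zeros, so the low 20 bits can take at most 2^(20 - t) values. 18432 = 0x4800 = 9 * 2^11, so without folding at most 2^9 = 512 labels are reachable, and the 32767 source ports in [32768, 65534] land about 64 per label, which matches the flow_label max of 64 reported for dport 18432. An odd dport such as 18431 is invertible modulo 2^20, so the same source ports all get distinct labels. A sketch that counts this directly (the helper name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LABEL_BUCKETS (1u << 20)   /* 20-bit flow label space */

/* Non-folded product hash, (dport * sport) & 0xFFFFF, for one dport and
 * every sport in [sport_lo, sport_hi]. Returns the number of distinct
 * labels and stores the largest per-label count in *max_hits. */
static uint32_t product_label_spread(uint16_t dport, uint16_t sport_lo,
                                     uint16_t sport_hi, uint32_t *max_hits)
{
    static uint32_t hits[LABEL_BUCKETS];
    uint32_t distinct = 0, max = 0;

    assert(sport_lo <= sport_hi);
    memset(hits, 0, sizeof(hits));
    for (uint32_t s = sport_lo; s <= sport_hi; s++) {
        uint32_t label = ((uint32_t)dport * s) & (LABEL_BUCKETS - 1);

        if (hits[label]++ == 0)
            distinct++;
        if (hits[label] > max)
            max = hits[label];
    }
    *max_hits = max;
    return distinct;
}
```

Calling product_label_spread(0x4800, 32768, 65534, &max) yields 512 distinct labels with max 64, while dport 18431 yields 32767 distinct labels with max 1.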

One of the original goals for the hash is symmetry, i.e.:
     f(sport, dport) = f(dport, sport)

If that's not important, I feel "sport * 31 + dport" [1] gives a better result.

[1] https://www.strchr.com/hash_functions
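The symmetry trade-off is easy to see in code. The proposed product hash is symmetric because multiplication commutes, while sport * 31 + dport is not. If symmetry were still wanted, one option (an assumption here, not something proposed in the thread) is to order the two ports before hashing. The names are hypothetical and the 20-bit mask is assumed to be the flow label width.

```c
#include <assert.h>
#include <stdint.h>

#define FLOWLABEL_MASK 0xFFFFFu  /* 20-bit flow label */

/* sport * 31 + dport cannot overflow 32 bits for 16-bit ports. */
static_assert(31ull * 0xFFFF + 0xFFFF <= UINT32_MAX, "u32 overflow");

/* Asymmetric: swapping the ports generally changes the label. */
static uint32_t mult31_label(uint16_t sport, uint16_t dport)
{
    return ((uint32_t)sport * 31u + dport) & FLOWLABEL_MASK;
}

/* Hypothetical symmetric variant: always hash (min, max), so that
 * f(a, b) == f(b, a) for every pair of ports. */
static uint32_t ordered_mult31_label(uint16_t a, uint16_t b)
{
    return a < b ? mult31_label(a, b) : mult31_label(b, a);
}
```

For example, mult31_label(0xC000, 0x4853) is 493651 but mult31_label(0x4853, 0xC000) is 623117; the ordered variant returns 623117 for both argument orders.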

> 
> Tom.



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  1:51                         ` Mark Zhang
@ 2020-02-19  2:01                           ` Tom Talpey
  2020-02-19  2:06                             ` Mark Zhang
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-19  2:01 UTC
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/18/2020 8:51 PM, Mark Zhang wrote:
> On 2/19/2020 1:41 AM, Tom Talpey wrote:
>> On 2/18/2020 9:16 AM, Tom Talpey wrote:
>>> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>>>> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>>>>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>>>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>>>>
>>>>>>>> If both src & dst ports are in the high value range you lose those
>>>>>>>> hash bits in the masking.
>>>>>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>>>>>> get the same hash if both ports are equal 0xF000.
>>>>>>>
>>>>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object.
>>>>>>> There
>>>>>>> will always be collisions, this is just one example. My concern is
>>>>>>> the
>>>>>>> statistical spread of the results. I argue it's not changed by the
>>>>>>> proposed bit-folding, possibly even damaged.
>>>>>>
>>>>>> I've always thought that 'folding' by modulo results in an abnormal
>>>>>> statistical distribution
>>>>>>
>>>>>> The point here is not collisions but to have a hash distribution which
>>>>>> is generally uniform for the input space.
>>>>>>
>>>>>> Alex, it would be good to make a quick program to measure the
>>>>>> uniformity of the distribution..
>>>>>>
>>>>>
>>>>> Hi,
>>>>>
>>>>> I did some tests with a quick program (hope it's not buggy...),
>>>>> seems the hash without "folding" has a better distribution than hash
>>>>> with fold. The "hash quality" is reflected by the "total_access"[1]
>>>>> below.
>>>>>
>>>>> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>>>> 18524. I can do more tests if required, for example use multiple
>>>>> cma_dport in one statistic.
>>>>>
>>>>>
>>>>> [1]
>>>>> https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays
>>>>>
>>>>>
>>>>> $ ./a
>>>>>
>>>>> max: Say for slot x there are tb[x] items, then 'max = max(tb[x])';
>>>>> Lower is better;
>>>>> min: Say for slot x there are tb[x] items, then 'min = min(tb[x])';
>>>>> Likely min is always 0
>>>>> total_access: The sum of all 'accesses' (for each slot:
>>>>> accesses=n*(n+1)/2); Lower is better
>>>>> n[X]: How many slots have X items
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18515:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] =
>>>>> 126
>>>>>       udp_sport: max 10 min 0 total_access 51740  n[1] = 4420  n[2] =
>>>>> 4670  n[3] = 3112  n[4] = 1433  n[5] = 535   n[6] = 163   n[7] = 31
>>>>> n[8] = 5     n[9] = 2     n[10] = 1
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 4 min 0 total_access 48618   n[1] = 532   n[2] =
>>>>> 7926  n[3] = 530   n[4] = 3698
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18516:
>>>>> Hash with folding:
>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31214 n[2] =
>>>>> 770    n[3] = 4
>>>>>       udp_sport: max 8 min 0 total_access 50808   n[1] = 4406  n[2] =
>>>>> 4873  n[3] = 3157  n[4] = 1413  n[5] = 509   n[6] = 129   n[7] = 20
>>>>> n[8] = 4
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] =
>>>>> 16382
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18517:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32250 n[2] =
>>>>> 258
>>>>>       udp_sport: max 10 min 0 total_access 54916  n[1] = 4536  n[2] =
>>>>> 4170  n[3] = 2817  n[4] = 1445  n[5] = 622   n[6] = 275   n[7] = 94
>>>>> n[8] = 22    n[9] = 5     n[10] = 2
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 3 min 1 total_access 38402   n[1] = 2820  n[2] =
>>>>> 10746 n[3] = 2818
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18518:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32066 n[2] =
>>>>> 350
>>>>>       udp_sport: max 8 min 0 total_access 50018   n[1] = 4435  n[2] =
>>>>> 4970  n[3] = 3294  n[4] = 1376  n[5] = 465   n[6] = 92    n[7] = 16
>>>>> n[8] = 2
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] =
>>>>> 16382
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18519:
>>>>> Hash with folding:
>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31816 n[2] =
>>>>> 469    n[3] = 4
>>>>>       udp_sport: max 8 min 0 total_access 51462   n[1] = 4414  n[2] =
>>>>> 4734  n[3] = 3088  n[4] = 1466  n[5] = 508   n[6] = 160   n[7] = 32
>>>>> n[8] = 4
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 4 min 0 total_access 45490   n[1] = 3662  n[2] =
>>>>> 6360  n[3] = 3660  n[4] = 1351
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18520:
>>>>> Hash with folding:
>>>>>       flow_label: max 6 min 0 total_access 34618  n[1] = 20349 n[2] =
>>>>> 5027  n[3] = 550   n[4] = 164   n[5] = 9     n[6] = 2
>>>>>       udp_sport: max 13 min 0 total_access 82542  n[1] = 549   n[2] =
>>>>> 1167  n[3] = 1635  n[4] = 1706  n[5] = 1341  n[6] = 836   n[7] = 483
>>>>> n[8] = 223   n[9] = 87    n[10] = 27
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 4 min 0 total_access 65530 n[3] = 2     n[4] = 8190
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18521:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 31924 n[2] =
>>>>> 421
>>>>>       udp_sport: max 9 min 0 total_access 51864   n[1] = 4505  n[2] =
>>>>> 4645  n[3] = 3038  n[4] = 1464  n[5] = 542   n[6] = 154   n[7] = 43
>>>>> n[8] = 6     n[9] = 2
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 3 min 1 total_access 32810   n[1] = 24    n[2] =
>>>>> 16338 n[3] = 22
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18522:
>>>>> Hash with folding:
>>>>>       flow_label: max 3 min 0 total_access 32768  n[1] = 32197 n[2] =
>>>>> 283    n[3] = 1
>>>>>       udp_sport: max 9 min 0 total_access 50850   n[1] = 4561  n[2] =
>>>>> 4756  n[3] = 3187  n[4] = 1452  n[5] = 453   n[6] = 137   n[7] = 29
>>>>> n[8] = 2     n[9] = 2
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] =
>>>>> 16382
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18523:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32514 n[2] =
>>>>> 126
>>>>>       udp_sport: max 8 min 0 total_access 52208   n[1] = 4426  n[2] =
>>>>> 4609  n[3] = 3069  n[4] = 1435  n[5] = 533   n[6] = 180   n[7] = 50
>>>>> n[8] = 10
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 4 min 0 total_access 46062   n[1] = 3096  n[2] =
>>>>> 6640  n[3] = 3094  n[4] = 1777
>>>>>
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18524:
>>>>> Hash with folding:
>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31362 n[2] =
>>>>> 696    n[3] = 4
>>>>>       udp_sport: max 8 min 0 total_access 49490   n[1] = 4440  n[2] =
>>>>> 5148  n[3] = 3240  n[4] = 1413  n[5] = 394   n[6] = 97    n[7] = 14
>>>>> n[8] = 1
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>       udp_sport: max 2 min 1 total_access 32766   n[1] = 2     n[2] =
>>>>> 16382
>>>>>
>>>>>
>>>>
>>>> Another finding is, when cma_dport is a multiple of 0x200 (i.e., 0x600,
>>>> 0x800, ... 0xFE00), the hash distribution is tens of times worse than
>>>> others. For example, when dport is 18431 and 18432:
>>>>
>>>> cm source port range [32768, 65534], dest port 18431:
>>>> Hash with folding:
>>>>       flow_label: max 2 min 0 total_access 32766
>>>>       udp_sport:  max 8 min 0 total_access 50410
>>>> Hash without folding:
>>>>       flow_label: max 1 min 0 total_access 32766
>>>>       udp_sport:  max 4 min 0 total_access 48126
>>>>
>>>> cm source port range [32768, 65534], dest port 18432(0x4800):
>>>> Hash with folding:
>>>>       flow_label: max 133 min 0 total_access 1072938
>>>>
>>>>       udp_sport:  max 203 min 0 total_access 2126644
>>>>
>>>> Hash without folding:
>>>>       flow_label: max 64 min 0   total_access 1048450
>>>>
>>>>       udp_sport:  max 1024 min 0 total_access 16775170
>>>
>>> Good data! It certainly indicates an issue with the simple
>>> binary modulus for truncating 32->20 bits. But the extremely
>>> narrow testing range limits the conclusions considerably:
>>>
>>>   >> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>>   >> 18524. I can do more tests if required, for example use multiple
>>>   >> cma_dport in one statistic.
>>>
>>> This hash is intended to provide entropy across the entire port
>>> range and we should evaluate it as such. At a minimum, the source
>>> port can vary much more widely, from Alex's original message it's
>>> 0xC000 - 0xFFFF.
>>>
>>>> UDP source port selection must adhere to IANA port allocation ranges.
>>>> Thus we will
>>>> be using IANA recommendation for Ephemeral port range of:
>>>> 49152-65535, or in
>>>> hex: 0xC000-0xFFFF.
>>>
>>> I'm not certain what the range of the destination port might be, but
>>> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
>>>
>>> Any chance you could scale up your test, to measure the original
>>> proposed hash across these broader ranges?
>>>
>>>>    u32 hash = DstPort * SrcPort;
>>>>    hash ^= (hash >> 16);
>>>>    hash ^= (hash >> 8);
>>>>    AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>
>> I did an even quicker-and-dirtier test, with the attached. Both
>> the folding and non-folding methods display, to me, pretty much
>> the same behavior. And there's a fairly significant periodicity
>> with a doubling of the hash collision rate, every 8 or so buckets.
>>
>> The "folding" version has higher spikes at these points than the
>> non-folding, in fact. As you mentioned, there are a few more "zero"
>> hashes, but that's expected, and not that different for both.
>>
>> Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
>> are 800M possible permutations, and of course 1M hash buckets. So,
>> an 800:1 collision rate is expected. But the numbers range from
>> the mid-300s to several thousand. That variance seems high to me.
>>
>> I really think there needs to be a flatter spectrum, here. These
>> collisions can cause significant congestion effects at scale. I
>> suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
>> to take me a little time to find that.
>>
> 
> I ran tests with cma_sport in [0xC000, 0xFFFF] and cma_dport in [1025,
> 0xFFFF] (each test using a single dport), and found:
> 
> 1. The folding and non-folding results are similar;
> 2. When dport is a multiple of 0x200, the result is very bad. I also tested
>      with your hashtest.c; there are many more "zero" hashes when sport or
>      dport is a multiple of 0x200.
> 
> One of the original goals for the hash is symmetry, i.e.:
>       f(sport, dport) = f(dport, sport)

I'm very curious why this is a requirement. The hash is used to map
to a packet queue, which enforces ordering as well as providing a
congestion throttle point. These queues are one-way, so using the
same value in both directions has no effect: each queue works only
one way, and the reverse flow is completely independent.

Am I missing something?

> If that's not important, I feel "sport * 31 + dport" [1] gives a better result.
> 
> [1] https://www.strchr.com/hash_functions

Well, that'd be simple!

Tom.


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  2:01                           ` Tom Talpey
@ 2020-02-19  2:06                             ` Mark Zhang
  2020-02-19 13:06                               ` Jason Gunthorpe
  0 siblings, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-19  2:06 UTC
  To: Tom Talpey, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 10:01 AM, Tom Talpey wrote:
> On 2/18/2020 8:51 PM, Mark Zhang wrote:
>> On 2/19/2020 1:41 AM, Tom Talpey wrote:
>>> On 2/18/2020 9:16 AM, Tom Talpey wrote:
>>>> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>>>>> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>>>>>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>>>>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>>>>>
>>>>>>>>> If both src & dst ports are in the high value range you lose those
>>>>>>>>> hash bits in the masking.
>>>>>>>>> If src & dst port are both 0xE000, your masked hash equals 0. 
>>>>>>>>> You'll
>>>>>>>>> get the same hash if both ports are equal 0xF000.
>>>>>>>>
>>>>>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object.
>>>>>>>> There
>>>>>>>> will always be collisions, this is just one example. My concern is
>>>>>>>> the
>>>>>>>> statistical spread of the results. I argue it's not changed by the
>>>>>>>> proposed bit-folding, possibly even damaged.
>>>>>>>
>>>>>>> I've always thought that 'folding' by modulo results in an abnormal
>>>>>>> statistical distribution
>>>>>>>
>>>>>>> The point here is not collisions but to have a hash distribution 
>>>>>>> which
>>>>>>> is generally uniform for the input space.
>>>>>>>
>>>>>>> Alex, it would be good to make a quick program to measure the
>>>>>>> uniformity of the distribution..
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I did some tests with a quick program (hope it's not buggy...),
>>>>>> seems the hash without "folding" has a better distribution than hash
>>>>>> with fold. The "hash quality" is reflected by the "total_access"[1]
>>>>>> below.
>>>>>>
>>>>>> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>>>>> 18524. I can do more tests if required, for example use multiple
>>>>>> cma_dport in one statistic.
>>>>>>
>>>>>>
>>>>>> [1]
>>>>>> https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays 
>>>>>>
>>>>>>
>>>>>>
>>>>>> $ ./a
>>>>>>
>>>>>> max: Say for slot x there are tb[x] items, then 'max = max(tb[x])';
>>>>>> Lower is better;
>>>>>> min: Say for slot x there are tb[x] items, then 'min = min(tb[x])';
>>>>>> Likely min is always 0
>>>>>> total_access: The sum of all 'accesses' (for each slot:
>>>>>> accesses=n*(n+1)/2); Lower is better
>>>>>> n[X]: How many slots have X items
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18515:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32514  n[2] = 126
>>>>>>       udp_sport: max 10 min 0 total_access 51740  n[1] = 4420  n[2] = 4670  n[3] = 3112  n[4] = 1433  n[5] = 535  n[6] = 163  n[7] = 31  n[8] = 5  n[9] = 2  n[10] = 1
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 4 min 0 total_access 48618  n[1] = 532  n[2] = 7926  n[3] = 530  n[4] = 3698
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18516:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31214  n[2] = 770  n[3] = 4
>>>>>>       udp_sport: max 8 min 0 total_access 50808  n[1] = 4406  n[2] = 4873  n[3] = 3157  n[4] = 1413  n[5] = 509  n[6] = 129  n[7] = 20  n[8] = 4
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 2 min 1 total_access 32766  n[1] = 2  n[2] = 16382
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18517:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32250  n[2] = 258
>>>>>>       udp_sport: max 10 min 0 total_access 54916  n[1] = 4536  n[2] = 4170  n[3] = 2817  n[4] = 1445  n[5] = 622  n[6] = 275  n[7] = 94  n[8] = 22  n[9] = 5  n[10] = 2
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 3 min 1 total_access 38402  n[1] = 2820  n[2] = 10746  n[3] = 2818
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18518:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32066  n[2] = 350
>>>>>>       udp_sport: max 8 min 0 total_access 50018  n[1] = 4435  n[2] = 4970  n[3] = 3294  n[4] = 1376  n[5] = 465  n[6] = 92  n[7] = 16  n[8] = 2
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 2 min 1 total_access 32766  n[1] = 2  n[2] = 16382
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18519:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31816  n[2] = 469  n[3] = 4
>>>>>>       udp_sport: max 8 min 0 total_access 51462  n[1] = 4414  n[2] = 4734  n[3] = 3088  n[4] = 1466  n[5] = 508  n[6] = 160  n[7] = 32  n[8] = 4
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 4 min 0 total_access 45490  n[1] = 3662  n[2] = 6360  n[3] = 3660  n[4] = 1351
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18520:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 6 min 0 total_access 34618  n[1] = 20349  n[2] = 5027  n[3] = 550  n[4] = 164  n[5] = 9  n[6] = 2
>>>>>>       udp_sport: max 13 min 0 total_access 82542  n[1] = 549  n[2] = 1167  n[3] = 1635  n[4] = 1706  n[5] = 1341  n[6] = 836  n[7] = 483  n[8] = 223  n[9] = 87  n[10] = 27
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 4 min 0 total_access 65530  n[3] = 2  n[4] = 8190
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18521:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 31924  n[2] = 421
>>>>>>       udp_sport: max 9 min 0 total_access 51864  n[1] = 4505  n[2] = 4645  n[3] = 3038  n[4] = 1464  n[5] = 542  n[6] = 154  n[7] = 43  n[8] = 6  n[9] = 2
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 3 min 1 total_access 32810  n[1] = 24  n[2] = 16338  n[3] = 22
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18522:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 3 min 0 total_access 32768  n[1] = 32197  n[2] = 283  n[3] = 1
>>>>>>       udp_sport: max 9 min 0 total_access 50850  n[1] = 4561  n[2] = 4756  n[3] = 3187  n[4] = 1452  n[5] = 453  n[6] = 137  n[7] = 29  n[8] = 2  n[9] = 2
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 2 min 1 total_access 32766  n[1] = 2  n[2] = 16382
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18523:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 2 min 0 total_access 32766  n[1] = 32514  n[2] = 126
>>>>>>       udp_sport: max 8 min 0 total_access 52208  n[1] = 4426  n[2] = 4609  n[3] = 3069  n[4] = 1435  n[5] = 533  n[6] = 180  n[7] = 50  n[8] = 10
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 4 min 0 total_access 46062  n[1] = 3096  n[2] = 6640  n[3] = 3094  n[4] = 1777
>>>>>>
>>>>>>
>>>>>> cm source port range [32768, 65534], dest port 18524:
>>>>>> Hash with folding:
>>>>>>       flow_label: max 3 min 0 total_access 32774  n[1] = 31362  n[2] = 696  n[3] = 4
>>>>>>       udp_sport: max 8 min 0 total_access 49490  n[1] = 4440  n[2] = 5148  n[3] = 3240  n[4] = 1413  n[5] = 394  n[6] = 97  n[7] = 14  n[8] = 1
>>>>>> Hash without folding:
>>>>>>       flow_label: max 1 min 0 total_access 32766  n[1] = 32766
>>>>>>       udp_sport: max 2 min 1 total_access 32766  n[1] = 2  n[2] = 16382
>>>>>>
>>>>>>
>>>>>
>>>>> Another finding is that when cma_dport is a multiple of 0x200 (e.g., 0x600,
>>>>> 0x800, ... 0xFE00), the hash distribution is tens of times worse than for
>>>>> other ports. For example, when dport is 18431 and 18432:
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18431:
>>>>> Hash with folding:
>>>>>       flow_label: max 2 min 0 total_access 32766
>>>>>       udp_sport:  max 8 min 0 total_access 50410
>>>>> Hash without folding:
>>>>>       flow_label: max 1 min 0 total_access 32766
>>>>>       udp_sport:  max 4 min 0 total_access 48126
>>>>>
>>>>> cm source port range [32768, 65534], dest port 18432(0x4800):
>>>>> Hash with folding:
>>>>>       flow_label: max 133 min 0 total_access 1072938
>>>>>       udp_sport:  max 203 min 0 total_access 2126644
>>>>> Hash without folding:
>>>>>       flow_label: max 64 min 0   total_access 1048450
>>>>>       udp_sport:  max 1024 min 0 total_access 16775170
>>>>
>>>> Good data! It certainly indicates an issue with the simple
>>>> binary modulus for truncating 32->20 bits. But the extremely
>>>> narrow testing range limits the conclusions considerably:
>>>>
>>>>   >> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>>>   >> 18524. I can do more tests if required, for example use multiple
>>>>   >> cma_dport in one statistic.
>>>>
>>>> This hash is intended to provide entropy across the entire port
>>>> range and we should evaluate it as such. At a minimum, the source
>>>> port can vary much more widely, from Alex's original message it's
>>>> 0xC000 - 0xFFFF.
>>>>
>>>>> UDP source port selection must adhere to IANA port allocation ranges.
>>>>> Thus we will be using the IANA recommendation for the ephemeral port
>>>>> range of 49152-65535, or in hex: 0xC000-0xFFFF.
>>>>
>>>> I'm not certain what the range of the destination port might be, but
>>>> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
>>>>
>>>> Any chance you could scale up your test, to measure the original
>>>> proposed hash across these broader ranges?
>>>>
>>>>>    u32 hash = DstPort * SrcPort;
>>>>>    hash ^= (hash >> 16);
>>>>>    hash ^= (hash >> 8);
>>>>>    AH_ATTR.GRH.flow_label = hash & IB_GRH_FLOWLABEL_MASK;
>>>
>>> I did an even quicker-and-dirtier test, with the attached. Both
>>> the folding and non-folding methods display, to me, pretty much
>>> the same behavior. And there's a fairly significant periodicity
>>> with a doubling of the hash collision rate, every 8 or so buckets.
>>>
>>> The "folding" version has higher spikes at these points than the
>>> non-folding, in fact. As you mentioned, there are a few more "zero"
>>> hashes, but that's expected, and not that different for both.
>>>
>>> Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
>>> are 800M possible combinations, and of course 1M hash buckets. So,
>>> an 800:1 collision rate is expected. But the numbers range from
>>> the mid-300s to several 1000s. That variance seems high to me.
>>>
>>> I really think there needs to be a flatter spectrum, here. These
>>> collisions can cause significant congestion effects at scale. I
>>> suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
>>> to take me a little time to find that.
>>>
>>
>> I did tests with range cma_sport [0xC000, 0xFFFF] and cma_dport [1025,
>> 0xFFFF] (but each test with one dport), and found:
>>
>> 1. The folding and non-folding results are similar;
>> 2. When dport is a multiple of 0x200 the result is very bad. I also tested
>>    with your hashtest.c; there are many more "zero" hashes when sport or
>>    dport is a multiple of 0x200.
>>
>> For the hash, one of the original goals is symmetry, i.e.:
>>       f(sport, dport) = f(dport, sport)
> 
> I'm very curious why this is a requirement. The hash is used to map
> to a packet queue, which enforces ordering as well as providing a
> congestion throttle point. These queues are one-way, and therefore
> the same value has no effect when used symmetrically - it only works
> one-way, the reverse flow is completely independent.
> 
> Am I missing something?
> 

The symmetry is important when calculating flow_label with DstQPn/SrcQPn
for non-RDMA CM Service IDs (see the first mail), so that the server
and client will have the same flow_label and udp_sport. But it looks
like it is not important in this case.

>> If that's not important, I feel "sport * 31 + dport" [1] gives a better
>> result.
>>
>> [1] https://www.strchr.com/hash_functions
> 
> Well, that'd be simple!
> 
> Tom.



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19  2:06                             ` Mark Zhang
@ 2020-02-19 13:06                               ` Jason Gunthorpe
  2020-02-19 17:41                                 ` Tom Talpey
  0 siblings, 1 reply; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-19 13:06 UTC (permalink / raw)
  To: Mark Zhang
  Cc: Tom Talpey, Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha,
	Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
 
> The symmetry is important when calculating flow_label with DstQPn/SrcQPn
> for non-RDMA CM Service IDs (see the first mail), so that the server
> and client will have the same flow_label and udp_sport. But it looks
> like it is not important in this case.

If the application needs a certain flow label it should not rely on
auto-generation, IMHO.

I expect most networks will not be reversible anyhow, even with the
same flow label?

Jason


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 13:06                               ` Jason Gunthorpe
@ 2020-02-19 17:41                                 ` Tom Talpey
  2020-02-19 17:55                                   ` Jason Gunthorpe
  2020-02-20  1:04                                   ` Mark Zhang
  0 siblings, 2 replies; 24+ messages in thread
From: Tom Talpey @ 2020-02-19 17:41 UTC (permalink / raw)
  To: Jason Gunthorpe, Mark Zhang
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>   
>> The symmetry is important when calculating flow_label with DstQPn/SrcQPn
>> for non-RDMA CM Service IDs (see the first mail), so that the server
>> and client will have the same flow_label and udp_sport. But it looks
>> like it is not important in this case.
> 
> If the application needs a certain flow label it should not rely on
> auto-generation, IMHO.
> 
> I expect most networks will not be reversible anyhow, even with the
> same flow label?

These are network flow labels, not under application control. If they
are under application control, that's a security issue.

But I agree, if the symmetric behavior is not needed, it should be
ignored and a better (more uniformly distributed) hash should be chosen.

I definitely like the simplicity and perfect flatness of the newly
proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
doesn't it? (31 * 0xffff == 0x1effe1)

Tom.


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 17:41                                 ` Tom Talpey
@ 2020-02-19 17:55                                   ` Jason Gunthorpe
  2020-02-20  1:04                                   ` Mark Zhang
  1 sibling, 0 replies; 24+ messages in thread
From: Jason Gunthorpe @ 2020-02-19 17:55 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Mark Zhang, Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha,
	Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On Wed, Feb 19, 2020 at 12:41:53PM -0500, Tom Talpey wrote:
> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
> > On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
> > > The symmetry is important when calculating flow_label with DstQPn/SrcQPn
> > > for non-RDMA CM Service IDs (see the first mail), so that the server
> > > and client will have the same flow_label and udp_sport. But it looks
> > > like it is not important in this case.
> > 
> > If the application needs a certain flow label it should not rely on
> > auto-generation, IMHO.
> > 
> > I expect most networks will not be reversible anyhow, even with the
> > same flow label?
> 
> These are network flow labels, not under application control. If they
> are under application control, that's a security issue.

Eh? ipv6 puts them under application control.

Jason


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-19 17:41                                 ` Tom Talpey
  2020-02-19 17:55                                   ` Jason Gunthorpe
@ 2020-02-20  1:04                                   ` Mark Zhang
  2020-02-21 14:47                                     ` Tom Talpey
  1 sibling, 1 reply; 24+ messages in thread
From: Mark Zhang @ 2020-02-20  1:04 UTC (permalink / raw)
  To: Tom Talpey, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/20/2020 1:41 AM, Tom Talpey wrote:
> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
>> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>>> The symmetry is important when calculating flow_label with DstQPn/SrcQPn
>>> for non-RDMA CM Service IDs (see the first mail), so that the server
>>> and client will have the same flow_label and udp_sport. But it looks
>>> like it is not important in this case.
>>
>> If the application needs a certain flow label it should not rely on
>> auto-generation, IMHO.
>>
>> I expect most networks will not be reversible anyhow, even with the
>> same flow label?
> 
> These are network flow labels, not under application control. If they
> are under application control, that's a security issue.
> 

As Jason said, the application is able to control it in IPv6. Besides,
the application is also able to control it for non-RDMA CM Service IDs in IPv4.

Hi Jason, the same flow label gives the same UDP source port. With the same
UDP source port (along with the same sIP/dIP/sPort), are networks reversible?

> But I agree, if the symmetric behavior is not needed, it should be
> ignored and a better (more uniformly distributed) hash should be chosen.
> 
> I definitely like the simplicity and perfect flatness of the newly
> proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
> doesn't it? (31 * 0xffff == 0x1effe1)

I think overflow doesn't matter? We have overflow anyway if a
multiplicative hash is used.

> Tom.



* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-20  1:04                                   ` Mark Zhang
@ 2020-02-21 14:47                                     ` Tom Talpey
  2020-02-25 13:20                                       ` Alex Rosenbaum
  0 siblings, 1 reply; 24+ messages in thread
From: Tom Talpey @ 2020-02-21 14:47 UTC (permalink / raw)
  To: Mark Zhang, Jason Gunthorpe
  Cc: Alex Rosenbaum, RDMA mailing list, Eran Ben Elisha, Yishai Hadas,
	Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

On 2/19/2020 8:04 PM, Mark Zhang wrote:
> On 2/20/2020 1:41 AM, Tom Talpey wrote:
>> On 2/19/2020 8:06 AM, Jason Gunthorpe wrote:
>>> On Wed, Feb 19, 2020 at 02:06:28AM +0000, Mark Zhang wrote:
>>>> The symmetry is important when calculating flow_label with DstQPn/SrcQPn
>>>> for non-RDMA CM Service IDs (see the first mail), so that the server
>>>> and client will have the same flow_label and udp_sport. But it looks
>>>> like it is not important in this case.
>>>
>>> If the application needs a certain flow label it should not rely on
>>> auto-generation, IMHO.
>>>
>>> I expect most networks will not be reversible anyhow, even with the
>>> same flow label?
>>
>> These are network flow labels, not under application control. If they
>> are under application control, that's a security issue.
>>
> 
> As Jason said, the application is able to control it in IPv6. Besides,
> the application is also able to control it for non-RDMA CM Service IDs in IPv4.

OK, well, I guess that's a separate issue; let's not rathole on
it here, then.

> Hi Jason, the same flow label gives the same UDP source port. With the same
> UDP source port (along with the same sIP/dIP/sPort), are networks reversible?
> 
>> But I agree, if the symmetric behavior is not needed, it should be
>> ignored and a better (more uniformly distributed) hash should be chosen.
>>
>> I definitely like the simplicity and perfect flatness of the newly
>> proposed (src * 31) + dst. But that "31" causes overflow into bit 21,
>> doesn't it? (31 * 0xffff == 0x1effe1)
> 
> I think overflow doesn't matter? We have overflow anyway if a
> multiplicative hash is used.

Hmm, it does seem to matter because dropping bits tilts the
distribution curve. Plugging ((src * 31) + dst) & 0xFFFFF into
my little test shows some odd behaviors. It starts out flat,
then the collisions start to rise around 49000, leveling out
at 65000 to a value roughly double the initial one (528 -> 1056).
It sits there until 525700, where it falls back to the start
value (528). I don't think this is optimal :-)

Tom.


* Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
  2020-02-21 14:47                                     ` Tom Talpey
@ 2020-02-25 13:20                                       ` Alex Rosenbaum
  0 siblings, 0 replies; 24+ messages in thread
From: Alex Rosenbaum @ 2020-02-25 13:20 UTC (permalink / raw)
  To: Tom Talpey
  Cc: Mark Zhang, Jason Gunthorpe, RDMA mailing list, Eran Ben Elisha,
	Yishai Hadas, Alex Rosenbaum, Maor Gottlieb, Leon Romanovsky

Mark and I were playing with your test and plotting the results.
I'm sharing the PNGs on a temporary GitHub repo here:
https://github.com/rosenbaumalex/hashtest/
[I wasn't sure of a better place to share them]

The README.md explains the port range we used, the 3 hashes used, and
a line about the results.
In general, the higher the 'noise', the worse the distribution is.
It seems like Mark's hash suggestion (src*31 + dst) works best, then
the folding one, and last the non-folding one.

I am trying to catch a few switch-related hash experts to get
additional feedback.

Alex




Thread overview: 24+ messages
2020-01-08 14:26 [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port Alex Rosenbaum
2020-01-15  9:48 ` Mark Zhang
2020-02-06 14:18 ` Tom Talpey
2020-02-06 14:35   ` Jason Gunthorpe
2020-02-06 14:39   ` Alex Rosenbaum
2020-02-06 15:19     ` Tom Talpey
2020-02-08  9:58       ` Alex Rosenbaum
2020-02-12 15:47         ` Tom Talpey
2020-02-13 11:03           ` Alex Rosenbaum
2020-02-13 15:26             ` Tom Talpey
2020-02-13 15:41               ` Jason Gunthorpe
2020-02-14 14:23                 ` Mark Zhang
2020-02-15  6:27                   ` Mark Zhang
2020-02-18 14:16                     ` Tom Talpey
2020-02-18 17:41                       ` Tom Talpey
2020-02-19  1:51                         ` Mark Zhang
2020-02-19  2:01                           ` Tom Talpey
2020-02-19  2:06                             ` Mark Zhang
2020-02-19 13:06                               ` Jason Gunthorpe
2020-02-19 17:41                                 ` Tom Talpey
2020-02-19 17:55                                   ` Jason Gunthorpe
2020-02-20  1:04                                   ` Mark Zhang
2020-02-21 14:47                                     ` Tom Talpey
2020-02-25 13:20                                       ` Alex Rosenbaum
