Linux-RDMA Archive on lore.kernel.org
From: Tom Talpey <tom@talpey.com>
To: Alex Rosenbaum <rosenbaumalex@gmail.com>
Cc: RDMA mailing list <linux-rdma@vger.kernel.org>,
	Jason Gunthorpe <jgg@ziepe.ca>,
	Eran Ben Elisha <eranbe@mellanox.com>,
	Yishai Hadas <yishaih@mellanox.com>,
	"Alex @ Mellanox" <alexr@mellanox.com>,
	Maor Gottlieb <maorg@mellanox.com>,
	Leon Romanovsky <leonro@mellanox.com>,
	Mark Zhang <markz@mellanox.com>
Subject: Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
Date: Thu, 6 Feb 2020 10:19:05 -0500
Message-ID: <b0414c43-c035-aa90-9f89-7ec6bba9e119@talpey.com>
In-Reply-To: <CAFgAxU80+feEEtaRYFYTHwTMSE6Edjq0t0siJ0W06WSyD+Cy3A@mail.gmail.com>

On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey <tom@talpey.com> wrote:
>>
>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>> A combination of the flow_label field in the IPv6 header and UDP source port
>>> field in RoCE v2.0 are used to identify a group of packets that must be
>>> delivered in order by the network, end-to-end.
>>> These fields are used to create entropy for network routers (ECMP), load
>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
>>> headers.
>>>
>>> The flow_label field carries a 20-bit hash value. CM-based connections
>>> will use a hash function based on the service type (QP type) and
>>> Service ID (SID). Where CM services are not used, the 20-bit hash will be
>>> computed from the source and destination QPN values.
>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>>>
>>> UDP source port selection must adhere to the IANA port allocation ranges. Thus
>>> we will use the IANA-recommended ephemeral port range of 49152-65535, or in
>>> hex: 0xC000-0xFFFF.
>>>
>>> The calculations below are designed to produce a symmetric hash result, so
>>> that we can support network elements that perform symmetric hash calculation.
>>>
>>> Hash Calculation for RDMA IP CM Service
>>> =======================================
>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
>>> RDMA IP CM Service ID, the flow label will be calculated from the IBTA CM
>>> REQ private data info and the Service ID.
>>>
>>> The flow label hash function is defined as follows:
>>> Extract the following fields from the CM IP REQ:
>>>     CM_REQ.ServiceID.DstPort [2 Bytes]
>>>     CM_REQ.PrivateData.SrcPort [2 Bytes]
>>>     u32 hash = DstPort * SrcPort;
>>>     hash ^= (hash >> 16);
>>>     hash ^= (hash >> 8);
>>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>
>>>     #define IB_GRH_FLOWLABEL_MASK  0x000FFFFF
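A minimal C sketch of the RDMA IP CM service hash above, assuming the two CM ports are available as plain 16-bit integers; the function name `cm_service_flow_label` is hypothetical and not part of any existing kernel or rdma-core API. Because the product `DstPort * SrcPort` is commutative, swapping the two ports yields the same label, which is the symmetry property the proposal asks for.

```c
#include <stdint.h>

/* 20-bit IPv6 flow label mask, as defined in the proposal. */
#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu

/*
 * Hypothetical helper: CM service flow label from the CM ports.
 * dst_port: CM service port (CM_REQ.ServiceID.DstPort)
 * src_port: CM source port  (CM_REQ.PrivateData.SrcPort)
 */
static uint32_t cm_service_flow_label(uint16_t dst_port, uint16_t src_port)
{
    /* A 16 x 16 bit product always fits in 32 bits. */
    uint32_t hash = (uint32_t)dst_port * (uint32_t)src_port;

    /* Fold higher-order bits downward before truncating to 20 bits. */
    hash ^= (hash >> 16);
    hash ^= (hash >> 8);
    return hash & IB_GRH_FLOWLABEL_MASK;
}
```

For example, ports 256 and 256 give the product 0x10000, which folds to the label 0x10101.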
>>
>> Sorry it took me a while to respond to this, and thanks for looking
>> into it since my comments on the previous proposal. I have a concern
>> with an aspect of this one.
>>
>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>> term
>>
>>          u32 hash = DstPort * SrcPort;
>>
>> adds no entropy beyond the value of SrcPort.
>>
> 
> we're talking about the CM service ports, taken from the
> rdma_resolve_route(mca_id, <ip:SrcPort>, <ip:DstPort>, to_msec);
> these are the CM-level port space and not the RoCE UDP L4 ports.
> We want to use both, since different client and server instances on the
> same nodes will use different CM ports and will hopefully generate
> different hash results for multiple flows between these two servers.

Aha, ok I guess I missed that, and ok.

>> In turn, the subsequent
>>
>>          hash ^= (hash >> 16);
>>          hash ^= (hash >> 8);
>>
>> are re-mashing the bits with one another, again, adding no entropy.

I still wonder about this one. It's attempting to reduce the 32-bit
product to 20 bits, but a second xor with the "middle" 16 bits seems
really strange. Mathematically, wouldn't it be better to just take
the modulus of 2^20? If not, are you expecting some behavior in the
hash values that makes the double-xor approach better (in which case
it should be called out)?
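The two reductions in question can be contrasted with a short sketch: the proposed double-xor fold versus a plain reduction modulo 2^20 (keeping only the low 20 bits). Both helper names are hypothetical. With a plain mask, products that differ only in bits 20-31 collide; the xor folds mix those high bits back into the low 20, which may be the intent, although the proposal does not say so.

```c
#include <stdint.h>

#define FLOWLABEL_MASK 0x000FFFFFu

/* Reduction as proposed: xor in (h >> 16), then (h >> 8), then mask. */
static uint32_t fold_xor(uint32_t product)
{
    uint32_t h = product;
    h ^= (h >> 16);
    h ^= (h >> 8);
    return h & FLOWLABEL_MASK;
}

/* Plain modulus 2^20: bits 20..31 of the product are simply dropped. */
static uint32_t fold_mod(uint32_t product)
{
    return product & FLOWLABEL_MASK;
}
```

For example, the products 0x00100000 and 0 both reduce to 0 under the mask, but fold to 0x01010 and 0 respectively under the xor scheme.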

Tom.

>> Can you describe how, mathematically, this is not different from simply
>> using the SrcPort field, and if so, how it contributes to the entropy
>> differentiation of the incoming streams?
>>
>> Tom.
>>
>>> The result of the above hash will be kept in the CM's route path record
>>> connection context and will be used throughout the connection's lifetime for
>>> all subsequent CM messages on both ends of the connection (including REP, REJ,
>>> DREQ, DREP, ...).
>>> Once the connection is established, the corresponding connected RC QPs on both
>>> ends of the connection will update their context with the calculated RDMA IP
>>> CM Service based flow_label and UDP src_port values, at the Connect phase on
>>> the active side and the Accept phase on the passive side.
>>>
>>> The CM will provide the calculated flow_label hash (20-bit) result in the
>>> 'uint32_t flow_label' field of 'struct ibv_global_route' in 'struct
>>> ibv_ah_attr'.
>>> The 'struct ibv_ah_attr' is passed by the CM to the provider library when
>>> modifying a connected QP's (e.g.: RC) state by calling 'ibv_modify_qp(qp,
>>> ah_attr, attr_mask |= IBV_QP_AV)' or when creating an AH for working with
>>> datagram QPs (e.g.: UD) by calling ibv_create_ah(ah_attr).
>>>
>>> Hash Calculation for non-RDMA CM Service ID
>>> ===========================================
>>> For non-CM QPs, the application can define the flow_label value in the
>>> 'struct ibv_ah_attr' when modifying connected QPs (e.g.: RC) or creating
>>> an AH for datagram QPs (e.g.: UD).
>>>
>>> If the provided flow_label value is zero, i.e. not set by the application
>>> (e.g.: legacy cases), then verbs providers should use src.QP[24bit] and
>>> dst.QP[24bit] as input arguments for the flow_label calculation.
>>> As QPNs are 3 bytes wide, their product is at most a 6-byte value.
>>> We'll define the flow_label value as:
>>>     DstQPn [3 Bytes]
>>>     SrcQPn [3 Bytes]
>>>     u64 hash = DstQPn * SrcQPn;
>>>     hash ^= (hash >> 20);
>>>     hash ^= (hash >> 40);
>>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
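A minimal sketch of the QPN-based fallback, assuming both QPNs are passed as 32-bit values whose upper byte is ignored; `qpn_flow_label` is a hypothetical name. The 64-bit intermediate matters: the product of two 24-bit QPNs needs up to 48 bits, so a 32-bit multiply could overflow.

```c
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFFu
#define QPN_MASK              0x00FFFFFFu /* QPNs are 24 bits wide */

/* Hypothetical helper: flow label from source and destination QPNs. */
static uint32_t qpn_flow_label(uint32_t dst_qpn, uint32_t src_qpn)
{
    /* 24 x 24 bit product fits in 48 bits of a 64-bit integer. */
    uint64_t hash = (uint64_t)(dst_qpn & QPN_MASK) * (src_qpn & QPN_MASK);

    /* Fold bits 20..47 into the low 20 bits. */
    hash ^= (hash >> 20);
    hash ^= (hash >> 40);
    return (uint32_t)(hash & IB_GRH_FLOWLABEL_MASK);
}
```

As with the CM variant, the multiplication is commutative, so both ends of a connection compute the same label regardless of which QPN is "source" and which is "destination".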
>>>
>>> Hash Calculation for UDP src_port
>>> =================================
>>> Providers supporting RoCEv2 will use the 'flow_label' value as input to
>>> calculate the RoCEv2 UDP src_port, which will be used in the QP context or the
>>> AH context.
>>>
>>> UDP src_port calculation from the flow label
>>> [taking into account the 14-bit UDP port range per the IANA recommendation]:
>>>     u32 fl      = AH_ATTR.GRH.flow_label;   [20 bits]
>>>     u32 fl_low  = fl & 0x03FFF;
>>>     u32 fl_high = fl & 0xFC000;
>>>     u16 udp_sport = fl_low XOR (fl_high >> 14);
>>>     RoCE.UDP.src_port = udp_sport OR IB_ROCE_UDP_ENCAP_VALID_PORT;
>>>
>>>     #define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000
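The src_port derivation can be sketched in C as follows, assuming the 20-bit flow label has already been computed; `rocev2_udp_sport` is a hypothetical name. Folding the top 6 bits into the low 14 keeps the intermediate value below 0x4000, so OR-ing in 0xC000 always lands in the 49152-65535 ephemeral range.

```c
#include <stdint.h>

#define IB_ROCE_UDP_ENCAP_VALID_PORT 0xC000u

/* Hypothetical helper: RoCEv2 UDP source port from a 20-bit flow label. */
static uint16_t rocev2_udp_sport(uint32_t flow_label)
{
    uint32_t fl      = flow_label & 0x000FFFFFu; /* 20-bit flow label */
    uint32_t fl_low  = fl & 0x03FFFu;            /* bits 0..13 */
    uint32_t fl_high = fl & 0xFC000u;            /* bits 14..19 */

    /* XOR the 6 high bits into the low 14 bits; result stays < 0x4000. */
    uint32_t sport = fl_low ^ (fl_high >> 14);
    return (uint16_t)(sport | IB_ROCE_UDP_ENCAP_VALID_PORT);
}
```

For instance, a flow label of 0 maps to port 0xC000 (49152), and 0xFFFFF maps to 0xFFC0.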
>>>
>>> This is a v2 follow-up on "[RFC] RoCE v2.0 UDP Source Port Entropy" [1]
>>>
>>> [1] https://www.spinics.net/lists/linux-rdma/msg73735.html
>>>
>>> Signed-off-by: Alex Rosenbaum <alexr@mellanox.com>
>>>
>>>
> 
> 


Thread overview: 24+ messages
2020-01-08 14:26 Alex Rosenbaum
2020-01-15  9:48 ` Mark Zhang
2020-02-06 14:18 ` Tom Talpey
2020-02-06 14:35   ` Jason Gunthorpe
2020-02-06 14:39   ` Alex Rosenbaum
2020-02-06 15:19     ` Tom Talpey [this message]
2020-02-08  9:58       ` Alex Rosenbaum
2020-02-12 15:47         ` Tom Talpey
2020-02-13 11:03           ` Alex Rosenbaum
2020-02-13 15:26             ` Tom Talpey
2020-02-13 15:41               ` Jason Gunthorpe
2020-02-14 14:23                 ` Mark Zhang
2020-02-15  6:27                   ` Mark Zhang
2020-02-18 14:16                     ` Tom Talpey
2020-02-18 17:41                       ` Tom Talpey
2020-02-19  1:51                         ` Mark Zhang
2020-02-19  2:01                           ` Tom Talpey
2020-02-19  2:06                             ` Mark Zhang
2020-02-19 13:06                               ` Jason Gunthorpe
2020-02-19 17:41                                 ` Tom Talpey
2020-02-19 17:55                                   ` Jason Gunthorpe
2020-02-20  1:04                                   ` Mark Zhang
2020-02-21 14:47                                     ` Tom Talpey
2020-02-25 13:20                                       ` Alex Rosenbaum
