Subject: Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
To: Alex Rosenbaum
Cc: RDMA mailing list, Jason Gunthorpe, Eran Ben Elisha, Yishai Hadas,
 "Alex @ Mellanox", Maor Gottlieb, Leon Romanovsky, Mark Zhang
References: <63a56c06-57bf-6e31-6ca8-043f9d3b72f3@talpey.com>
 <09478db9-28ca-65fe-1424-b0229a514bbb@talpey.com>
From: Tom Talpey
Message-ID:
 <62f4df50-b50d-29e2-a0f4-eccaf81bd8d9@talpey.com>
Date: Thu, 13 Feb 2020 10:26:09 -0500
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.4.2
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Sender: linux-rdma-owner@vger.kernel.org
Precedence: bulk
X-Mailing-List: linux-rdma@vger.kernel.org

On 2/13/2020 6:03 AM, Alex Rosenbaum wrote:
> On Wed, Feb 12, 2020 at 5:47 PM Tom Talpey wrote:
>>
>> On 2/8/2020 4:58 AM, Alex Rosenbaum wrote:
>>> On Thu, Feb 6, 2020 at 5:19 PM Tom Talpey wrote:
>>>>
>>>> On 2/6/2020 9:39 AM, Alex Rosenbaum wrote:
>>>>> On Thu, Feb 6, 2020 at 4:18 PM Tom Talpey wrote:
>>>>>>
>>>>>> On 1/8/2020 9:26 AM, Alex Rosenbaum wrote:
>>>>>>> A combination of the flow_label field in the IPv6 header and UDP source port
>>>>>>> field in RoCE v2.0 are used to identify a group of packets that must be
>>>>>>> delivered in order by the network, end-to-end.
>>>>>>> These fields are used to create entropy for network routers (ECMP), load
>>>>>>> balancers and 802.3ad link aggregation switching that are not aware of RoCE IB
>>>>>>> headers.
>>>>>>>
>>>>>>> The flow_label field is defined by a 20 bit hash value. CM based connections
>>>>>>> will use a hash function definition based on the service type (QP Type) and
>>>>>>> Service ID (SID). Where CM services are not used, the 20 bit hash will be
>>>>>>> according to the source and destination QPN values.
>>>>>>> Drivers will derive the RoCE v2.0 UDP src_port from the flow_label result.
>>>>>>>
>>>>>>> UDP source port selection must adhere to IANA port allocation ranges. Thus we will
>>>>>>> be using IANA recommendation for Ephemeral port range of: 49152-65535, or in
>>>>>>> hex: 0xC000-0xFFFF.
>>>>>>>
>>>>>>> The below calculations take into account the importance of producing a symmetric
>>>>>>> hash result so we can support symmetric hash calculation of network elements.
>>>>>>>
>>>>>>> Hash Calculation for RDMA IP CM Service
>>>>>>> =======================================
>>>>>>> For RDMA IP CM Services, based on QP1 iMAD usage and connected RC QPs using the
>>>>>>> RDMA IP CM Service ID, the flow label will be calculated according to IBTA CM
>>>>>>> REQ private data info and Service ID.
>>>>>>>
>>>>>>> Flow label hash function calculations definition will be defined as:
>>>>>>> Extract the following fields from the CM IP REQ:
>>>>>>>     CM_REQ.ServiceID.DstPort [2 Bytes]
>>>>>>>     CM_REQ.PrivateData.SrcPort [2 Bytes]
>>>>>>>     u32 hash = DstPort * SrcPort;
>>>>>>>     hash ^= (hash >> 16);
>>>>>>>     hash ^= (hash >> 8);
>>>>>>>     AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
>>>>>>>
>>>>>>>     #define IB_GRH_FLOWLABEL_MASK 0x000FFFFF
>>>>>>
>>>>>> Sorry it took me a while to respond to this, and thanks for looking
>>>>>> into it since my comments on the previous proposal. I have a concern
>>>>>> with an aspect of this one.
>>>>>>
>>>>>> The RoCEv2 destination port is a fixed value, 4791. Therefore the
>>>>>> term
>>>>>>
>>>>>>     u32 hash = DstPort * SrcPort;
>>>>>>
>>>>>> adds no entropy beyond the value of SrcPort.
>>>>>>
>>>>>
>>>>> we're talking about the CM service ports, taken from the
>>>>> rdma_resolve_route(mca_id, , , to_msec);
>>>>> these are the CM level port-space and not the RoCE UDP L4 ports.
>>>>> we want to use both as these will allow different client instance and
>>>>> server instance on same nodes will use different CM ports and hopefully
>>>>> generate different hash results for multi-flows between these two
>>>>> servers.
>>>>
>>>> Aha, ok I guess I missed that, and ok.
>>>>
>>>>>> In turn, the subsequent
>>>>>>
>>>>>>     hash ^= (hash >> 16);
>>>>>>     hash ^= (hash >> 8);
>>>>>>
>>>>>> are re-mashing the bits with one another, again, adding no entropy.
>>>>
>>>> I still wonder about this one. It's attempting to reduce the 32-bit
>>>> product to 20 bits, but a second xor with the "middle" 16 bits seems
>>>> really strange. Mathematically, wouldn't it be better to just take
>>>> the modulus of 2^20? If not, are you expecting some behavior in the
>>>> hash values that makes the double-xor approach better (in which case
>>>> it should be called out)?
>>>>
>>>> Tom.
>>>
>>> The function takes into account creating a symmetric hash, so both
>>> active and passive can reconstruct the same flow label results. That's
>>> why we multiply the two CM Port values (16 bit * 16 bit). The result
>>> is a 32 bit value, and we don't want to lose any of the MSB bits
>>> by modulus or masking. So we need some folding function from 32 bit to
>>> the 20 bit flow label.
>>>
>>> The specific bit shift is something I took from the bond driver:
>>> https://elixir.bootlin.com/linux/latest/source/drivers/net/bonding/bond_main.c#L3407
>>> This proved very good in spreading the flow label in our internal
>>> testing. Other alternatives can be suggested, as long as they consider
>>> all bits in the conversion 32->20 bits.
>>
>> I'm ok with it, but I still don't fully understand why the folding
>> is necessary. The multiplication is the important part, and it is
>> the operation that combines the two entropic inputs. The folding just
>> flips bits from what's basically the same entropy source.
>>
>> IOW, I think that
>>
>>     u32 hash = (DstPort * SrcPort) & IB_GRH_FLOWLABEL_MASK;
>>
>> would produce a completely equal benefit, mathematically.
>> Tom.
>>
>
> If both src & dst ports are in the high value range you lose those
> hash bits in the masking.
> If src & dst port are both 0xE000, your masked hash equals 0. You'll
> get the same hash if both ports are equal to 0xF000.

Sure, but this is because it's a 20-bit hash of a 32-bit object.
There will always be collisions; this is just one example. My
concern is the statistical spread of the results. I argue it's
not changed by the proposed bit-folding, possibly even damaged.

> The idea with the bit shift is to take the MSB hash bits (left from
> the 0xFFFFF mask) and fold them with the LSB in some way.

I get that, but it's only folding the "one" bits, and it's doing so
in a rather primitive way. For example, the ">> 8" term is folding
the high 4 of 20 bits twice - once in the >> 16 and again in the >> 8.

This value is only computed once, at QP creation, correct? Why not
compute a CRC-20, for example?

Tom.
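
For reference, the two reductions being compared in this thread can be
written out side by side. This is only a minimal sketch in plain C, not
the proposed driver code: the function names flow_label_masked,
flow_label_folded and check_symmetry are hypothetical, and the kernel's
u32 is replaced by uint32_t so the snippet builds in userspace.

```c
#include <assert.h>
#include <stdint.h>

#define IB_GRH_FLOWLABEL_MASK 0x000FFFFF

/* Plain masking: keep only the low 20 bits of the 32-bit product. */
static uint32_t flow_label_masked(uint16_t dst_port, uint16_t src_port)
{
    /* Cast first: uint16_t operands would promote to int and could overflow. */
    uint32_t hash = (uint32_t)dst_port * (uint32_t)src_port;

    return hash & IB_GRH_FLOWLABEL_MASK;
}

/* Folding as quoted from the RFC: XOR the upper bits down, then mask. */
static uint32_t flow_label_folded(uint16_t dst_port, uint16_t src_port)
{
    uint32_t hash = (uint32_t)dst_port * (uint32_t)src_port;

    hash ^= hash >> 16;
    hash ^= hash >> 8;
    return hash & IB_GRH_FLOWLABEL_MASK;
}

/* Both reductions are symmetric in the two ports because multiplication commutes. */
static void check_symmetry(uint16_t a, uint16_t b)
{
    assert(flow_label_masked(a, b) == flow_label_masked(b, a));
    assert(flow_label_folded(a, b) == flow_label_folded(b, a));
}
```

With both ports at 0xE000 the product is 0xC4000000, so the masked label
is 0 while the folded label is 0x4C4C4. With both at 0xF000 the product
is 0xE1000000, giving 0 and 0x1E1E1 respectively. That reproduces the
single collision cited above. It says nothing about the statistical
spread over the whole port space, which is the open question.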