From: Tom Talpey <tom@talpey.com>
To: Mark Zhang <markz@mellanox.com>, Jason Gunthorpe <jgg@ziepe.ca>
Cc: Alex Rosenbaum <rosenbaumalex@gmail.com>,
RDMA mailing list <linux-rdma@vger.kernel.org>,
Eran Ben Elisha <eranbe@mellanox.com>,
Yishai Hadas <yishaih@mellanox.com>,
"Alex @ Mellanox" <alexr@mellanox.com>,
Maor Gottlieb <maorg@mellanox.com>,
Leon Romanovsky <leonro@mellanox.com>
Subject: Re: [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port
Date: Tue, 18 Feb 2020 12:41:33 -0500 [thread overview]
Message-ID: <e758da0d-94a3-a22f-c2aa-3d13714c4ed3@talpey.com> (raw)
In-Reply-To: <55e8c9cf-cd64-27b2-1333-ac4849f5e3ff@talpey.com>
[-- Attachment #1: Type: text/plain, Size: 12014 bytes --]
On 2/18/2020 9:16 AM, Tom Talpey wrote:
> On 2/15/2020 1:27 AM, Mark Zhang wrote:
>> On 2/14/2020 10:23 PM, Mark Zhang wrote:
>>> On 2/13/2020 11:41 PM, Jason Gunthorpe wrote:
>>>> On Thu, Feb 13, 2020 at 10:26:09AM -0500, Tom Talpey wrote:
>>>>
>>>>>> If both src & dst ports are in the high value range you loss those
>>>>>> hash bits in the masking.
>>>>>> If src & dst port are both 0xE000, your masked hash equals 0. You'll
>>>>>> get the same hash if both ports are equal 0xF000.
>>>>>
>>>>> Sure, but this is because it's a 20-bit hash of a 32-bit object. There
>>>>> will always be collisions, this is just one example. My concern is the
>>>>> statistical spread of the results. I argue it's not changed by the
>>>>> proposed bit-folding, possibly even damaged.
>>>>
>>>> I've always thought that 'folding' by modulo results in an abnormal
>>>> statistical distribution
>>>>
>>>> The point here is not collisions but to have a hash distribution which
>>>> is generally uniform for the input space.
>>>>
>>>> Alex, it would be good to make a quick program to measure the
>>>> uniformity of the distribution..
>>>>
>>>
>>> Hi,
>>>
>>> I did some tests with a quick program (hope it's not buggy...), seems
>>> the hash without "folding" has a better distribution than hash with
>>> fold. The "hash quality" is reflected by the "total_access"[1] below.
>>>
>>> I tested only with cma_dport from 18515 (ib_write_bw default) to
>>> 18524. I can do more tests if required, for example use multiple
>>> cma_dport in one statistic.
>>>
>>>
>>> [1]
>>> https://stackoverflow.com/questions/24729730/measuring-a-hash-functions-quality-for-use-with-maps-assosiative-arrays
>>>
>>>
>>> $ ./a
>>>
>>> max: Say for slot x there are tb[x] items, then 'max = max(tb[x])';
>>> Lower is better;
>>> min: Say for slot x there are tb[x] items, then 'min = min(tb[x])';
>>> Likely min is always 0
>>> total_access: The sum of all 'accesses' (for each slot:
>>> accesses=n*(n+1)/2); Lower is better
>>> n[X]: How many slots that has X items
>>>
>>> cm source port range [32768, 65534], dest port 18515:
>>> Hash with folding:
>>> flow_label: max 2 min 0 total_access 32766 n[1] = 32514 n[2] = 126
>>> udp_sport: max 10 min 0 total_access 51740 n[1] = 4420 n[2] =
>>> 4670 n[3] = 3112 n[4] = 1433 n[5] = 535 n[6] = 163 n[7] = 31
>>> n[8] = 5 n[9] = 2 n[10] = 1
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 4 min 0 total_access 48618 n[1] = 532 n[2] =
>>> 7926 n[3] = 530 n[4] = 3698
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18516:
>>> Hash with folding:
>>> flow_label: max 3 min 0 total_access 32774 n[1] = 31214 n[2] =
>>> 770 n[3] = 4
>>> udp_sport: max 8 min 0 total_access 50808 n[1] = 4406 n[2] =
>>> 4873 n[3] = 3157 n[4] = 1413 n[5] = 509 n[6] = 129 n[7] = 20
>>> n[8] = 4
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 2 min 1 total_access 32766 n[1] = 2 n[2] =
>>> 16382
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18517:
>>> Hash with folding:
>>> flow_label: max 2 min 0 total_access 32766 n[1] = 32250 n[2] = 258
>>> udp_sport: max 10 min 0 total_access 54916 n[1] = 4536 n[2] =
>>> 4170 n[3] = 2817 n[4] = 1445 n[5] = 622 n[6] = 275 n[7] = 94
>>> n[8] = 22 n[9] = 5 n[10] = 2
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 3 min 1 total_access 38402 n[1] = 2820 n[2] =
>>> 10746 n[3] = 2818
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18518:
>>> Hash with folding:
>>> flow_label: max 2 min 0 total_access 32766 n[1] = 32066 n[2] = 350
>>> udp_sport: max 8 min 0 total_access 50018 n[1] = 4435 n[2] =
>>> 4970 n[3] = 3294 n[4] = 1376 n[5] = 465 n[6] = 92 n[7] = 16
>>> n[8] = 2
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 2 min 1 total_access 32766 n[1] = 2 n[2] =
>>> 16382
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18519:
>>> Hash with folding:
>>> flow_label: max 3 min 0 total_access 32774 n[1] = 31816 n[2] =
>>> 469 n[3] = 4
>>> udp_sport: max 8 min 0 total_access 51462 n[1] = 4414 n[2] =
>>> 4734 n[3] = 3088 n[4] = 1466 n[5] = 508 n[6] = 160 n[7] = 32
>>> n[8] = 4
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 4 min 0 total_access 45490 n[1] = 3662 n[2] =
>>> 6360 n[3] = 3660 n[4] = 1351
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18520:
>>> Hash with folding:
>>> flow_label: max 6 min 0 total_access 34618 n[1] = 20349 n[2] =
>>> 5027 n[3] = 550 n[4] = 164 n[5] = 9 n[6] = 2
>>> udp_sport: max 13 min 0 total_access 82542 n[1] = 549 n[2] =
>>> 1167 n[3] = 1635 n[4] = 1706 n[5] = 1341 n[6] = 836 n[7] = 483
>>> n[8] = 223 n[9] = 87 n[10] = 27
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 4 min 0 total_access 65530 n[3] = 2 n[4] = 8190
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18521:
>>> Hash with folding:
>>> flow_label: max 2 min 0 total_access 32766 n[1] = 31924 n[2] = 421
>>> udp_sport: max 9 min 0 total_access 51864 n[1] = 4505 n[2] =
>>> 4645 n[3] = 3038 n[4] = 1464 n[5] = 542 n[6] = 154 n[7] = 43
>>> n[8] = 6 n[9] = 2
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 3 min 1 total_access 32810 n[1] = 24 n[2] =
>>> 16338 n[3] = 22
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18522:
>>> Hash with folding:
>>> flow_label: max 3 min 0 total_access 32768 n[1] = 32197 n[2] =
>>> 283 n[3] = 1
>>> udp_sport: max 9 min 0 total_access 50850 n[1] = 4561 n[2] =
>>> 4756 n[3] = 3187 n[4] = 1452 n[5] = 453 n[6] = 137 n[7] = 29
>>> n[8] = 2 n[9] = 2
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 2 min 1 total_access 32766 n[1] = 2 n[2] =
>>> 16382
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18523:
>>> Hash with folding:
>>> flow_label: max 2 min 0 total_access 32766 n[1] = 32514 n[2] = 126
>>> udp_sport: max 8 min 0 total_access 52208 n[1] = 4426 n[2] =
>>> 4609 n[3] = 3069 n[4] = 1435 n[5] = 533 n[6] = 180 n[7] = 50
>>> n[8] = 10
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 4 min 0 total_access 46062 n[1] = 3096 n[2] =
>>> 6640 n[3] = 3094 n[4] = 1777
>>>
>>>
>>> cm source port range [32768, 65534], dest port 18524:
>>> Hash with folding:
>>> flow_label: max 3 min 0 total_access 32774 n[1] = 31362 n[2] =
>>> 696 n[3] = 4
>>> udp_sport: max 8 min 0 total_access 49490 n[1] = 4440 n[2] =
>>> 5148 n[3] = 3240 n[4] = 1413 n[5] = 394 n[6] = 97 n[7] = 14
>>> n[8] = 1
>>> Hash without folding:
>>> flow_label: max 1 min 0 total_access 32766 n[1] = 32766
>>> udp_sport: max 2 min 1 total_access 32766 n[1] = 2 n[2] =
>>> 16382
>>>
>>>
>>
>> Another finding is, when cma_dport is multiple of 0x200 (i.e., 0x600,
>> 0x800, ... 0xFE00), the hash distribution is tens of times worse then
>> others. For examples when dport is 18431 and 18432:
>>
>> cm source port range [32768, 65534], dest port 18431:
>> Hash with folding:
>> flow_label: max 2 min 0 total_access 32766
>> udp_sport: max 8 min 0 total_access 50410
>> Hash without folding:
>> flow_label: max 1 min 0 total_access 32766
>> udp_sport: max 4 min 0 total_access 48126
>>
>> cm source port range [32768, 65534], dest port 18432(0x4800):
>> Hash with folding:
>> flow_label: max 133 min 0 total_access 1072938
>>
>> udp_sport: max 203 min 0 total_access 2126644
>>
>> Hash without folding:
>> flow_label: max 64 min 0 total_access 1048450
>>
>> udp_sport: max 1024 min 0 total_access 16775170
>
> Good data! It certainly indicates an issue with the simple
> binary modulus for treuncating 32->20 bits. But the extremely
> narrow testing range limits the conclusions considerably:
>
> >> I tested only with cma_dport from 18515 (ib_write_bw default) to
> >> 18524. I can do more tests if required, for example use multiple
> >> cma_dport in one statistic.
>
> This hash is intended to provide entropy across the entire port
> range and we should evaluate it as such. At a minimum, the source
> port can vary much more widely, from Alex's original message it's
> 0xC000 - 0xFFFF.
>
>> UDP source port selection must adhere IANA port allocation ranges.
>> Thus we will
>> be using IANA recommendation for Ephemeral port range of: 49152-65535,
>> or in
>> hex: 0xC000-0xFFFF.
>
> I'm not certain what the range of the destination port might be, but
> as a Service ID, a good assumption is the full range of 0x1 - 0xBFFF.
>
> Any chance you could scale up your test, to measure the original
> proposed hash across these broader ranges?
>
>> u32 hash = DstPort * SrcPort;
>> hash ^= (hash >> 16);
>> hash ^= (hash >> 8);
>> AH_ATTR.GRH.flow_label = hash AND IB_GRH_FLOWLABEL_MASK;
I did an even quicker-and-dirtier test, with the attached. Both
the folding and non-folding methods display, to me, pretty much
the same behavior. And there's a fairly significant periodicity
with a doubling of the hash collision rate, every 8 or so buckets.
The "folding" version has higher spikes at these points than the
non-folding, in fact. As you mentioned, there are a few more "zero"
hashes, but that's expected, and not that different for both.
Assuming you agree with my C000-FFFF and 1-BFFF port ranges, there
are 800M possible permutations, and of course 1M hash buckets. So,
an 800:1 collision rate is expected. But the numbers range from
the mid-300's to several-1000's. That variance seems high to me.
I really think there needs to be a flatter spectrum, here. These
collisions can cause significant congestion effects at scale. I
suggested trying a CRC-20 of the 32-bit src<<16|dst, but it's going
to take me a little time to find that.
> Folding hash
> bucket hits
> 0 3840
> 1 407
> 2 798
> 3 426
> 4 1137
> 5 409
> 6 711
> 7 372
> 8 1595
> 9 349
> 10 751
> 11 385
> 12 1164
> 13 375
> 14 747
> 15 406
> 16 1952
> 17 382
> 18 766
> 19 390
> 20 1139
> 21 372
> 22 792
> 23 419
> 24 1543
> 25 393
> 26 777
> 27 403
> 28 1123
> 29 356
> 30 773
> 31 363
> 32 2340
> 33 397
> 34 785
> 35 393
> 36 1154
> 37 415
> 38 744
Versus...
> Non-folding hash
> bucket hits
> 0 4469
> 1 480
> 2 684
> 3 567
> 4 990
> 5 465
> 6 697
> 7 650
> 8 1279
> 9 453
> 10 671
> 11 556
> 12 989
> 13 499
> 14 653
> 15 812
> 16 1603
> 17 478
> 18 694
> 19 559
> 20 1015
> 21 506
> 22 675
> 23 659
> 24 1317
> 25 476
> 26 644
> 27 555
> 28 953
> 29 475
> 30 738
> 31 927
> 32 2047
> 33 456
> 34 726
> 35 537
> 36 952
> 37 472
> 38 665
Tom.
[-- Attachment #2: hashtest.c --]
[-- Type: text/plain, Size: 755 bytes --]
#include <stdio.h>
int data[1024 * 1024];
int main(int argc, char **argv)
{
unsigned short src, dst;
unsigned long hash;
printf("%s hash\nbucket\thits\n", argc > 1 ? "Non-folding" : "Folding");
for (src = 1; src < 0xBFFF; src++)
for (dst = 0xC000; dst <= 0xFFFE; dst++) {
hash = src * dst;
if (argc > 1) {
hash ^= hash >> 16;
hash ^= hash >> 8;
}
hash &= 0xFFFFF;
data[hash]++;
}
int i;
for (i = 0; i < 1024 * 1024; i++)
printf("%d\t%d\n", i, data[i]);
return 0;
}
next prev parent reply other threads:[~2020-02-18 17:41 UTC|newest]
Thread overview: 24+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-08 14:26 [RFC v2] RoCE v2.0 Entropy - IPv6 Flow Label and UDP Source Port Alex Rosenbaum
2020-01-15 9:48 ` Mark Zhang
2020-02-06 14:18 ` Tom Talpey
2020-02-06 14:35 ` Jason Gunthorpe
2020-02-06 14:39 ` Alex Rosenbaum
2020-02-06 15:19 ` Tom Talpey
2020-02-08 9:58 ` Alex Rosenbaum
2020-02-12 15:47 ` Tom Talpey
2020-02-13 11:03 ` Alex Rosenbaum
2020-02-13 15:26 ` Tom Talpey
2020-02-13 15:41 ` Jason Gunthorpe
2020-02-14 14:23 ` Mark Zhang
2020-02-15 6:27 ` Mark Zhang
2020-02-18 14:16 ` Tom Talpey
2020-02-18 17:41 ` Tom Talpey [this message]
2020-02-19 1:51 ` Mark Zhang
2020-02-19 2:01 ` Tom Talpey
2020-02-19 2:06 ` Mark Zhang
2020-02-19 13:06 ` Jason Gunthorpe
2020-02-19 17:41 ` Tom Talpey
2020-02-19 17:55 ` Jason Gunthorpe
2020-02-20 1:04 ` Mark Zhang
2020-02-21 14:47 ` Tom Talpey
2020-02-25 13:20 ` Alex Rosenbaum
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e758da0d-94a3-a22f-c2aa-3d13714c4ed3@talpey.com \
--to=tom@talpey.com \
--cc=alexr@mellanox.com \
--cc=eranbe@mellanox.com \
--cc=jgg@ziepe.ca \
--cc=leonro@mellanox.com \
--cc=linux-rdma@vger.kernel.org \
--cc=maorg@mellanox.com \
--cc=markz@mellanox.com \
--cc=rosenbaumalex@gmail.com \
--cc=yishaih@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).