References: <1452159189-11473-1-git-send-email-vkuznets@redhat.com>
 <20160110.172558.367101858392871618.davem@davemloft.net>
 <20160114175304.161ff0af@lxorguk.ukuu.org.uk>
 <1452795849.1223.112.camel@edumazet-glaptop2.roam.corp.google.com>
Date: Thu, 14 Jan 2016 10:48:54 -0800
Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on struct flow_keys layout
From: Tom Herbert
To: Haiyang Zhang
Cc: Eric Dumazet, One Thousand Gnomes, David Miller,
 "vkuznets@redhat.com", "netdev@vger.kernel.org", KY Srinivasan,
 "devel@linuxdriverproject.org", "linux-kernel@vger.kernel.org"
X-Mailing-List: linux-kernel@vger.kernel.org

On Thu, Jan 14, 2016 at 10:35 AM, Haiyang Zhang wrote:
>
>
>> -----Original Message-----
>> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
>> Sent: Thursday, January 14, 2016 1:24 PM
>> To: One Thousand Gnomes
>> Cc: Tom Herbert; Haiyang Zhang; David Miller;
>> vkuznets@redhat.com; netdev@vger.kernel.org; KY Srinivasan;
>> devel@linuxdriverproject.org; linux-kernel@vger.kernel.org
>> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
>> struct flow_keys layout
>>
>> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes wrote:
>> > > These results for Toeplitz are not plausible. Given random input you
>> > > cannot expect any hash function to produce such uniform results. I
>> > > suspect either your input data is biased or how your applying the
>> > > hash is.
>> > >
>> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I
>> > > get something more reasonable:
>> >
>> > IPv4 address patterns are not random. Nothing like it. A long long
>> > time ago we did do a bunch of tuning for network hashes using big
>> > porn site data sets. Random it was not.
>> >
>>
>> I ran my tests with non random IPV4 addresses, as I had 2 hosts,
>> one server, one client. (typical benchmark stuff)
>>
>> The only 'random' part was the ports, so maybe ~20 bits of entropy,
>> considering how we allocate ports during connect() to a given
>> destination to avoid port reuse.
>>
>> > It's probably hard to repeat that exercise now with geo specific
>> > routing, and all the front end caches and redirectors on big sites
>> > but I'd strongly suggest random input is not a good test, and also
>> > that you need to worry more about hash attacks than perfect
>> > distributions.
>>
>> Anyway, the exercise is not to find a hash that exactly splits 128 flows
>> into 16 buckets, according to the number of flows per bucket.
>>
>> Maybe only 4 flows are sending at 3Gbits, and others are sending at 100
>> kbits. There is no way the driver can predict the future.
>>
>> This is why we prefer to select a queue given the cpu sending the
>> packet. This permits a natural shift based on actual load, and is the
>> default on linux (see XPS in Documentation/networking/scaling.txt)
>>
>> Only this driver has a selection based on a flow 'hash'.
>
> Also, the port number selection may not be random either. For example,
> the well-known network throughput test tool, iperf, use port numbers with
> equal increment among them. We tested these non-random cases, and found
> the Toeplitz hash has distributed evenly, but Jenkins hash has non-even
> distribution.
>
> I'm aware of the test from Tom Herbert, which
> showing similar results of Toeplitz v.s. Jenkins with random inputs.
>
> In summary, the Toeplitz performs better in case of non-random inputs,
> and performs similar to Jenkins in random inputs (which may not be the
> case in real world). So we still prefer to use Toeplitz hash.
>
You are basing your conclusions on one toy benchmark. I don't believe
that a realistically loaded web server is going to consistently give
you tuples that happen to somehow fit into a nice model so that the
bias benefits your load distribution.

> To minimize the computational overhead, we may consider put the hash
> in a per-connection cache in TCP layer, so it only needs one time
> computation. But, even with the computation overhead at this moment,
> the throughput based on Toeplitz hash is better than Jenkins:
> Throughput (Gbps) comparison:
> #conn    Toeplitz    Jenkins
>  32      26.6        23.2
>  64      32.1        23.4
> 128      29.1        24.1
>
You don't need to do that. We already store a random hash value in the
connection context. If you want to make it non-random then just
replace that with a simple global counter. This will have the exact
same effect that you see in your tests without needing any expensive
computation.

> Also, to the questions from Eric Dumazet -- no,
> there is not limit of the number of connections per VMBus channel. But,
> if one channel has a lot more connections than other channels, the
> unbalanced work load slow down the overall throughput.
>
> The purpose of send-indirection-table is to shift the workload by change
> the mapping of table entry v.s. the channel. The updated table is sent
> by host to guest from time to time. But if the hash function distributes
> too many connections into one table entry, it cannot spread them into
> different channels.
>
> Thanks to everyone who joined the discussion.
>
> Thanks,
> - Haiyang
>
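
To illustrate the global-counter suggestion above, here is a minimal
sketch in Python rather than kernel code. It assumes the queue is picked
as the per-flow hash modulo the number of queues, which is a
simplification and not necessarily what hv_netvsc does. NUM_QUEUES,
NUM_FLOWS, spread(), counter_hashes() and random_hashes() are
hypothetical names made up for this example. It compares how many flows
land on each queue when the stored per-connection value is a running
counter versus a random number.

```python
import random
from collections import Counter

NUM_QUEUES = 16   # hypothetical number of transmit queues/channels
NUM_FLOWS = 128   # hypothetical number of connections


def spread(hashes, num_queues=NUM_QUEUES):
    """Return the number of flows per queue, assuming queue = hash % num_queues."""
    counts = Counter(h % num_queues for h in hashes)
    return [counts.get(q, 0) for q in range(num_queues)]


def counter_hashes(num_flows):
    """Per-flow value taken from a simple global counter (0, 1, 2, ...)."""
    return list(range(num_flows))


def random_hashes(num_flows, seed=0):
    """Per-flow value drawn at random, standing in for a stored random hash."""
    rng = random.Random(seed)
    return [rng.getrandbits(32) for _ in range(num_flows)]


counter_spread = spread(counter_hashes(NUM_FLOWS))
random_spread = spread(random_hashes(NUM_FLOWS))

print("counter:", counter_spread)
print("random: ", random_spread)
```

Under these assumptions the counter puts exactly NUM_FLOWS / NUM_QUEUES
flows on every queue, while the random values generally do not. As noted
earlier in the thread, an even flow count per queue says nothing about
how much traffic each flow actually sends.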