From: Haiyang Zhang <haiyangz@microsoft.com>
To: Tom Herbert <tom@herbertland.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	David Miller <davem@davemloft.net>,
	"vkuznets@redhat.com" <vkuznets@redhat.com>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	KY Srinivasan <kys@microsoft.com>,
	"devel@linuxdriverproject.org" <devel@linuxdriverproject.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH net-next] hv_netvsc: don't make assumptions on struct flow_keys layout
Date: Thu, 14 Jan 2016 20:23:32 +0000
Message-ID: <BN1PR0301MB0770C72CCEDD9AEBA7AEA329CACC0@BN1PR0301MB0770.namprd03.prod.outlook.com>
In-Reply-To: <CALx6S34NWdBe0ZBuSMJCC8r68LOVvbY3ZXhneo+TSU1qz=9mYw@mail.gmail.com>



> -----Original Message-----
> From: Tom Herbert [mailto:tom@herbertland.com]
> Sent: Thursday, January 14, 2016 2:41 PM
> To: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>; One Thousand Gnomes
> <gnomes@lxorguk.ukuu.org.uk>; David Miller <davem@davemloft.net>;
> vkuznets@redhat.com; netdev@vger.kernel.org; KY Srinivasan
> <kys@microsoft.com>; devel@linuxdriverproject.org;
> linux-kernel@vger.kernel.org
> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> struct flow_keys layout
> 
> On Thu, Jan 14, 2016 at 11:15 AM, Haiyang Zhang <haiyangz@microsoft.com>
> wrote:
> >
> >
> >> -----Original Message-----
> >> From: Tom Herbert [mailto:tom@herbertland.com]
> >> Sent: Thursday, January 14, 2016 1:49 PM
> >> To: Haiyang Zhang <haiyangz@microsoft.com>
> >> Cc: Eric Dumazet <eric.dumazet@gmail.com>; One Thousand Gnomes
> >> <gnomes@lxorguk.ukuu.org.uk>; David Miller <davem@davemloft.net>;
> >> vkuznets@redhat.com; netdev@vger.kernel.org; KY Srinivasan
> >> <kys@microsoft.com>; devel@linuxdriverproject.org;
> >> linux-kernel@vger.kernel.org
> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> >> struct flow_keys layout
> >>
> >> On Thu, Jan 14, 2016 at 10:35 AM, Haiyang Zhang
> >> <haiyangz@microsoft.com> wrote:
> >> >
> >> >
> >> >> -----Original Message-----
> >> >> From: Eric Dumazet [mailto:eric.dumazet@gmail.com]
> >> >> Sent: Thursday, January 14, 2016 1:24 PM
> >> >> To: One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>
> >> >> Cc: Tom Herbert <tom@herbertland.com>; Haiyang Zhang
> >> >> <haiyangz@microsoft.com>; David Miller <davem@davemloft.net>;
> >> >> vkuznets@redhat.com; netdev@vger.kernel.org; KY Srinivasan
> >> >> <kys@microsoft.com>; devel@linuxdriverproject.org;
> >> >> linux-kernel@vger.kernel.org
> >> >> Subject: Re: [PATCH net-next] hv_netvsc: don't make assumptions on
> >> >> struct flow_keys layout
> >> >>
> >> >> On Thu, 2016-01-14 at 17:53 +0000, One Thousand Gnomes wrote:
> >> >> > > These results for Toeplitz are not plausible. Given random input you
> >> >> > > cannot expect any hash function to produce such uniform results. I
> >> >> > > suspect that either your input data is biased or the way you're
> >> >> > > applying the hash is.
> >> >> > >
> >> >> > > When I run 64 random IPv4 3-tuples through Toeplitz and Jenkins I
> >> >> > > get something more reasonable:
> >> >> >
> >> >> > IPv4 address patterns are not random. Nothing like it. A long long time
> >> >> > ago we did do a bunch of tuning for network hashes using big porn site
> >> >> > data sets. Random it was not.
> >> >> >
> >> >>
> >> >> I ran my tests with non-random IPv4 addresses, as I had 2 hosts,
> >> >> one server and one client (typical benchmark stuff).
> >> >>
> >> >> The only 'random' part was the ports, so maybe ~20 bits of entropy,
> >> >> considering how we allocate ports during connect() to a given
> >> >> destination to avoid port reuse.
> >> >>
> >> >> > It's probably hard to repeat that exercise now with geo-specific
> >> >> > routing and all the front-end caches and redirectors on big sites, but
> >> >> > I'd strongly suggest that random input is not a good test, and also
> >> >> > that you need to worry more about hash attacks than perfect
> >> >> > distributions.
> >> >>
> >> >> Anyway, the exercise is not to find a hash that exactly splits 128
> >> >> flows into 16 buckets according to the number of flows per bucket.
> >> >>
> >> >> Maybe only 4 flows are sending at 3 Gbit/s, and the others are
> >> >> sending at 100 kbit/s. There is no way the driver can predict the
> >> >> future.
> >> >>
> >> >> This is why we prefer to select a queue based on the CPU sending the
> >> >> packet. This permits a natural shift based on actual load, and it is
> >> >> the default on Linux (see XPS in Documentation/networking/scaling.txt).
> >> >>
> >> >> Only this driver has a selection based on a flow 'hash'.
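Eric's alternative, picking the TX queue from the CPU that sends the packet rather than from a flow hash, can be illustrated with a minimal userspace sketch. It assumes a hypothetical fixed CPU-to-queue table filled round-robin at setup; the kernel's real XPS maps are configured per queue (through the `xps_cpus` sysfs files described in scaling.txt) and are not modeled here.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_CPUS 64 /* hypothetical upper bound for this sketch */

/* Hypothetical CPU -> TX queue table, standing in for an XPS map. */
struct cpu_queue_map {
    unsigned int nr_cpus;
    uint16_t queue_of_cpu[MAX_CPUS];
};

/* Spread CPUs over queues round-robin: cpu 0 -> q0, cpu 1 -> q1, ... */
static void cpu_queue_map_init(struct cpu_queue_map *m,
                               unsigned int nr_cpus, unsigned int nr_queues)
{
    assert(nr_cpus > 0 && nr_cpus <= MAX_CPUS && nr_queues > 0);
    m->nr_cpus = nr_cpus;
    for (unsigned int cpu = 0; cpu < nr_cpus; cpu++)
        m->queue_of_cpu[cpu] = (uint16_t)(cpu % nr_queues);
}

/* Pick the TX queue from the sending CPU, not from any flow hash, so the
 * queue choice follows wherever the sending work actually runs. */
static uint16_t select_queue_by_cpu(const struct cpu_queue_map *m,
                                    unsigned int sending_cpu)
{
    assert(sending_cpu < m->nr_cpus);
    return m->queue_of_cpu[sending_cpu];
}
```

With 8 CPUs and 4 queues, CPU 5 maps to queue 1 and CPU 7 to queue 3. No per-flow prediction is involved: queue load tracks the CPUs that are actually busy sending, which is the property Eric points to.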
> >> >
> >> > Also, the port number selection may not be random either. For example,
> >> > the well-known network throughput test tool iperf uses port numbers
> >> > with equal increments between them. We tested these non-random cases
> >> > and found that the Toeplitz hash distributed the connections evenly,
> >> > while the Jenkins hash distributed them unevenly.
> >> >
> >> > I'm aware of the test from Tom Herbert <tom@herbertland.com>, which
> >> > shows similar results for Toeplitz vs. Jenkins with random inputs.
> >> >
> >> > In summary, Toeplitz performs better with non-random inputs and
> >> > similarly to Jenkins with random inputs (which may not reflect the real
> >> > world). So we still prefer to use the Toeplitz hash.
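For reference, the Toeplitz hash being compared here is the RSS hash from Microsoft's RSS specification: for each set bit of the input, a 32-bit window of a secret key starting at that bit is XORed into the result. A minimal C sketch follows; it assumes the 40-byte key from Microsoft's RSS verification examples and an IPv4/TCP input laid out as source address, destination address, source port, destination port in network byte order. It is an illustration, not the hv_netvsc implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 40-byte key from Microsoft's RSS hash verification examples
 * (assumed here for illustration; a driver normally programs its own). */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash: walk the input MSB first; for every set bit, XOR in the
 * 32-bit window of the key that starts at that same bit position. */
static uint32_t toeplitz_hash(const uint8_t *key, size_t key_len,
                              const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
                      ((uint32_t)key[2] << 8) | (uint32_t)key[3];

    /* The window must be able to slide 32 bits past the last input bit. */
    assert(key_len >= len + 4);
    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {
            if ((data[i] >> bit) & 1u)
                result ^= window;
            /* Slide the window left one bit, pulling in the next key bit. */
            window = (window << 1) | ((key[i + 4] >> bit) & 1u);
        }
    }
    return result;
}

static void put_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* IPv4/TCP input: source address, destination address, source port,
 * destination port, in network byte order (arguments are host order). */
static uint32_t toeplitz_ipv4_tcp(uint32_t saddr, uint32_t daddr,
                                  uint16_t sport, uint16_t dport)
{
    uint8_t in[12];

    put_be32(in, saddr);
    put_be32(in + 4, daddr);
    in[8] = (uint8_t)(sport >> 8);
    in[9] = (uint8_t)sport;
    in[10] = (uint8_t)(dport >> 8);
    in[11] = (uint8_t)dport;
    return toeplitz_hash(rss_key, sizeof(rss_key), in, sizeof(in));
}
```

With the first verification vector from that documentation (source 66.9.149.187:2794, destination 161.142.100.80:1766), `toeplitz_ipv4_tcp(0x420995bb, 0xa18e6450, 2794, 1766)` should return 0x51ccc178.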
> >> >
> >> You are basing your conclusions on one toy benchmark. I don't believe
> >> that a realistically loaded web server is going to consistently give
> >> you tuples that happen to somehow fit into a nice model so that the
> >> bias benefits your load distribution.
> >>
> >> > To minimize the computational overhead, we may consider putting the
> >> > hash in a per-connection cache in the TCP layer, so it only needs to be
> >> > computed once. But even with the current computational overhead, the
> >> > throughput with the Toeplitz hash is better than with Jenkins:
> >> > Throughput (Gbps) comparison:
> >> > #conn           Toeplitz        Jenkins
> >> > 32              26.6            23.2
> >> > 64              32.1            23.4
> >> > 128             29.1            24.1
> >> >
> >> You don't need to do that. We already store a random hash value in the
> >> connection context. If you want to make it non-random then just
> >> replace that with a simple global counter. This will have the exact
> >> same effect that you see in your tests without needing any expensive
> >> computation.
> >
> > Could you point me to the data field of connection context where this
> > hash value is stored? Is it computed only one time?
> >
> sk_txhash in struct sock. It is set to a random number on a TCP or UDP
> connect call, and it can be reset to a different random value when the
> connection is seen to be having trouble (sk_rethink_txhash).
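Tom's suggestion can be sketched in userspace as follows. `struct conn` is a hypothetical stand-in for `struct sock`, `rand()` is only a placeholder for the kernel's random source, and the helper names are invented for illustration. The hash is set once at connect time (randomly, as sk_txhash is, or from a global counter as Tom proposes), replaced on a rethink, and mapped to a queue with `%`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the per-connection state in struct sock. */
struct conn {
    uint32_t txhash;
};

/* Global counter: Tom's non-random alternative. */
static uint32_t txhash_counter;

/* Random per-connection hash. rand() is a placeholder for the kernel's
 * random source; 0 is skipped so it can mean "no hash". */
static uint32_t random_txhash(void)
{
    uint32_t v;

    do {
        v = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
    } while (v == 0);
    return v;
}

/* Deterministic alternative: consecutive connections get consecutive values. */
static uint32_t counter_txhash(void)
{
    return ++txhash_counter;
}

/* Hypothetical connect hook: choose the hash source once per connection. */
static void conn_connect(struct conn *c, int use_counter)
{
    c->txhash = use_counter ? counter_txhash() : random_txhash();
}

/* Hypothetical analogue of sk_rethink_txhash(): pick a new random hash
 * when the current path looks bad. */
static void conn_rethink_txhash(struct conn *c)
{
    c->txhash = random_txhash();
}

/* Map the stored hash to a TX queue. */
static unsigned int conn_select_queue(const struct conn *c,
                                      unsigned int nr_queues)
{
    assert(nr_queues > 0);
    return c->txhash % nr_queues;
}
```

The modulo mapping is deliberate: a multiply-and-shift mapping such as the kernel's `reciprocal_scale()` works well for uniformly random hashes, but it would send every small counter value to queue 0.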
> 
> Also when you say "Toeplitz performs better in case of non-random
> inputs" please quantify exactly how your input data is not random.
> What header changes with each connection in your test...

Thank you for the info! 

For the non-random inputs, I used iperf's port selection, which increases
the port number by 2 for each connection. Only the sending-side port
numbers differ; all other values are the same. I also tested some other
fixed increments, and Toeplitz spread the connections evenly in those
cases too. For real applications, if the load comes from the local area,
the IP/port combinations are likely to have some non-random patterns.
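The port-increment experiment described above can be outlined with a small harness like the following. It is a sketch under stated assumptions: the Microsoft RSS verification key, hypothetical fixed addresses and destination port, source ports starting at a hypothetical base and stepping by 2, and a plain `hash % nr_queues` mapping that may differ from what hv_netvsc does. It only fills a per-queue histogram; it does not reproduce the results reported in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Key from Microsoft's RSS verification examples (assumed for illustration). */
static const uint8_t rss_key[40] = {
    0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
    0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
    0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
    0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
    0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

/* Toeplitz hash over data: XOR in the sliding 32-bit key window per set bit. */
static uint32_t toeplitz_hash(const uint8_t *data, size_t len)
{
    uint32_t result = 0;
    uint32_t window = ((uint32_t)rss_key[0] << 24) |
                      ((uint32_t)rss_key[1] << 16) |
                      ((uint32_t)rss_key[2] << 8) | (uint32_t)rss_key[3];

    assert(len + 4 <= sizeof(rss_key));
    for (size_t i = 0; i < len; i++)
        for (int bit = 7; bit >= 0; bit--) {
            if ((data[i] >> bit) & 1u)
                result ^= window;
            window = (window << 1) | ((rss_key[i + 4] >> bit) & 1u);
        }
    return result;
}

/* Count how many of nr_conns iperf-style connections land on each queue.
 * Only the source port varies: base_port, base_port + step, ... */
static void port_step_histogram(uint32_t saddr, uint32_t daddr, uint16_t dport,
                                uint16_t base_port, uint16_t step,
                                unsigned int nr_conns,
                                unsigned int *counts, unsigned int nr_queues)
{
    assert(nr_queues > 0);
    memset(counts, 0, nr_queues * sizeof(*counts));
    for (unsigned int n = 0; n < nr_conns; n++) {
        uint16_t sport = (uint16_t)(base_port + n * step);
        uint8_t in[12] = {
            (uint8_t)(saddr >> 24), (uint8_t)(saddr >> 16),
            (uint8_t)(saddr >> 8),  (uint8_t)saddr,
            (uint8_t)(daddr >> 24), (uint8_t)(daddr >> 16),
            (uint8_t)(daddr >> 8),  (uint8_t)daddr,
            (uint8_t)(sport >> 8),  (uint8_t)sport,
            (uint8_t)(dport >> 8),  (uint8_t)dport,
        };
        counts[toeplitz_hash(in, sizeof(in)) % nr_queues]++;
    }
}
```

Calling it with, say, 16 queues and 128 connections and then printing `counts` gives the per-queue spread for that port pattern; swapping in a different hash function in the same harness allows a like-for-like comparison.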

For our driver, we are thinking of storing the Toeplitz hash in sk_txhash,
so it only needs to be computed once, or again during sk_rethink_txhash.
The computational overhead is therefore incurred almost only once per
connection.
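Storing the hash in the connection amounts to per-connection memoization. A minimal sketch, assuming a hypothetical `struct conn` in place of `struct sock` and a placeholder `expensive_flow_hash()` (FNV-1a over the tuple, used only to keep the example short; it stands in for Toeplitz). A call counter shows the hash being computed once per connection, and once more after a rethink.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical connection state, standing in for struct sock. */
struct conn {
    uint32_t saddr, daddr;
    uint16_t sport, dport;
    uint32_t txhash; /* cached hash; 0 means "not computed yet" */
};

/* Counts expensive hash computations, for illustration only. */
static unsigned int hash_computations;

/* Placeholder for the Toeplitz hash: FNV-1a over the tuple bytes. */
static uint32_t expensive_flow_hash(const struct conn *c)
{
    const uint32_t words[3] = {
        c->saddr, c->daddr, ((uint32_t)c->sport << 16) | c->dport,
    };
    uint32_t h = 2166136261u; /* FNV-1a 32-bit offset basis */

    hash_computations++;
    for (int w = 0; w < 3; w++)
        for (int shift = 24; shift >= 0; shift -= 8) {
            h ^= (words[w] >> shift) & 0xffu;
            h *= 16777619u; /* FNV-1a 32-bit prime */
        }
    return h != 0 ? h : 1; /* never cache 0, which means "not computed" */
}

/* Return the cached hash, computing it only on first use. */
static uint32_t conn_get_txhash(struct conn *c)
{
    if (c->txhash == 0)
        c->txhash = expensive_flow_hash(c);
    return c->txhash;
}

/* Hypothetical analogue of sk_rethink_txhash(): invalidate the cache so
 * the next lookup recomputes the hash. */
static void conn_rethink_txhash(struct conn *c)
{
    c->txhash = 0;
}
```

One caveat: a Toeplitz hash is a deterministic function of the tuple and key, so recomputing it in sk_rethink_txhash yields the same value for the same connection, and a rethink would not move the flow to another queue unless the input or key changes. In the sketch, `conn_get_txhash()` likewise returns the same value before and after `conn_rethink_txhash()`.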

Thanks,
- Haiyang

Thread overview: 43+ messages (including duplicate copies)
2016-01-07  9:33 [PATCH net-next] hv_netvsc: don't make assumptions on struct flow_keys layout Vitaly Kuznetsov
2016-01-07 12:52 ` Eric Dumazet
2016-01-07 13:28   ` Vitaly Kuznetsov
2016-01-08  1:02     ` John Fastabend
2016-01-08  3:49       ` KY Srinivasan
2016-01-08  6:16         ` John Fastabend
2016-01-08 18:01           ` KY Srinivasan
2016-01-08 21:07     ` Haiyang Zhang
2016-01-09  0:17   ` Tom Herbert
2016-01-10 22:25 ` David Miller
2016-01-13 23:10   ` Haiyang Zhang
2016-01-14  4:56     ` David Miller
2016-01-14 17:14     ` Tom Herbert
2016-01-14 17:53       ` One Thousand Gnomes
2016-01-14 18:24         ` Eric Dumazet
2016-01-14 18:35           ` Haiyang Zhang
2016-01-14 18:48             ` Tom Herbert
2016-01-14 19:15               ` Haiyang Zhang
2016-01-14 19:41                 ` Tom Herbert
2016-01-14 20:23                   ` Haiyang Zhang [this message]
2016-01-14 21:44                     ` Tom Herbert
2016-01-14 22:06                       ` David Miller
2016-01-14 22:08                     ` Eric Dumazet
2016-01-14 22:29                       ` Haiyang Zhang
2016-01-14 17:53     ` Eric Dumazet
