From: Changli Gao <xiaosuo@gmail.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: Tom Herbert <therbert@google.com>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	davem@davemloft.net, netdev@vger.kernel.org
Subject: Re: [PATCH] rfs: Receive Flow Steering
Date: Thu, 8 Apr 2010 09:37:28 +0800	[thread overview]
Message-ID: <z2h412e6f7f1004071837s88d1cd87yc7667f607dd33b5@mail.gmail.com> (raw)
In-Reply-To: <4BB6367D.9090600@hp.com>

On Sat, Apr 3, 2010 at 2:25 AM, Rick Jones <rick.jones2@hp.com> wrote:
> Tom Herbert wrote:
>>    The progression in HP-UX was IPS (10.20) (aka RPS) then TOPS (11.0)
>>    (aka RFS). We found that IPS was great for
>>    single-flow-per-thread-of-execution stuff and that TOPS was better
>>    for multiple-flow-per-thread-of-execution stuff.  It was long enough
>>    ago now that I can safely say for one system-level benchmark not
>>    known to be a "networking" benchmark, and without a massive kernel
>>    component, TOPS was a 10% win.  Not too shabby.
>>
>>    It wasn't that IPS wasn't good in its context - just that TOPS was
>>    even better.
>>
>> I would assume that with IPS threads would migrate to where packets were
>> being delivered thus giving the same sort of locality TOPS was providing?
>>  That would work great without any other constraints (multiple flows per
>> thread, thread CPU bindings, etc.).
>
> Well... that depended - at the time, and still, we were and are also
> encouraging users and app designers to make copious use of
> processor/locality affinity (SMP and NUMA going back far longer in the RISC
> et al space than the x86 space).  So, it was and is entirely possible that
> the application thread of execution is hard-bound to a specific
> core/locality.  Also, I do not recall if HP-UX was as aggressive about
> waking a process/thread on the processor from which the wake-up came vs on
> the processor on which it last ran.
>

Maybe RPS should work against processes rather than processors. For
packet forwarding, the relevant "process" is the net_rx softirq.

>>    We also preferred the concept of the scheduler giving networking
>>    clues as to where to process an application's packets rather than
>>    networking trying to tell the scheduler.  There was some discussion
>>    of out of order worries, but we were willing to trust to the basic
>>    soundness of the scheduler - if it was moving threads around willy
>>    nilly at a rate able to cause big packet reordering it had
>>    fundamental problems that would have to be addressed anyway.
>>
>>
>> I also think scheduler leading networking, like in RPS, is generally more
>> scalable.  As for OOO packets, I've spent way too much time trying to
>> convince the bean-counters that a small number of them aren't problematic
>> :-), in the end it's just easier to not introduce new mechanisms that will
>> cause them!
>
> So long as it doesn't drive you to produce new mechanisms heavier than they
> would have otherwise been.
>
> The irony in the case of HP-UX IPS was that it was put in place in response
> to the severe out of order packet problems in HP-UX in 10.X before 10.20 -
> there were multiple netisr processes and only one netisr queue.  The other
> little tweak that came along in 10.20 with IPS was, in addition to having a
> per-processor (well, per-core in today's parlance) netisr queue, that the netisr
> would grab the entire queue under the one spinlock and work off of that.
>  That was nice because the code path became more efficient under load - more
> packets processed per spinlock/unlock pair.
>

RPS dispatches packets fairly among all the permitted CPUs, in order
to take full advantage of the available CPU power. The assumption is
that each CPU can devote the same number of cycles to packet
processing, but that isn't always true once the scheduler is in the
mix. In that case, having the scheduler lead networking is the better
choice. Maybe we should make softirqs threaded, under the control of
the scheduler, with the number of softirq threads specified by users.
By default the number of softirq threads would equal the number of
CPUs, with each thread bound to a specific CPU, preserving the
current behavior. If other tasks aren't distributed evenly among the
CPUs, the system administrator could increase the number of softirq
threads and drop the CPU binding, leaving the scheduler enough
schedulable softirq threads to work with. Then there might be no need
for weighted packet dispatching in RPS.

-- 
Regards,
Changli Gao(xiaosuo@gmail.com)


Thread overview: 14+ messages
2010-04-02  3:59 [PATCH] rfs: Receive Flow Steering Tom Herbert
2010-04-02  5:04 ` Changli Gao
2010-04-02  7:29   ` Eric Dumazet
2010-04-02 10:58     ` Changli Gao
2010-04-02 12:01       ` Eric Dumazet
2010-04-02 13:45         ` Changli Gao
2010-04-02 17:01     ` Rick Jones
     [not found]       ` <g2i65634d661004021045uff7c0e25ge7dfd17929bc9ee9@mail.gmail.com>
2010-04-02 18:25         ` Rick Jones
2010-04-08  1:37           ` Changli Gao [this message]
2010-04-02  7:58 ` Eric Dumazet
2010-04-02  8:35 ` Eric Dumazet
2010-04-02 12:37   ` Eric Dumazet
2010-04-02 16:28     ` Eric Dumazet
2010-04-02 19:43       ` Eric Dumazet
