From: Willem de Bruijn <willemb@google.com>
To: Rick Jones <rick.jones2@hp.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>,
	rdunlap@xenotime.net, linux-doc@vger.kernel.org,
	davem@davemloft.net, netdev@vger.kernel.org, therbert@google.com
Subject: Re: [PATCH v2] net: add Documentation/networking/scaling.txt
Date: Thu, 11 Aug 2011 17:34:05 -0400	[thread overview]
Message-ID: <CA+FuTSc3mcL6i8J2CBvbOui1xLNDHPf0DJj=NorSduvRLq+vbg@mail.gmail.com> (raw)
In-Reply-To: <4E44192A.2070204@hp.com>

>> Well, patch was already accepted by David in net tree two days ago ;)
>
> Didn't see the customary "Applied" email - mailer glitch somewhere?
>

I didn't catch that either. Since it's already in, I instead wrote a
follow-up patch set where
[1/2] adds one-line entries to 00-INDEX for scaling.txt and all the
other missing files (I had no idea how many there were when I started)
[2/2] fixes the few text issues that Rick raised below.

Will send them out shortly.

> <rss>
> Whether it lowers latency in the absence of an interrupt processing
> bottleneck depends on whether or not the application(s) receiving the data
> are able/allowed to run on the CPU(s) to which the IRQs of the queues are
> directed right?

The latency saved would be the time spent waiting in the interrupt
handler. With multiple application threads, this delay shrinks when
packets are spread across interrupt service routines on different
CPUs. These savings, if any, are independent of where the application
threads run. I have no data on the practical savings in the absence of
a bottleneck; they could be inconsequential.
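
To make that concrete, the selection logic is roughly the following
(a sketch with made-up names; the real thing runs in NIC
hardware/firmware, and the hash is normally Toeplitz, not the simple
mix used here):

/* Rough sketch of the receive side of RSS (hypothetical names). The
 * NIC hashes the packet's 4-tuple and uses the low-order bits of the
 * result to index an indirection table that maps to a receive queue.
 * Each queue has its own IRQ, which can be affinitized to a different
 * CPU, so interrupt processing for concurrent flows runs in parallel. */
#include <stdint.h>
#include <stdio.h>

#define RSS_TABLE_SIZE 128  /* typical indirection table size */

static uint16_t indir_table[RSS_TABLE_SIZE]; /* entry -> rx queue */

/* Stand-in for the NIC's Toeplitz hash over the flow 4-tuple. */
static uint32_t flow_hash(uint32_t saddr, uint32_t daddr,
                          uint16_t sport, uint16_t dport)
{
    uint32_t h = saddr ^ daddr ^ ((uint32_t)sport << 16 | dport);
    h ^= h >> 16;
    h *= 0x45d9f3b;  /* simple integer mix, not Toeplitz */
    h ^= h >> 16;
    return h;
}

static uint16_t rss_select_queue(uint32_t saddr, uint32_t daddr,
                                 uint16_t sport, uint16_t dport)
{
    return indir_table[flow_hash(saddr, daddr, sport, dport)
                       % RSS_TABLE_SIZE];
}

int main(void)
{
    int nqueues = 8; /* e.g., one per physical core */

    /* Default: spread table entries evenly across the queues. */
    for (int i = 0; i < RSS_TABLE_SIZE; i++)
        indir_table[i] = i % nqueues;

    printf("queue for sample flow: %u\n",
           rss_select_queue(0x0a000001, 0x0a000002, 12345, 80));
    return 0;
}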

> Also, what mpstat and its ilk shows as CPUs could be HW threads - is it
> indeed the case that one is optimal when there are as many queues as there
> are HW threads, or is it when there are as many queues as there are discrete
> cores?

In my experience, cores. I'll add a brief statement on HT.
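
For reference, the distinction can be checked from the standard sysfs
topology files; something like this (a sketch that assumes contiguous
CPU numbering and elides most error handling) counts physical cores
rather than HW threads:

#include <stdio.h>
#include <stdbool.h>

#define MAX_CPUS 1024

/* Read one integer topology attribute for a given CPU, or -1. */
static int read_topo(int cpu, const char *file)
{
    char path[128];
    int v = -1;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu%d/topology/%s", cpu, file);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (fscanf(f, "%d", &v) != 1)
        v = -1;
    fclose(f);
    return v;
}

int main(void)
{
    int pkg[MAX_CPUS], core[MAX_CPUS];
    int ncores = 0;

    for (int cpu = 0; cpu < MAX_CPUS; cpu++) {
        int p = read_topo(cpu, "physical_package_id");
        int c = read_topo(cpu, "core_id");
        bool seen = false;

        if (p < 0 || c < 0)
            break;  /* no more CPUs */
        /* SMT siblings share a (package, core) pair: count it once. */
        for (int i = 0; i < ncores; i++)
            if (pkg[i] == p && core[i] == c)
                seen = true;
        if (!seen) {
            pkg[ncores] = p;
            core[ncores] = c;
            ncores++;
        }
    }
    printf("physical cores: %d\n", ncores);
    return 0;
}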

> If I have disabled interrupt coalescing in the name of latency, does the
> number of queues actually affect the number of interrupts?

Good point: I suppose it doesn't.
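
For anyone who wants to verify on their system: the current setting
is readable through the ethtool ioctl (the same data "ethtool -c"
prints). With rx-usecs at 0, every received packet raises an
interrupt no matter how many queues it might be steered across. A
rough sketch, with "eth0" as a placeholder device name:

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <net/if.h>
#include <linux/ethtool.h>
#include <linux/sockios.h>

int main(void)
{
    struct ethtool_coalesce ec = { .cmd = ETHTOOL_GCOALESCE };
    struct ifreq ifr;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, "eth0", IFNAMSIZ - 1); /* adjust device */
    ifr.ifr_data = (char *)&ec;

    if (fd >= 0 && ioctl(fd, SIOCETHTOOL, &ifr) == 0)
        printf("rx-usecs: %u (0 = coalescing off)\n",
               ec.rx_coalesce_usecs);
    else
        perror("ETHTOOL_GCOALESCE");
    if (fd >= 0)
        close(fd);
    return 0;
}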

> Certainly any CPU processing interrupts that stays below 100% utilization is
> less likely to be a bottleneck, but if there are algorithms/heuristics that
> get more efficient under load, staying below the 100% CPU utilization mark
> doesn't mean that peak efficiency has been reached.  If there is something
> that processes more and more packets per lock grab/release then it is
> actually most efficient in terms of packets processed per unit CPU
> consumption once one gets to the ragged edge of saturation.

A busy-polling CPU would be an example where measuring utilization is
useless. But under the default interrupt-driven device driver
operation, the utilization of a CPU dedicated exclusively to HW
interrupt processing is indicative of overflow.
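
Something like the following approximates what mpstat reports as
%irq/%soft, which is the signal I mean (a sketch; the counters are
cumulative since boot, so a real monitor would diff two samples):

/* Per-CPU time spent in hard and soft interrupt context, from
 * /proc/stat (fields 6 and 7 after the per-CPU label). */
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *f = fopen("/proc/stat", "r");
    char line[512];

    while (f && fgets(line, sizeof(line), f)) {
        unsigned long long u, n, s, idle, iow, irq, sirq;
        int cpu;

        /* skip the aggregate "cpu " line; parse only "cpuN" lines */
        if (strncmp(line, "cpu", 3) || line[3] < '0' || line[3] > '9')
            continue;
        if (sscanf(line, "cpu%d %llu %llu %llu %llu %llu %llu %llu",
                   &cpu, &u, &n, &s, &idle, &iow, &irq, &sirq) != 8)
            continue;
        unsigned long long total = u + n + s + idle + iow + irq + sirq;
        if (total)
            printf("cpu%d irq+softirq: %.1f%%\n",
                   cpu, 100.0 * (irq + sirq) / total);
    }
    if (f)
        fclose(f);
    return 0;
}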

> Is utilization of the rx ring associated with the queue the more accurate,
> albeit unavailable, measure of saturation?

Measuring overflow here could be an interesting alternative.
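
The standard sysfs statistics already expose some of this, though the
exact semantics are driver-dependent; a rough sketch ("eth0" is a
placeholder):

/* On many NICs rx_missed_errors (or rx_fifo_errors) increments when
 * the rx ring overflows, which is the saturation signal asked about. */
#include <stdio.h>

static unsigned long long read_stat(const char *dev, const char *stat)
{
    char path[256];
    unsigned long long v = 0;
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/class/net/%s/statistics/%s", dev, stat);
    f = fopen(path, "r");
    if (f) {
        if (fscanf(f, "%llu", &v) != 1)
            v = 0;
        fclose(f);
    }
    return v;
}

int main(void)
{
    const char *dev = "eth0"; /* adjust for your device */

    printf("rx_missed_errors: %llu\n", read_stat(dev, "rx_missed_errors"));
    printf("rx_fifo_errors:   %llu\n", read_stat(dev, "rx_fifo_errors"));
    printf("rx_dropped:       %llu\n", read_stat(dev, "rx_dropped"));
    return 0;
}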

> This isn't the first mention of "cache domain"

I will add a definition on first use.

> This one is more drift than critique of the documentation itself, but just
> how often is the scheduler shuffling a thread of execution around anyway?  I
> would have thought that was happening on a timescale that would seem
> positively glacial compared to packet arrival rates.

I didn't contribute to the evaluation or implementation, so I cannot
answer decisively (I just happen to have written a user's guide for
colleagues that could be reworked into this document).

> Again, drifting from critique simply of the documentation, but if
> accelerated RFS is indeed goodness when RFS is being used and the NIC HW
> supports it, shouldn't it be enabled automagically?  And then drifting back
> to the documentation itself, if accelerated RFS isn't enabled automagically
> with RFS today, does the reason suggest a caveat to the suggested
> configuration?

It probably should be enabled automatically, indeed.

> I'd probably go with "over all packets in the flow"

Will change that.

> And I'm curious/confused about rates of thread migration vs packets - it
> seems like the mechanisms in place to avoid OOO packets have a property that
> the queue selected can remain "stuck" when the packet rates are sufficiently
> high.

It sounds like that, yes.

> If being stuck isn't likely, it suggests that "normal" processing is
> enough to get packets drained - that the thread of execution is (at least in
> the context of sending and receiving traffic) going idle.

Not necessarily, if a single thread processes many connections at
once. State is kept on a per-connection basis in the sk struct.
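
For illustration, the steering rule is roughly the following (a
heavily simplified paraphrase, not the kernel code; the real logic is
in get_rps_cpu() in net/core/dev.c). It also shows why the choice can
stay "stuck" while packets keep arriving, as discussed above:

#include <stdbool.h>
#include <stdio.h>

struct flow_entry {
    int desired_cpu;            /* set when the app calls recvmsg() */
    int current_cpu;            /* CPU packets are steered to now */
    unsigned int last_enqueued; /* backlog tail at our last enqueue */
};

/* Stand-in: how far each CPU's input queue has drained (head counter). */
static unsigned int backlog_head[64];

static int rfs_select_cpu(struct flow_entry *fe)
{
    /* Signed compare handles counter wraparound. */
    bool drained = (int)(backlog_head[fe->current_cpu] -
                         fe->last_enqueued) >= 0;

    /* Migrate only when switching cannot reorder this flow. */
    if (fe->desired_cpu != fe->current_cpu && drained)
        fe->current_cpu = fe->desired_cpu;
    return fe->current_cpu;
}

int main(void)
{
    struct flow_entry fe = { .desired_cpu = 2, .current_cpu = 0,
                             .last_enqueued = 10 };

    backlog_head[0] = 5;   /* old CPU still has our packets queued */
    printf("steered to cpu%d\n", rfs_select_cpu(&fe)); /* stays on 0 */
    backlog_head[0] = 10;  /* old CPU drained past our last packet */
    printf("steered to cpu%d\n", rfs_select_cpu(&fe)); /* moves to 2 */
    return 0;
}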

> In the specific example of TCP, I see where ACK of data is sufficient to
> guarantee no OOO on outbound when migrating, but all that is really
> necessary is transmit completion by the NIC, no?  Admittedly, getting that
> information to TCP is probably undesired overhead, but doesn't using the ACK
> "penalize" the thread/TCP talking to more remote (in terms of RTT)
> destinations?

Probably, but perhaps someone with more intimate knowledge of the
implementation should answer definitively.


Thread overview: 15+ messages
2011-08-09 14:20 [PATCH v2] net: add Documentation/networking/scaling.txt Willem de Bruijn
2011-08-09 18:45 ` Rick Jones
2011-08-11 14:26   ` Willem de Bruijn
2011-08-11 16:31     ` Eric Dumazet
2011-08-11 18:02       ` Rick Jones
2011-08-11 21:34         ` Willem de Bruijn [this message]
2011-08-12  0:34   ` [PATCH 00/02] small changes to Documentation/networking/00-INDEX and scaling.txt Willem de Bruijn
2011-08-12  0:39     ` [PATCH 01/02] net: add missing entries to Documentation/networking/00-INDEX Willem de Bruijn
2011-08-12  7:40       ` Michał Mirosław
2011-08-12  0:41     ` [PATCH 2/2] net: minor update to Documentation/networking/scaling.txt Willem de Bruijn
2011-08-12 23:32       ` Rick Jones
2011-08-15 16:11         ` Willem de Bruijn
2011-08-15 16:56           ` Rick Jones
2011-08-14  1:03     ` [PATCH 00/02] small changes to Documentation/networking/00-INDEX and scaling.txt David Miller
2011-08-10 14:57 ` [PATCH v2] net: add Documentation/networking/scaling.txt David Miller
