netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Nitesh Narayan Lal <nitesh@redhat.com>
To: Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>
Cc: Marcelo Tosatti <mtosatti@redhat.com>,
	helgaas@kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-pci@vger.kernel.org,
	intel-wired-lan@lists.osuosl.org, frederic@kernel.org,
	sassmann@redhat.com, jesse.brandeburg@intel.com,
	lihong.yang@intel.com, jeffrey.t.kirsher@intel.com,
	jacob.e.keller@intel.com, jlelli@redhat.com, hch@infradead.org,
	bhelgaas@google.com, mike.marciniszyn@intel.com,
	dennis.dalessandro@intel.com, thomas.lendacky@amd.com,
	jiri@nvidia.com, mingo@redhat.com, juri.lelli@redhat.com,
	vincent.guittot@linaro.org, lgoncalv@redhat.com
Subject: Re: [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
Date: Mon, 26 Oct 2020 09:35:52 -0400	[thread overview]
Message-ID: <9529ddd0-28f7-fa74-e56f-39de84321a22@redhat.com> (raw)
In-Reply-To: <87ft6464jf.fsf@nanos.tec.linutronix.de>


[-- Attachment #1.1: Type: text/plain, Size: 4785 bytes --]


On 10/23/20 5:00 PM, Thomas Gleixner wrote:
> On Fri, Oct 23 2020 at 09:10, Nitesh Narayan Lal wrote:
>> On 10/23/20 4:58 AM, Peter Zijlstra wrote:
>>> On Thu, Oct 22, 2020 at 01:47:14PM -0400, Nitesh Narayan Lal wrote:
>>> So shouldn't we then fix the drivers / interface first, to get rid of
>>> this inconsistency?
>>>
>> Considering we agree that excess vector is a problem that needs to be
>> solved across all the drivers and that you are comfortable with the other
>> three patches in the set. If I may suggest the following:
>>
>> - We can pick those three patches for now, as that will atleast fix a
>>   driver that is currently impacting RT workloads. Is that a fair
>>   expectation?
> No. Blindly reducing the maximum vectors to the number of housekeeping
> CPUs is patently wrong. The PCI core _cannot_ just nilly willy decide
> what the right number of interrupts for this situation is.
>
> Many of these drivers need more than queue interrupts, admin, error
> interrupt and some operate best with seperate RX/TX interrupts per
> queue. They all can "work" with a single PCI interrupt of course, but
> the price you pay is performance.
>
> An isolated setup, which I'm familiar with, has two housekeeping
> CPUs. So far I restricted the number of network queues with a module
> argument to two, which allocates two management interrupts for the
> device and two interrupts (RX/TX) per queue, i.e. a total of six.

Does it somehow take num_online_cpus() into consideration while deciding
the number of interrupts to create?

>
> Now I reduced the number of available interrupts to two according to
> your hack, which makes it use one queue RX/TX combined and one
> management interrupt. Guess what happens? Network performance tanks to
> the points that it breaks a carefully crafted setup.
>
> The same applies to a device which is application specific and wants one
> channel including an interrupt per isolated application core. Today I
> can isolate 8 out of 12 CPUs and let the device create 8 channels and
> set one interrupt and channel affine to each isolated CPU. With your
> hack, I get only 4 interrupts and channels. Fail!
>
> You cannot declare that all this is perfectly fine, just because it does
> not matter for your particular use case.

I agree that does sound wrong.

>
> So without information from the driver which tells what the best number
> of interrupts is with a reduced number of CPUs, this cutoff will cause
> more problems than it solves. Regressions guaranteed.

Indeed.
I think one commonality among the drivers at the moment is the usage of
num_online_cpus() to determine the vectors to create.

So, maybe instead of doing this kind of restrictions in a generic level
API, it will make more sense to do this on a per-device basis by replacing
the number of online CPUs with the housekeeping CPUs?

This is what I have done in the i40e patch.
But that still sounds hackish and will impact the performance.

>
> Managed interrupts base their interrupt allocation and spreading on
> information which is handed in by the individual driver and not on crude
> assumptions. They are not imposing restrictions on the use case.

Right, FWIU it is irq_do_set_affinity that prevents the spreading of
managed interrupts to isolated CPUs if HK_FLAG_MANAGED_IRQ is enabled,
isn't?

>
> It's perfectly fine for isolated work to save a data set to disk after
> computation has finished and that just works with the per-cpu I/O queue
> which is otherwise completely silent. All isolated workers can do the
> same in parallel without trampling on each other toes by competing for a
> reduced number of queues which are affine to the housekeeper CPUs.
>
> Unfortunately network multi-queue is substantially different from block
> multi-queue (as I learned in this conversation), so the concept cannot
> be applied one-to-one to networking as is. But there are certainly part
> of it which can be reused.

So this is one of the areas that I don't understand well yet and will
explore now.

>
> This needs a lot more thought than just these crude hacks.

Got it.
I am indeed not in the favor of pushing a solution that has a possibility
of regressing/breaking things afterward.

>
> Especially under the aspect that there are talks about making isolation
> runtime switchable. Are you going to rmmod/insmod the i40e network
> driver to do so? That's going to work fine if you do that
> reconfiguration over network...

That's an interesting point. However, for some of the drivers which uses
something like cpu_online/possible_mask while creating its affinity mask
reloading will still associate jobs to the isolated CPUs.


-- 
Thanks
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

  reply	other threads:[~2020-10-26 13:36 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-09-28 18:35 [PATCH v4 0/4] isolation: limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
2020-09-28 18:35 ` [PATCH v4 1/4] sched/isolation: API to get number of " Nitesh Narayan Lal
2020-09-28 18:35 ` [PATCH v4 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
2020-10-23 13:25   ` Peter Zijlstra
2020-10-23 13:29     ` Frederic Weisbecker
2020-10-23 13:57       ` Nitesh Narayan Lal
2020-10-23 13:45     ` Nitesh Narayan Lal
2020-09-28 18:35 ` [PATCH v4 3/4] i40e: Limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
2020-09-28 18:35 ` [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() " Nitesh Narayan Lal
2020-09-28 21:59   ` Bjorn Helgaas
2020-09-29 17:46     ` Christoph Hellwig
2020-10-16 12:20   ` Peter Zijlstra
2020-10-18 18:14     ` Nitesh Narayan Lal
2020-10-19 11:11       ` Peter Zijlstra
2020-10-19 14:00         ` Marcelo Tosatti
2020-10-19 14:25           ` Nitesh Narayan Lal
2020-10-20  7:30           ` Peter Zijlstra
2020-10-20 13:00             ` Nitesh Narayan Lal
2020-10-20 13:41               ` Peter Zijlstra
2020-10-20 14:39                 ` Nitesh Narayan Lal
2020-10-22 17:47                   ` Nitesh Narayan Lal
2020-10-23  8:58                     ` Peter Zijlstra
2020-10-23 13:10                       ` Nitesh Narayan Lal
2020-10-23 21:00                         ` Thomas Gleixner
2020-10-26 13:35                           ` Nitesh Narayan Lal [this message]
2020-10-26 13:57                             ` Thomas Gleixner
2020-10-26 17:30                           ` Marcelo Tosatti
2020-10-26 19:00                             ` Thomas Gleixner
2020-10-26 19:11                               ` Marcelo Tosatti
2020-10-26 19:21                               ` Jacob Keller
2020-10-26 20:11                                 ` Thomas Gleixner
2020-10-26 21:11                                   ` Jacob Keller
2020-10-26 21:50                                     ` Thomas Gleixner
2020-10-26 22:13                                       ` Jakub Kicinski
2020-10-26 22:46                                         ` Thomas Gleixner
2020-10-26 22:52                                         ` Jacob Keller
2020-10-26 22:22                                       ` Nitesh Narayan Lal
2020-10-26 22:49                                         ` Thomas Gleixner
2020-10-26 23:08                                           ` Jacob Keller
2020-10-27 14:28                                             ` Thomas Gleixner
2020-10-27 11:47                                         ` Marcelo Tosatti
2020-10-27 14:43                                           ` Thomas Gleixner
2020-10-19 14:21         ` Frederic Weisbecker
2020-10-20 14:16   ` Thomas Gleixner
2020-10-20 16:18     ` Nitesh Narayan Lal
2020-10-20 18:07       ` Thomas Gleixner
2020-10-21 20:25         ` Thomas Gleixner
2020-10-21 21:04           ` Nitesh Narayan Lal
2020-10-22  0:02           ` Jakub Kicinski
2020-10-22  0:27             ` Jacob Keller
2020-10-22  8:28             ` Thomas Gleixner
2020-10-22 12:28           ` Marcelo Tosatti
2020-10-22 22:39             ` Thomas Gleixner
2020-10-01 15:49 ` [PATCH v4 0/4] isolation: limit msix vectors " Frederic Weisbecker
2020-10-08 21:40   ` Nitesh Narayan Lal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9529ddd0-28f7-fa74-e56f-39de84321a22@redhat.com \
    --to=nitesh@redhat.com \
    --cc=bhelgaas@google.com \
    --cc=dennis.dalessandro@intel.com \
    --cc=frederic@kernel.org \
    --cc=hch@infradead.org \
    --cc=helgaas@kernel.org \
    --cc=intel-wired-lan@lists.osuosl.org \
    --cc=jacob.e.keller@intel.com \
    --cc=jeffrey.t.kirsher@intel.com \
    --cc=jesse.brandeburg@intel.com \
    --cc=jiri@nvidia.com \
    --cc=jlelli@redhat.com \
    --cc=juri.lelli@redhat.com \
    --cc=lgoncalv@redhat.com \
    --cc=lihong.yang@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mike.marciniszyn@intel.com \
    --cc=mingo@redhat.com \
    --cc=mtosatti@redhat.com \
    --cc=netdev@vger.kernel.org \
    --cc=peterz@infradead.org \
    --cc=sassmann@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=thomas.lendacky@amd.com \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).