From: Nitesh Narayan Lal <firstname.lastname@example.org> To: Thomas Gleixner <email@example.com>, Peter Zijlstra <firstname.lastname@example.org> Cc: Marcelo Tosatti <email@example.com>, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com, firstname.lastname@example.org, email@example.com Subject: Re: [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs Date: Mon, 26 Oct 2020 09:35:52 -0400 [thread overview] Message-ID: <firstname.lastname@example.org> (raw) In-Reply-To: <email@example.com> [-- Attachment #1.1: Type: text/plain, Size: 4785 bytes --] On 10/23/20 5:00 PM, Thomas Gleixner wrote: > On Fri, Oct 23 2020 at 09:10, Nitesh Narayan Lal wrote: >> On 10/23/20 4:58 AM, Peter Zijlstra wrote: >>> On Thu, Oct 22, 2020 at 01:47:14PM -0400, Nitesh Narayan Lal wrote: >>> So shouldn't we then fix the drivers / interface first, to get rid of >>> this inconsistency? >>> >> Considering we agree that excess vector is a problem that needs to be >> solved across all the drivers and that you are comfortable with the other >> three patches in the set. If I may suggest the following: >> >> - We can pick those three patches for now, as that will atleast fix a >> driver that is currently impacting RT workloads. Is that a fair >> expectation? > No. Blindly reducing the maximum vectors to the number of housekeeping > CPUs is patently wrong. The PCI core _cannot_ just nilly willy decide > what the right number of interrupts for this situation is. > > Many of these drivers need more than queue interrupts, admin, error > interrupt and some operate best with seperate RX/TX interrupts per > queue. They all can "work" with a single PCI interrupt of course, but > the price you pay is performance. > > An isolated setup, which I'm familiar with, has two housekeeping > CPUs. So far I restricted the number of network queues with a module > argument to two, which allocates two management interrupts for the > device and two interrupts (RX/TX) per queue, i.e. a total of six. Does it somehow take num_online_cpus() into consideration while deciding the number of interrupts to create? > > Now I reduced the number of available interrupts to two according to > your hack, which makes it use one queue RX/TX combined and one > management interrupt. Guess what happens? Network performance tanks to > the points that it breaks a carefully crafted setup. > > The same applies to a device which is application specific and wants one > channel including an interrupt per isolated application core. Today I > can isolate 8 out of 12 CPUs and let the device create 8 channels and > set one interrupt and channel affine to each isolated CPU. With your > hack, I get only 4 interrupts and channels. Fail! > > You cannot declare that all this is perfectly fine, just because it does > not matter for your particular use case. I agree that does sound wrong. > > So without information from the driver which tells what the best number > of interrupts is with a reduced number of CPUs, this cutoff will cause > more problems than it solves. Regressions guaranteed. Indeed. I think one commonality among the drivers at the moment is the usage of num_online_cpus() to determine the vectors to create. So, maybe instead of doing this kind of restrictions in a generic level API, it will make more sense to do this on a per-device basis by replacing the number of online CPUs with the housekeeping CPUs? This is what I have done in the i40e patch. But that still sounds hackish and will impact the performance. > > Managed interrupts base their interrupt allocation and spreading on > information which is handed in by the individual driver and not on crude > assumptions. They are not imposing restrictions on the use case. Right, FWIU it is irq_do_set_affinity that prevents the spreading of managed interrupts to isolated CPUs if HK_FLAG_MANAGED_IRQ is enabled, isn't? > > It's perfectly fine for isolated work to save a data set to disk after > computation has finished and that just works with the per-cpu I/O queue > which is otherwise completely silent. All isolated workers can do the > same in parallel without trampling on each other toes by competing for a > reduced number of queues which are affine to the housekeeper CPUs. > > Unfortunately network multi-queue is substantially different from block > multi-queue (as I learned in this conversation), so the concept cannot > be applied one-to-one to networking as is. But there are certainly part > of it which can be reused. So this is one of the areas that I don't understand well yet and will explore now. > > This needs a lot more thought than just these crude hacks. Got it. I am indeed not in the favor of pushing a solution that has a possibility of regressing/breaking things afterward. > > Especially under the aspect that there are talks about making isolation > runtime switchable. Are you going to rmmod/insmod the i40e network > driver to do so? That's going to work fine if you do that > reconfiguration over network... That's an interesting point. However, for some of the drivers which uses something like cpu_online/possible_mask while creating its affinity mask reloading will still associate jobs to the isolated CPUs. -- Thanks Nitesh [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 833 bytes --]
next prev parent reply other threads:[~2020-10-26 13:36 UTC|newest] Thread overview: 55+ messages / expand[flat|nested] mbox.gz Atom feed top 2020-09-28 18:35 [PATCH v4 0/4] isolation: limit msix vectors " Nitesh Narayan Lal 2020-09-28 18:35 ` [PATCH v4 1/4] sched/isolation: API to get number of " Nitesh Narayan Lal 2020-09-28 18:35 ` [PATCH v4 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal 2020-10-23 13:25 ` Peter Zijlstra 2020-10-23 13:29 ` Frederic Weisbecker 2020-10-23 13:57 ` Nitesh Narayan Lal 2020-10-23 13:45 ` Nitesh Narayan Lal 2020-09-28 18:35 ` [PATCH v4 3/4] i40e: Limit msix vectors to housekeeping CPUs Nitesh Narayan Lal 2020-09-28 18:35 ` [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() " Nitesh Narayan Lal 2020-09-28 21:59 ` Bjorn Helgaas 2020-09-29 17:46 ` Christoph Hellwig 2020-10-16 12:20 ` Peter Zijlstra 2020-10-18 18:14 ` Nitesh Narayan Lal 2020-10-19 11:11 ` Peter Zijlstra 2020-10-19 14:00 ` Marcelo Tosatti 2020-10-19 14:25 ` Nitesh Narayan Lal 2020-10-20 7:30 ` Peter Zijlstra 2020-10-20 13:00 ` Nitesh Narayan Lal 2020-10-20 13:41 ` Peter Zijlstra 2020-10-20 14:39 ` Nitesh Narayan Lal 2020-10-22 17:47 ` Nitesh Narayan Lal 2020-10-23 8:58 ` Peter Zijlstra 2020-10-23 13:10 ` Nitesh Narayan Lal 2020-10-23 21:00 ` Thomas Gleixner 2020-10-26 13:35 ` Nitesh Narayan Lal [this message] 2020-10-26 13:57 ` Thomas Gleixner 2020-10-26 17:30 ` Marcelo Tosatti 2020-10-26 19:00 ` Thomas Gleixner 2020-10-26 19:11 ` Marcelo Tosatti 2020-10-26 19:21 ` Jacob Keller 2020-10-26 20:11 ` Thomas Gleixner 2020-10-26 21:11 ` Jacob Keller 2020-10-26 21:50 ` Thomas Gleixner 2020-10-26 22:13 ` Jakub Kicinski 2020-10-26 22:46 ` Thomas Gleixner 2020-10-26 22:52 ` Jacob Keller 2020-10-26 22:22 ` Nitesh Narayan Lal 2020-10-26 22:49 ` Thomas Gleixner 2020-10-26 23:08 ` Jacob Keller 2020-10-27 14:28 ` Thomas Gleixner 2020-10-27 11:47 ` Marcelo Tosatti 2020-10-27 14:43 ` Thomas Gleixner 2020-10-19 14:21 ` Frederic Weisbecker 2020-10-20 14:16 ` Thomas Gleixner 2020-10-20 16:18 ` Nitesh Narayan Lal 2020-10-20 18:07 ` Thomas Gleixner 2020-10-21 20:25 ` Thomas Gleixner 2020-10-21 21:04 ` Nitesh Narayan Lal 2020-10-22 0:02 ` Jakub Kicinski 2020-10-22 0:27 ` Jacob Keller 2020-10-22 8:28 ` Thomas Gleixner 2020-10-22 12:28 ` Marcelo Tosatti 2020-10-22 22:39 ` Thomas Gleixner 2020-10-01 15:49 ` [PATCH v4 0/4] isolation: limit msix vectors " Frederic Weisbecker 2020-10-08 21:40 ` Nitesh Narayan Lal
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --email@example.com \ --firstname.lastname@example.org \ --subject='Re: [PATCH v4 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs' \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: link
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).