From: Nitesh Narayan Lal <>
Subject: [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs
Date: Wed,  9 Sep 2020 11:08:15 -0400
Message-ID: <>

This is a follow-up posting for "[v1] i40e: limit the msix vectors based on
housekeeping CPUs" [1] (It took longer than expected for me to get back to
this).

With the current implementation, device drivers take only num_online_cpus()
into consideration while creating their MSIX vectors. This works quite well
for a non-RT environment, but in an RT environment that has a large number of
isolated CPUs and very few housekeeping CPUs it can lead to a problem.
The problem is triggered when something like tuned tries to move all the IRQs
from the isolated CPUs to the limited number of housekeeping CPUs to prevent
interruptions for a latency-sensitive workload that will be running on the
isolated CPUs. The failure is caused by the per CPU vector limit.

Proposed Fix
In this patch-set, the following changes are proposed:
- A generic API num_housekeeping_cpus() which is meant to return the number of
  available housekeeping CPUs in an environment with isolated CPUs and the
  number of all online CPUs otherwise.
- i40e: Specifically for the i40e driver, the num_online_cpus() used in
  i40e_init_msix() to calculate the number of msix vectors is replaced with
  the above defined API. This is done to restrict the number of msix vectors
  for i40e in RT environments.
- pci_alloc_irq_vectors(): With the help of num_housekeeping_cpus(), the
  max_vecs passed to pci_alloc_irq_vectors() is restricted to the available
  housekeeping CPUs, but only in an environment that has isolated CPUs.
  However, if min_vecs exceeds num_housekeeping_cpus(), no change is made, to
  make sure that device initialization is not prevented due to a lack of
  housekeeping CPUs.
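
The max_vecs restriction described in the last item can be sketched in plain
user-space C as below. This is only an illustration under stated assumptions,
not the code from the patches: hk_cpus_sketch() and limit_max_vecs_sketch()
are hypothetical names, and the CPU counts are passed in as parameters instead
of being read from the kernel.

```c
#include <stdbool.h>

/*
 * Hypothetical stand-in for the proposed num_housekeeping_cpus():
 * with isolation in effect report the housekeeping CPU count,
 * otherwise report all online CPUs.
 */
static unsigned int hk_cpus_sketch(unsigned int online_cpus,
                                   unsigned int housekeeping_cpus,
                                   bool isolation_active)
{
    return isolation_active ? housekeeping_cpus : online_cpus;
}

/*
 * Hypothetical helper: cap max_vecs at the housekeeping CPU count.
 * The request is left untouched when
 *  - there are no isolated CPUs (hk_cpus == online_cpus), or
 *  - min_vecs already exceeds hk_cpus, so that device
 *    initialization is not prevented.
 */
static unsigned int limit_max_vecs_sketch(unsigned int min_vecs,
                                          unsigned int max_vecs,
                                          unsigned int hk_cpus,
                                          unsigned int online_cpus)
{
    if (hk_cpus == online_cpus)
        return max_vecs;
    if (min_vecs > hk_cpus)
        return max_vecs;
    return max_vecs < hk_cpus ? max_vecs : hk_cpus;
}
```

With 72 online CPUs of which 4 are housekeeping, a request of min_vecs = 1 and
max_vecs = 72 would be capped to 4 in this sketch, while a request with
min_vecs = 8 would keep max_vecs = 72.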

Reproducing the Issue
I have triggered this issue on a setup that had a total of 72 cores, of which
68 were isolated and only 4 were left for housekeeping tasks. I was using
tuned's realtime-virtual-host profile to configure the system. In this
scenario, tuned reported the error message 'Failed to set SMP affinity of IRQ
xxx to '00000040,00000010,00000005': [Errno 28] No space left on the device'
for several IRQs in tuned.log due to the per CPU vector limit.
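
For reference, the mask in that message can be decoded with a small user-space
helper like the sketch below. It assumes the mask uses the
/proc/irq/N/smp_affinity layout (comma-separated 32-bit hex words, most
significant word first); affinity_mask_to_cpus() is a hypothetical name, not
an existing API. Under that assumption, '00000040,00000010,00000005' selects
CPUs 0, 2, 36 and 70, i.e. the 4 housekeeping CPUs.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_WORDS 64 /* arbitrary limit for this sketch: 64 * 32 CPUs */

/*
 * Decode a comma-separated hex affinity mask into CPU numbers in
 * ascending order. Returns the number of CPUs written to cpus[],
 * or -1 on malformed input or when cpus[] is too small.
 */
static int affinity_mask_to_cpus(const char *mask, int *cpus, size_t max_cpus)
{
    unsigned long words[MAX_WORDS];
    size_t nwords = 0;
    size_t count = 0;
    const char *p = mask;

    /* Pass 1: parse the 32-bit words in the order they are written. */
    for (;;) {
        char *end;
        unsigned long w;

        if (nwords == MAX_WORDS)
            return -1;
        w = strtoul(p, &end, 16);
        if (end == p || w > 0xFFFFFFFFUL)
            return -1;
        words[nwords++] = w;
        if (*end == '\0')
            break;
        if (*end != ',')
            return -1;
        p = end + 1;
    }

    /* Pass 2: the last word written holds CPUs 0-31, so walk backwards. */
    for (size_t i = 0; i < nwords; i++) {
        unsigned long w = words[nwords - 1 - i];

        for (int bit = 0; bit < 32; bit++) {
            if (!(w & (1UL << bit)))
                continue;
            if (count == max_cpus)
                return -1;
            cpus[count++] = (int)(i * 32) + bit;
        }
    }
    return (int)count;
}
```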

- To test that the issue is resolved with the i40e change, I added a tracepoint
  in i40e_init_msix() to find the number of CPUs derived for vector creation
  with and without tuned's realtime-virtual-host profile. As expected, with
  the profile applied I was getting only the number of housekeeping CPUs, and
  all available CPUs without it.

- To analyze the performance impact, I have targeted the change introduced in
  pci_alloc_irq_vectors() and compared the results against those of a vanilla
  kernel (5.9.0-rc3).

  Setup Information:
  + I had a couple of 24-core machines connected back to back via a couple of
    mlx5 NICs, and I analyzed the average bitrate for server-client TCP and UDP
    transmission via iperf.
  + To minimize the bitrate variation of the iperf TCP and UDP stream tests, I
    have applied tuned's network-throughput profile and disabled HT.
  Test Information:
  + For the environment that had no isolated CPUs:
    I have tested with single stream and 24 streams (same as that of online
    CPUs).
  + For the environment that had 20 isolated CPUs:
    I have tested with single stream, 4 streams (same as the number of
    housekeeping CPUs) and 24 streams (same as that of online CPUs).

  # UDP Stream Test:
    + There was no degradation observed in UDP stream tests in either
      environment (with isolated CPUs and without isolated CPUs) after the
      introduction of the patches.
  # TCP Stream Test - No isolated CPUs:
    + No noticeable degradation was observed.
  # TCP Stream Test - With isolated CPUs:
    + Multiple Stream (4)  - Average degradation of around 5-6%
    + Multiple Stream (24) - Average degradation of around 2-3%
    + Single Stream        - Even on a vanilla kernel the bitrate observed for
                             a TCP single stream test seems to vary
                             significantly across different runs (e.g. the %
                             variation between the best and the worst case on
                             a vanilla kernel was around 8-10%). A similar
                             variation was observed with the kernel that
                             included my patches. No additional degradation
                             was observed.

Since the change to pci_alloc_irq_vectors() is going to impact several
drivers, I have posted this patch-set as an RFC. I would be happy to perform
more testing based on any suggestions or incorporate any comments to ensure
that the change is not breaking anything.


Nitesh Narayan Lal (3):
  sched/isolation: API to get num of hosekeeping CPUs
  i40e: limit msix vectors based on housekeeping CPUs
  PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs

 drivers/net/ethernet/intel/i40e/i40e_main.c |  3 ++-
 include/linux/pci.h                         | 16 ++++++++++++++
 include/linux/sched/isolation.h             |  7 +++++++
 kernel/sched/isolation.c                    | 23 +++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

