* [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs
@ 2020-09-23 18:11 Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
` (3 more replies)
0 siblings, 4 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
This is a follow-up posting for "[RFC v1 0/3] isolation: limit msix vectors
based on housekeeping CPUs".
Issue
=====
With the current implementation, device drivers only take num_online_cpus()
into consideration when creating their MSIX vectors. This works quite well
in a non-RT environment, but in an RT environment that has a large number of
isolated CPUs and very few housekeeping CPUs it can lead to a problem.
The problem is triggered when something like tuned tries to move all the
IRQs from the isolated CPUs to the limited number of housekeeping CPUs, to
prevent interruptions to a latency-sensitive workload running on the
isolated CPUs. The move fails because of the per-CPU vector limit.
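To make the failure mode concrete, here is a back-of-the-envelope sketch.
All numbers (device count, CPU split, per-CPU limit) are hypothetical
illustrations, not figures from this posting:

```c
#include <assert.h>

/*
 * Illustrative sketch only: the numbers below are made up.  When each
 * MSIX-capable device sizes its vector allocation by num_online_cpus(),
 * later pinning every vector onto a handful of housekeeping CPUs can
 * exceed a per-CPU vector limit.
 */
#define ASSUMED_PER_CPU_VECTOR_LIMIT 200	/* hypothetical limit */

static inline int vectors_per_hk_cpu(int ndevices, int online_cpus,
				     int hk_cpus)
{
	/*
	 * Each device creates one vector per online CPU; all of those
	 * vectors are then moved onto the housekeeping CPUs only.
	 */
	return ndevices * online_cpus / hk_cpus;
}
```

With, say, 40 MSIX-capable devices on a 24-CPU machine that has only 4
housekeeping CPUs, each housekeeping CPU would have to absorb
40 * 24 / 4 = 240 vectors, above the assumed limit of 200, so the move
fails; without isolation the load is a comfortable 40 per CPU.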
Proposed Fix
============
In this patch-set, the following changes are proposed:
- A generic API, hk_num_online_cpus(), that returns the number of online
housekeeping CPUs meant to handle managed IRQ jobs.
- i40e: specifically for the i40e driver, the num_online_cpus() used in
i40e_init_msix() to calculate the number of MSIX vectors is replaced with
the above API. This restricts the number of MSIX vectors for i40e in RT
environments.
- pci_alloc_irq_vectors(): with the help of hk_num_online_cpus(), the
max_vecs passed to pci_alloc_irq_vectors() is restricted to the number of
online housekeeping CPUs in an RT environment. However, if min_vecs exceeds
the number of online housekeeping CPUs, max_vecs is limited based on
min_vecs instead.
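The pci_alloc_irq_vectors() restriction described above can be sketched as
a plain C helper. This is a simplified model of patch 4 for illustration,
not the actual kernel change (the real change lives inline in
pci_alloc_irq_vectors() and uses hk_num_online_cpus()):

```c
/*
 * Simplified model of the proposed max_vecs clamping.  When CPU
 * isolation is active (hk_cpus != online_cpus), cap max_vecs at the
 * number of online housekeeping CPUs, unless the caller's min_vecs
 * already exceeds them, in which case fall back to min_vecs.
 */
static inline unsigned int clamp_max_vecs(unsigned int min_vecs,
					  unsigned int max_vecs,
					  unsigned int hk_cpus,
					  unsigned int online_cpus)
{
	if (hk_cpus == online_cpus)	/* no CPUs isolated: untouched */
		return max_vecs;
	if (min_vecs > hk_cpus)		/* can't go below min_vecs */
		return min_vecs;
	return max_vecs < hk_cpus ? max_vecs : hk_cpus;
}
```

For example, with 4 housekeeping CPUs out of 24 online, a request for
1..24 vectors is clamped to at most 4, while a request with min_vecs of 8
has max_vecs restricted to 8.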
Future Work
===========
- In the previous upstream discussion [1], it was decided that it would be
better to have a generic framework that all drivers could consume to fix
this kind of issue. However, that is long-term work, and since RT
workloads are being impacted by the reported issue right now, we agreed
upon the proposed per-device approach for the time being.
Testing
=======
Functionality:
- To verify that the i40e change resolves the issue, I added a tracepoint
in i40e_init_msix() to record the number of CPUs used for vector creation
with and without tuned's realtime-virtual-host profile. As expected, with
the profile applied I only got the number of housekeeping CPUs, and all
available CPUs without it.
I similarly ran a few more tests with different modes, e.g. with only
nohz_full, with isolcpus, etc.
Performance:
- To analyze the performance impact, I targeted the change introduced in
pci_alloc_irq_vectors() and compared the results against vanilla kernel
(5.9.0-rc3) results.
Setup Information:
+ A couple of 24-core machines connected back to back via a pair of mlx5
NICs; I analyzed the average bitrate for server-client TCP and UDP
transmission via iperf.
+ To minimize the bitrate variation of the iperf TCP and UDP stream tests,
I applied tuned's network-throughput profile and disabled HT.
Test Information:
+ For the environment with no isolated CPUs:
I tested with a single stream and with 24 streams (same as the number of
online CPUs).
+ For the environment with 20 isolated CPUs:
I tested with a single stream, with 4 streams (same as the number of
housekeeping CPUs) and with 24 streams (same as the number of online
CPUs).
Results:
# UDP Stream Test:
+ No degradation was observed in the UDP stream tests in either
environment (with and without isolated CPUs) after the introduction of
the patches.
# TCP Stream Test - No isolated CPUs:
+ No noticeable degradation was observed.
# TCP Stream Test - With isolated CPUs:
+ Multiple Stream (4) - Average degradation of around 5-6%
+ Multiple Stream (24) - Average degradation of around 2-3%
+ Single Stream - Even on a vanilla kernel, the bitrate observed for a
TCP single stream test seems to vary significantly
across different runs (e.g. the % variation between
the best and the worst case on a vanilla kernel was
around 8-10%). A similar variation was observed with
the kernel that included my patches. No additional
degradation was observed.
If there are any suggestions for more performance evaluation, I would
be happy to discuss/perform them.
Changes from v1[2]:
==================
Patch1:
- Replaced num_housekeeping_cpus() with hk_num_online_cpus() and started
using the cpumask corresponding to HK_FLAG_MANAGED_IRQ to derive the
number of online housekeeping CPUs. This is based on Frederic
Weisbecker's suggestion.
- Since hk_num_online_cpus() is self-explanatory, dropped the comment that
was added previously.
Patch2:
- Added a new patch that is meant to enable managed IRQ isolation for nohz_full
CPUs. This is based on Frederic Weisbecker's suggestion.
Patch4 (PCI):
- For cases where min_vecs exceeds the number of online housekeeping CPUs,
instead of skipping the modification of max_vecs, it is now restricted
based on min_vecs. This is based on a suggestion from Marcelo Tosatti.
[1] https://lore.kernel.org/lkml/20200922095440.GA5217@lenoir/
[2] https://lore.kernel.org/lkml/20200909150818.313699-1-nitesh@redhat.com/
Nitesh Narayan Lal (4):
sched/isolation: API to get housekeeping online CPUs
sched/isolation: Extend nohz_full to isolate managed IRQs
i40e: limit msix vectors based on housekeeping CPUs
PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
include/linux/pci.h | 15 +++++++++++++++
include/linux/sched/isolation.h | 13 +++++++++++++
kernel/sched/isolation.c | 2 +-
4 files changed, 31 insertions(+), 2 deletions(-)
--
^ permalink raw reply [flat|nested] 19+ messages in thread
* [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-23 18:11 ` Nitesh Narayan Lal
2020-09-24 8:40 ` peterz
` (3 more replies)
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
` (2 subsequent siblings)
3 siblings, 4 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
Introduce a new API, hk_num_online_cpus(), that can be used to
retrieve the number of online housekeeping CPUs that are meant to handle
managed IRQ jobs.
This API is introduced for drivers that were previously relying only
on num_online_cpus() to determine the number of MSIX vectors to create.
In an RT environment with many isolated but few housekeeping CPUs, this
led to a situation where an attempt to move all of the vectors
corresponding to isolated CPUs to housekeeping CPUs failed due to the
per-CPU vector limit.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
include/linux/sched/isolation.h | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cc9f393e2a70..2e96b626e02e 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
return true;
}
+static inline unsigned int hk_num_online_cpus(void)
+{
+#ifdef CONFIG_CPU_ISOLATION
+ const struct cpumask *hk_mask;
+
+ if (static_branch_unlikely(&housekeeping_overridden)) {
+ hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
+ return cpumask_weight(hk_mask);
+ }
+#endif
+ return cpumask_weight(cpu_online_mask);
+}
+
#endif /* _LINUX_SCHED_ISOLATION_H */
--
2.18.2
* [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
@ 2020-09-23 18:11 ` Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
3 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
Extend the nohz_full feature set to include isolation from managed IRQs.
This is required specifically for setups that use only nohz_full and still
require isolation to maintain low latency on the listed CPUs.
This change ensures that the kernel functions that use HK_FLAG_MANAGED_IRQ
to derive the cpumask for pinning various jobs/IRQs do not enqueue
anything on the CPUs listed under nohz_full.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
kernel/sched/isolation.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9df9598a9e39 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
unsigned int flags;
flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
- HK_FLAG_MISC | HK_FLAG_KTHREAD;
+ HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_MANAGED_IRQ;
return housekeeping_setup(str, flags);
}
--
2.18.2
* [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
@ 2020-09-23 18:11 ` Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
3 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
In a realtime environment, it is essential to isolate unwanted IRQs from
isolated CPUs to prevent latency overhead. Creating MSIX vectors based
only on the online CPUs can be a problem on an RT setup that has several
isolated CPUs but very few housekeeping CPUs, because on such setups an
attempt to move the IRQs from the isolated CPUs to the limited
housekeeping CPUs may fail due to the per-CPU vector limit. This can
eventually result in latency spikes because of the IRQs that we fail to
move off the isolated CPUs.
Prevent i40e from creating vectors based only on online CPUs by using
hk_num_online_cpus() instead.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e433fdbf2c3..fcb6fa3707e0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@
#include <linux/of_net.h>
#include <linux/pci.h>
#include <linux/bpf.h>
+#include <linux/sched/isolation.h>
#include <generated/utsrelease.h>
/* Local includes */
@@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
* will use any remaining vectors to reach as close as we can to the
* number of online CPUs.
*/
- cpus = num_online_cpus();
+ cpus = hk_num_online_cpus();
pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
vectors_left -= pf->num_lan_msix;
--
2.18.2
* [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
` (2 preceding siblings ...)
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-23 18:11 ` Nitesh Narayan Lal
2020-09-24 20:45 ` Bjorn Helgaas
3 siblings, 1 reply; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
Limit the max_vecs argument passed to pci_alloc_irq_vectors() by the
caller based on the number of online housekeeping CPUs (those meant to
perform managed IRQ jobs).
The minimum of the passed max_vecs and the online housekeeping CPUs is
taken to ensure that we don't create excess vectors, which can be
problematic specifically in an RT environment. In cases where min_vecs
exceeds the number of online housekeeping CPUs, max_vecs is restricted
based on min_vecs instead. This change is required because in an RT
environment unwanted IRQs are moved from isolated CPUs to housekeeping
CPUs to keep the latency overhead to a minimum. If the number of
housekeeping CPUs is significantly lower than that of the isolated CPUs,
we can run into failures while moving these IRQs to the housekeeping CPUs
due to the per-CPU vector limit.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
include/linux/pci.h | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..cf9ca9410213 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -38,6 +38,7 @@
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/resource_ext.h>
+#include <linux/sched/isolation.h>
#include <uapi/linux/pci.h>
#include <linux/pci_ids.h>
@@ -1797,6 +1798,20 @@ static inline int
pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags)
{
+ unsigned int hk_cpus = hk_num_online_cpus();
+
+ /*
+ * For a real-time environment, try to be conservative and at max only
+ * ask for the same number of vectors as there are housekeeping online
+ * CPUs. In case, the min_vecs requested exceeds the housekeeping
+ * online CPUs, restrict the max_vecs based on the min_vecs instead.
+ */
+ if (hk_cpus != num_online_cpus()) {
+ if (min_vecs > hk_cpus)
+ max_vecs = min_vecs;
+ else
+ max_vecs = min_t(int, max_vecs, hk_cpus);
+ }
return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
NULL);
}
--
2.18.2
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
@ 2020-09-24 8:40 ` peterz
2020-09-24 12:09 ` Frederic Weisbecker
2020-09-24 12:11 ` Frederic Weisbecker
` (2 subsequent siblings)
3 siblings, 1 reply; 19+ messages in thread
From: peterz @ 2020-09-24 8:40 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
>
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
>
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
> include/linux/sched/isolation.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> return true;
> }
>
> +static inline unsigned int hk_num_online_cpus(void)
This breaks with the established naming of that header.
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 8:40 ` peterz
@ 2020-09-24 12:09 ` Frederic Weisbecker
2020-09-24 12:23 ` Nitesh Narayan Lal
2020-09-24 12:24 ` Peter Zijlstra
0 siblings, 2 replies; 19+ messages in thread
From: Frederic Weisbecker @ 2020-09-24 12:09 UTC (permalink / raw)
To: peterz
Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci,
intel-wired-lan, mtosatti, sassmann, jesse.brandeburg,
lihong.yang, helgaas, jeffrey.t.kirsher, jacob.e.keller, jlelli,
hch, bhelgaas, mike.marciniszyn, dennis.dalessandro,
thomas.lendacky, jerinj, mathias.nyman, jiri, mingo, juri.lelli,
vincent.guittot
On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> > Introduce a new API hk_num_online_cpus(), that can be used to
> > retrieve the number of online housekeeping CPUs that are meant to handle
> > managed IRQ jobs.
> >
> > This API is introduced for the drivers that were previously relying only
> > on num_online_cpus() to determine the number of MSIX vectors to create.
> > In an RT environment with large isolated but fewer housekeeping CPUs this
> > was leading to a situation where an attempt to move all of the vectors
> > corresponding to isolated CPUs to housekeeping CPUs were failing due to
> > per CPU vector limit.
> >
> > Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> > ---
> > include/linux/sched/isolation.h | 13 +++++++++++++
> > 1 file changed, 13 insertions(+)
> >
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index cc9f393e2a70..2e96b626e02e 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> > return true;
> > }
> >
> > +static inline unsigned int hk_num_online_cpus(void)
>
> This breaks with the established naming of that header.
I guess we can make it housekeeping_num_online_cpus() ?
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-24 8:40 ` peterz
@ 2020-09-24 12:11 ` Frederic Weisbecker
2020-09-24 12:26 ` Nitesh Narayan Lal
2020-09-24 12:46 ` Peter Zijlstra
2020-09-24 20:47 ` Bjorn Helgaas
3 siblings, 1 reply; 19+ messages in thread
From: Frederic Weisbecker @ 2020-09-24 12:11 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
>
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
>
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
> include/linux/sched/isolation.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> return true;
> }
>
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> + const struct cpumask *hk_mask;
> +
> + if (static_branch_unlikely(&housekeeping_overridden)) {
> + hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
HK_FLAG_MANAGED_IRQ should be pass as an argument to the function:
housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
completely arbitrary otherwise.
> + return cpumask_weight(hk_mask);
> + }
> +#endif
> + return cpumask_weight(cpu_online_mask);
> +}
> +
> #endif /* _LINUX_SCHED_ISOLATION_H */
> --
> 2.18.2
>
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 12:09 ` Frederic Weisbecker
@ 2020-09-24 12:23 ` Nitesh Narayan Lal
2020-09-24 12:24 ` Peter Zijlstra
1 sibling, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 12:23 UTC (permalink / raw)
To: Frederic Weisbecker, peterz
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot
On 9/24/20 8:09 AM, Frederic Weisbecker wrote:
> On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
>> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>>> Introduce a new API hk_num_online_cpus(), that can be used to
>>> retrieve the number of online housekeeping CPUs that are meant to handle
>>> managed IRQ jobs.
>>>
>>> This API is introduced for the drivers that were previously relying only
>>> on num_online_cpus() to determine the number of MSIX vectors to create.
>>> In an RT environment with large isolated but fewer housekeeping CPUs this
>>> was leading to a situation where an attempt to move all of the vectors
>>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>>> per CPU vector limit.
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>> ---
>>> include/linux/sched/isolation.h | 13 +++++++++++++
>>> 1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>>> index cc9f393e2a70..2e96b626e02e 100644
>>> --- a/include/linux/sched/isolation.h
>>> +++ b/include/linux/sched/isolation.h
>>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>> return true;
>>> }
>>>
>>> +static inline unsigned int hk_num_online_cpus(void)
>> This breaks with the established naming of that header.
> I guess we can make it housekeeping_num_online_cpus() ?
Right, I can do that.
Thanks
>
--
Nitesh
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 12:09 ` Frederic Weisbecker
2020-09-24 12:23 ` Nitesh Narayan Lal
@ 2020-09-24 12:24 ` Peter Zijlstra
1 sibling, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2020-09-24 12:24 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci,
intel-wired-lan, mtosatti, sassmann, jesse.brandeburg,
lihong.yang, helgaas, jeffrey.t.kirsher, jacob.e.keller, jlelli,
hch, bhelgaas, mike.marciniszyn, dennis.dalessandro,
thomas.lendacky, jerinj, mathias.nyman, jiri, mingo, juri.lelli,
vincent.guittot
On Thu, Sep 24, 2020 at 02:09:57PM +0200, Frederic Weisbecker wrote:
> > > +static inline unsigned int hk_num_online_cpus(void)
> >
> > This breaks with the established naming of that header.
>
> I guess we can make it housekeeping_num_online_cpus() ?
That would be consistent :-)
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 12:11 ` Frederic Weisbecker
@ 2020-09-24 12:26 ` Nitesh Narayan Lal
0 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 12:26 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On 9/24/20 8:11 AM, Frederic Weisbecker wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>> include/linux/sched/isolation.h | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>> return true;
>> }
>>
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> + const struct cpumask *hk_mask;
>> +
>> + if (static_branch_unlikely(&housekeeping_overridden)) {
>> + hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> HK_FLAG_MANAGED_IRQ should be pass as an argument to the function:
>
> housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
> completely arbitrary otherwise.
Yeap that is more sensible, I will do that.
Do you have any other concerns/suggestions on any other patch?
>
>> + return cpumask_weight(hk_mask);
>> + }
>> +#endif
>> + return cpumask_weight(cpu_online_mask);
>> +}
>> +
>> #endif /* _LINUX_SCHED_ISOLATION_H */
>> --
>> 2.18.2
>>
--
Thanks
Nitesh
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-24 8:40 ` peterz
2020-09-24 12:11 ` Frederic Weisbecker
@ 2020-09-24 12:46 ` Peter Zijlstra
2020-09-24 13:45 ` Nitesh Narayan Lal
2020-09-24 20:47 ` Bjorn Helgaas
3 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2020-09-24 12:46 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot
FWIW, cross-posting to moderated lists is annoying. I don't know why we
allow them in MAINTAINERS :-(
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 12:46 ` Peter Zijlstra
@ 2020-09-24 13:45 ` Nitesh Narayan Lal
0 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 13:45 UTC (permalink / raw)
To: Peter Zijlstra
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot
On 9/24/20 8:46 AM, Peter Zijlstra wrote:
>
> FWIW, cross-posting to moderated lists is annoying. I don't know why we
> allow them in MAINTAINERS :-(
Yeah, it sends out an acknowledgment for every email.
I had to include it because sending the patches to it apparently allows them
to get tested by Intel folks.
>
--
Nitesh
* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
@ 2020-09-24 20:45 ` Bjorn Helgaas
2020-09-24 21:39 ` Nitesh Narayan Lal
0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 20:45 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
Possible subject:
PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
> This patch limits the max_vecs argument that is passed to
> pci_alloc_irq_vectors() by the caller, based on the number of
> housekeeping online CPUs (that are meant to perform managed IRQ jobs).
>
> The minimum of the max_vecs passed in and the number of housekeeping
> online CPUs is taken to ensure that we don't create excess vectors,
> as that can be problematic, specifically in an RT environment. In
> cases where min_vecs exceeds the number of housekeeping online CPUs,
> max_vecs is restricted based on min_vecs instead. The proposed change
> is required because in an RT environment unwanted IRQs are moved from
> isolated CPUs to the housekeeping CPUs to keep the latency overhead
> to a minimum. If the number of housekeeping CPUs is significantly
> lower than that of the isolated CPUs, we can run into failures while
> moving these IRQs to the housekeeping CPUs due to the per-CPU vector
> limit.
Does this capture enough of the log?
If we have isolated CPUs dedicated for use by real-time tasks, we
try to move IRQs to housekeeping CPUs to reduce overhead on the
isolated CPUs.
If we allocate too many IRQ vectors, moving them all to housekeeping
CPUs may exceed per-CPU vector limits.
When we have isolated CPUs, limit the number of vectors allocated by
pci_alloc_irq_vectors() to the minimum number required by the
driver, or to one per housekeeping CPU if that is larger.
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
> include/linux/pci.h | 15 +++++++++++++++
> 1 file changed, 15 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 835530605c0d..cf9ca9410213 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/resource_ext.h>
> +#include <linux/sched/isolation.h>
> #include <uapi/linux/pci.h>
>
> #include <linux/pci_ids.h>
> @@ -1797,6 +1798,20 @@ static inline int
> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> unsigned int max_vecs, unsigned int flags)
> {
> + unsigned int hk_cpus = hk_num_online_cpus();
> +
> + /*
> + * For a real-time environment, try to be conservative and at max only
> + * ask for the same number of vectors as there are housekeeping online
> + * CPUs. In case, the min_vecs requested exceeds the housekeeping
> + * online CPUs, restrict the max_vecs based on the min_vecs instead.
> + */
> + if (hk_cpus != num_online_cpus()) {
> + if (min_vecs > hk_cpus)
> + max_vecs = min_vecs;
> + else
> + max_vecs = min_t(int, max_vecs, hk_cpus);
> + }
Is the below basically the same?
/*
* If we have isolated CPUs for use by real-time tasks,
* minimize overhead on those CPUs by moving IRQs to the
* remaining "housekeeping" CPUs. Limit vector usage to keep
* housekeeping CPUs from running out of IRQ vectors.
*/
if (housekeeping_cpus < num_online_cpus()) {
if (housekeeping_cpus < min_vecs)
max_vecs = min_vecs;
else if (housekeeping_cpus < max_vecs)
max_vecs = housekeeping_cpus;
}
My comment isn't quite right because this patch only limits the number
of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
I don't know where the move happens (or maybe you just avoid assigning
IRQs to isolated CPUs, and I don't know how that happens either).
> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> NULL);
> }
> --
> 2.18.2
>
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
` (2 preceding siblings ...)
2020-09-24 12:46 ` Peter Zijlstra
@ 2020-09-24 20:47 ` Bjorn Helgaas
2020-09-24 21:52 ` Nitesh Narayan Lal
3 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 20:47 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API, hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to
> handle managed IRQ jobs.
>
> This API is introduced for drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to
> create. In an RT environment with many isolated CPUs but few
> housekeeping CPUs, this was leading to a situation where attempts to
> move all of the vectors corresponding to isolated CPUs to housekeeping
> CPUs failed due to the per-CPU vector limit.
>
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
> include/linux/sched/isolation.h | 13 +++++++++++++
> 1 file changed, 13 insertions(+)
>
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> return true;
> }
>
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> + const struct cpumask *hk_mask;
> +
> + if (static_branch_unlikely(&housekeeping_overridden)) {
> + hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> + return cpumask_weight(hk_mask);
> + }
> +#endif
> + return cpumask_weight(cpu_online_mask);
Just curious: why is this not
#ifdef CONFIG_CPU_ISOLATION
...
#endif
return num_online_cpus();
> +}
> +
> #endif /* _LINUX_SCHED_ISOLATION_H */
> --
> 2.18.2
>
* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
2020-09-24 20:45 ` Bjorn Helgaas
@ 2020-09-24 21:39 ` Nitesh Narayan Lal
2020-09-24 22:59 ` Bjorn Helgaas
0 siblings, 1 reply; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 21:39 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
> Possible subject:
>
> PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
Will switch to this.
>
> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
>> passed on by the caller based on the housekeeping online CPUs (that are
>> meant to perform managed IRQ jobs).
>>
>> A minimum of the max_vecs passed and housekeeping online CPUs is derived
>> to ensure that we don't create excess vectors as that can be problematic
>> specifically in an RT environment. In cases where the min_vecs exceeds the
>> housekeeping online CPUs, max vecs is restricted based on the min_vecs
>> instead. The proposed change is required because for an RT environment
>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
>> keep the latency overhead to a minimum. If the number of housekeeping CPUs
>> is significantly lower than that of the isolated CPUs we can run into
>> failures while moving these IRQs to housekeeping CPUs due to per CPU
>> vector limit.
> Does this capture enough of the log?
>
> If we have isolated CPUs dedicated for use by real-time tasks, we
> try to move IRQs to housekeeping CPUs to reduce overhead on the
> isolated CPUs.
How about:
"
If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
overhead on these real-time CPUs.
"
What do you think?
>
> If we allocate too many IRQ vectors, moving them all to housekeeping
> CPUs may exceed per-CPU vector limits.
>
> When we have isolated CPUs, limit the number of vectors allocated by
> pci_alloc_irq_vectors() to the minimum number required by the
> driver, or to one per housekeeping CPU if that is larger
I think this is good, I can adopt this.
> .
>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>> include/linux/pci.h | 15 +++++++++++++++
>> 1 file changed, 15 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 835530605c0d..cf9ca9410213 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -38,6 +38,7 @@
>> #include <linux/interrupt.h>
>> #include <linux/io.h>
>> #include <linux/resource_ext.h>
>> +#include <linux/sched/isolation.h>
>> #include <uapi/linux/pci.h>
>>
>> #include <linux/pci_ids.h>
>> @@ -1797,6 +1798,20 @@ static inline int
>> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>> unsigned int max_vecs, unsigned int flags)
>> {
>> + unsigned int hk_cpus = hk_num_online_cpus();
>> +
>> + /*
>> + * For a real-time environment, try to be conservative and at max only
>> + * ask for the same number of vectors as there are housekeeping online
>> + * CPUs. In case, the min_vecs requested exceeds the housekeeping
>> + * online CPUs, restrict the max_vecs based on the min_vecs instead.
>> + */
>> + if (hk_cpus != num_online_cpus()) {
>> + if (min_vecs > hk_cpus)
>> + max_vecs = min_vecs;
>> + else
>> + max_vecs = min_t(int, max_vecs, hk_cpus);
>> + }
> Is the below basically the same?
>
> /*
> * If we have isolated CPUs for use by real-time tasks,
> * minimize overhead on those CPUs by moving IRQs to the
> * remaining "housekeeping" CPUs. Limit vector usage to keep
> * housekeeping CPUs from running out of IRQ vectors.
> */
How about the following as a comment:
"
If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
latency overhead is minimized on those CPUs by moving the IRQ vectors to
the housekeeping CPUs. Limit the number of vectors to keep housekeeping
CPUs from running out of IRQ vectors.
"
> if (housekeeping_cpus < num_online_cpus()) {
> if (housekeeping_cpus < min_vecs)
> max_vecs = min_vecs;
> else if (housekeeping_cpus < max_vecs)
> max_vecs = housekeeping_cpus;
> }
The only reason I went with hk_cpus instead of housekeeping_cpus is that
at other places in the kernel I found a similar naming convention (e.g.
hk_mask, hk_flags, etc.).
But if housekeeping_cpus makes things clearer, I can switch to that
instead.
Although, after Frederic's and Peter's suggestions, the previous call will
change to something like:
"
housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
"
which should still fall within the 80-characters-per-line limit.
>
> My comment isn't quite right because this patch only limits the number
> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
Yeap it doesn't move IRQs to the housekeeping CPUs.
> I don't know where the move happens (or maybe you just avoid assigning
> IRQs to isolated CPUs, and I don't know how that happens either).
This can happen in userspace, either manually or via some application
such as tuned.
>
>> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>> NULL);
>> }
>> --
>> 2.18.2
>>
--
Thanks
Nitesh
* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
2020-09-24 20:47 ` Bjorn Helgaas
@ 2020-09-24 21:52 ` Nitesh Narayan Lal
0 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 21:52 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On 9/24/20 4:47 PM, Bjorn Helgaas wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>> include/linux/sched/isolation.h | 13 +++++++++++++
>> 1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>> return true;
>> }
>>
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> + const struct cpumask *hk_mask;
>> +
>> + if (static_branch_unlikely(&housekeeping_overridden)) {
>> + hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
>> + return cpumask_weight(hk_mask);
>> + }
>> +#endif
>> + return cpumask_weight(cpu_online_mask);
> Just curious: why is this not
>
> #ifdef CONFIG_CPU_ISOLATION
> ...
> #endif
> return num_online_cpus();
I think doing an atomic read is better than a bitmap operation.
Thanks for pointing this out.
>
>> +}
>> +
>> #endif /* _LINUX_SCHED_ISOLATION_H */
>> --
>> 2.18.2
>>
--
Nitesh
* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
2020-09-24 21:39 ` Nitesh Narayan Lal
@ 2020-09-24 22:59 ` Bjorn Helgaas
2020-09-24 23:40 ` Nitesh Narayan Lal
0 siblings, 1 reply; 19+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 22:59 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
>
> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
> > Possible subject:
> >
> > PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
>
> Will switch to this.
>
> > On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
> >> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
> >> passed on by the caller based on the housekeeping online CPUs (that are
> >> meant to perform managed IRQ jobs).
> >>
> >> A minimum of the max_vecs passed and housekeeping online CPUs is derived
> >> to ensure that we don't create excess vectors as that can be problematic
> >> specifically in an RT environment. In cases where the min_vecs exceeds the
> >> housekeeping online CPUs, max vecs is restricted based on the min_vecs
> >> instead. The proposed change is required because for an RT environment
> >> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
> >> keep the latency overhead to a minimum. If the number of housekeeping CPUs
> >> is significantly lower than that of the isolated CPUs we can run into
> >> failures while moving these IRQs to housekeeping CPUs due to per CPU
> >> vector limit.
> > Does this capture enough of the log?
> >
> > If we have isolated CPUs dedicated for use by real-time tasks, we
> > try to move IRQs to housekeeping CPUs to reduce overhead on the
> > isolated CPUs.
>
> How about:
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
> overhead on these real-time CPUs.
> "
>
> What do you think?
It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
really mean anything. I think it's a detail that should be inside the
"housekeeping CPU" abstraction.
> > If we allocate too many IRQ vectors, moving them all to housekeeping
> > CPUs may exceed per-CPU vector limits.
> >
> > When we have isolated CPUs, limit the number of vectors allocated by
> > pci_alloc_irq_vectors() to the minimum number required by the
> > driver, or to one per housekeeping CPU if that is larger
>
> I think this is good, I can adopt this.
>
> > .
> >
> >> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> >> ---
> >> include/linux/pci.h | 15 +++++++++++++++
> >> 1 file changed, 15 insertions(+)
> >>
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 835530605c0d..cf9ca9410213 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -38,6 +38,7 @@
> >> #include <linux/interrupt.h>
> >> #include <linux/io.h>
> >> #include <linux/resource_ext.h>
> >> +#include <linux/sched/isolation.h>
> >> #include <uapi/linux/pci.h>
> >>
> >> #include <linux/pci_ids.h>
> >> @@ -1797,6 +1798,20 @@ static inline int
> >> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> >> unsigned int max_vecs, unsigned int flags)
> >> {
> >> + unsigned int hk_cpus = hk_num_online_cpus();
> >> +
> >> + /*
> >> + * For a real-time environment, try to be conservative and at max only
> >> + * ask for the same number of vectors as there are housekeeping online
> >> + * CPUs. In case, the min_vecs requested exceeds the housekeeping
> >> + * online CPUs, restrict the max_vecs based on the min_vecs instead.
> >> + */
> >> + if (hk_cpus != num_online_cpus()) {
> >> + if (min_vecs > hk_cpus)
> >> + max_vecs = min_vecs;
> >> + else
> >> + max_vecs = min_t(int, max_vecs, hk_cpus);
> >> + }
> > Is the below basically the same?
> >
> > /*
> > * If we have isolated CPUs for use by real-time tasks,
> > * minimize overhead on those CPUs by moving IRQs to the
> > * remaining "housekeeping" CPUs. Limit vector usage to keep
> > * housekeeping CPUs from running out of IRQ vectors.
> > */
>
> How about the following as a comment:
>
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
> latency overhead is minimized on those CPUs by moving the IRQ vectors to
> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
> CPUs from running out of IRQ vectors.
>
> "
>
> > if (housekeeping_cpus < num_online_cpus()) {
> > if (housekeeping_cpus < min_vecs)
> > max_vecs = min_vecs;
> > else if (housekeeping_cpus < max_vecs)
> > max_vecs = housekeeping_cpus;
> > }
>
> The only reason I went with hk_cpus instead of housekeeping_cpus is because
> at other places in the kernel I found a similar naming convention (eg.
> hk_mask, hk_flags etc.).
> But if housekeeping_cpus makes things more clear, I can switch to that
> instead.
>
> Although after Frederic and Peter's suggestion the previous call will change
> to something like:
>
> "
> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
> "
>
> Which should still fall within the 80-characters-per-line limit.
I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
long as "housekeeping" appears in the code somewhere. If we call
"housekeeping_num_online_cpus()" that should be enough.
> > My comment isn't quite right because this patch only limits the number
> > of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
>
> Yeap it doesn't move IRQs to the housekeeping CPUs.
>
> > I don't know where the move happens (or maybe you just avoid assigning
> > IRQs to isolated CPUs, and I don't know how that happens either).
>
> This can happen in the userspace, either manually or by some application
> such as tuned.
Some brief hint about this might be helpful.
> >> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> >> NULL);
> >> }
> >> --
> >> 2.18.2
> >>
> --
> Thanks
> Nitesh
>
* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
2020-09-24 22:59 ` Bjorn Helgaas
@ 2020-09-24 23:40 ` Nitesh Narayan Lal
0 siblings, 0 replies; 19+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 23:40 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot
On 9/24/20 6:59 PM, Bjorn Helgaas wrote:
> On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
>> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
>>> Possible subject:
>>>
>>> PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
>> Will switch to this.
>>
>>> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
>>>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
>>>> passed on by the caller based on the housekeeping online CPUs (that are
>>>> meant to perform managed IRQ jobs).
>>>>
>>>> A minimum of the max_vecs passed and housekeeping online CPUs is derived
>>>> to ensure that we don't create excess vectors as that can be problematic
>>>> specifically in an RT environment. In cases where the min_vecs exceeds the
>>>> housekeeping online CPUs, max vecs is restricted based on the min_vecs
>>>> instead. The proposed change is required because for an RT environment
>>>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
>>>> keep the latency overhead to a minimum. If the number of housekeeping CPUs
>>>> is significantly lower than that of the isolated CPUs we can run into
>>>> failures while moving these IRQs to housekeeping CPUs due to per CPU
>>>> vector limit.
>>> Does this capture enough of the log?
>>>
>>> If we have isolated CPUs dedicated for use by real-time tasks, we
>>> try to move IRQs to housekeeping CPUs to reduce overhead on the
>>> isolated CPUs.
>> How about:
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
>> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
>> overhead on these real-time CPUs.
>> "
>>
>> What do you think?
> It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
> really mean anything. I think it's a detail that should be inside the
> "housekeeping CPU" abstraction.
I get your point. In that case, I will probably stick to your original
suggestion, just replacing the term "overhead" with "latency overhead".
>
>>> If we allocate too many IRQ vectors, moving them all to housekeeping
>>> CPUs may exceed per-CPU vector limits.
>>>
>>> When we have isolated CPUs, limit the number of vectors allocated by
>>> pci_alloc_irq_vectors() to the minimum number required by the
>>> driver, or to one per housekeeping CPU if that is larger
>> I think this is good, I can adopt this.
>>
>>> .
>>>
>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>> ---
>>>> include/linux/pci.h | 15 +++++++++++++++
>>>> 1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 835530605c0d..cf9ca9410213 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -38,6 +38,7 @@
>>>> #include <linux/interrupt.h>
>>>> #include <linux/io.h>
>>>> #include <linux/resource_ext.h>
>>>> +#include <linux/sched/isolation.h>
>>>> #include <uapi/linux/pci.h>
>>>>
>>>> #include <linux/pci_ids.h>
>>>> @@ -1797,6 +1798,20 @@ static inline int
>>>> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>>> unsigned int max_vecs, unsigned int flags)
>>>> {
>>>> + unsigned int hk_cpus = hk_num_online_cpus();
>>>> +
>>>> + /*
>>>> + * For a real-time environment, try to be conservative and at max only
>>>> + * ask for the same number of vectors as there are housekeeping online
>>>> + * CPUs. In case, the min_vecs requested exceeds the housekeeping
>>>> + * online CPUs, restrict the max_vecs based on the min_vecs instead.
>>>> + */
>>>> + if (hk_cpus != num_online_cpus()) {
>>>> + if (min_vecs > hk_cpus)
>>>> + max_vecs = min_vecs;
>>>> + else
>>>> + max_vecs = min_t(int, max_vecs, hk_cpus);
>>>> + }
>>> Is the below basically the same?
>>>
>>> /*
>>> * If we have isolated CPUs for use by real-time tasks,
>>> * minimize overhead on those CPUs by moving IRQs to the
>>> * remaining "housekeeping" CPUs. Limit vector usage to keep
>>> * housekeeping CPUs from running out of IRQ vectors.
>>> */
>> How about the following as a comment:
>>
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
>> latency overhead is minimized on those CPUs by moving the IRQ vectors to
>> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
>> CPUs from running out of IRQ vectors.
>>
>> "
>>
>>> if (housekeeping_cpus < num_online_cpus()) {
>>> if (housekeeping_cpus < min_vecs)
>>> max_vecs = min_vecs;
>>> else if (housekeeping_cpus < max_vecs)
>>> max_vecs = housekeeping_cpus;
>>> }
>> The only reason I went with hk_cpus instead of housekeeping_cpus is because
>> at other places in the kernel I found a similar naming convention (eg.
>> hk_mask, hk_flags etc.).
>> But if housekeeping_cpus makes things more clear, I can switch to that
>> instead.
>>
>> Although after Frederic and Peter's suggestion the previous call will change
>> to something like:
>>
>> "
>> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
>> "
>>
>> Which should still fall within the 80-characters-per-line limit.
> I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
> long as "housekeeping" appears in the code somewhere. If we call
> "housekeeping_num_online_cpus()" that should be enough.
Got it, in that case we are good here.
>
>>> My comment isn't quite right because this patch only limits the number
>>> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
>> Yeap it doesn't move IRQs to the housekeeping CPUs.
>>
>>> I don't know where the move happens (or maybe you just avoid assigning
>>> IRQs to isolated CPUs, and I don't know how that happens either).
>> This can happen in the userspace, either manually or by some application
>> such as tuned.
> Some brief hint about this might be helpful.
Sure, I will try to come up with something along the lines of what you
suggested, i.e., something that also mentions that the IRQs are moved
from userspace.
>
>>>> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>>> NULL);
>>>> }
>>>> --
>>>> 2.18.2
>>>>
>> --
>> Thanks
>> Nitesh
>>
>
>
--
Thanks
Nitesh
end of thread, other threads:[~2020-09-24 23:41 UTC | newest]
Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-24 8:40 ` peterz
2020-09-24 12:09 ` Frederic Weisbecker
2020-09-24 12:23 ` Nitesh Narayan Lal
2020-09-24 12:24 ` Peter Zijlstra
2020-09-24 12:11 ` Frederic Weisbecker
2020-09-24 12:26 ` Nitesh Narayan Lal
2020-09-24 12:46 ` Peter Zijlstra
2020-09-24 13:45 ` Nitesh Narayan Lal
2020-09-24 20:47 ` Bjorn Helgaas
2020-09-24 21:52 ` Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
2020-09-24 20:45 ` Bjorn Helgaas
2020-09-24 21:39 ` Nitesh Narayan Lal
2020-09-24 22:59 ` Bjorn Helgaas
2020-09-24 23:40 ` Nitesh Narayan Lal