linux-kernel.vger.kernel.org archive mirror
* [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs
@ 2020-09-09 15:08 Nitesh Narayan Lal
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
                   ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-09 15:08 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

This is a follow-up posting for "[v1] i40e: limit the msix vectors based on
housekeeping CPUs" [1]. (It took longer than expected for me to get back to
this.)


Issue
=====
With the current implementation, device drivers take only num_online_cpus()
into consideration when creating their MSIX vectors. This works quite well
for a non-RT environment, but in an RT environment that has a large number of
isolated CPUs and very few housekeeping CPUs it can lead to a problem. The
problem is triggered when something like tuned tries to move all the IRQs
from the isolated CPUs to the limited number of housekeeping CPUs, to prevent
interruptions for a latency-sensitive workload that will be running on the
isolated CPUs. This failure is caused by the per CPU vector limitation.
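
As a rough illustration of that limit (assuming x86, where each CPU has 256
IDT vectors, of which only around 200 are usable for device interrupts): on
the 72-core reproducer described below, the 4 housekeeping CPUs offer roughly
4 * 200 = 800 vector slots in total, so a handful of devices that each size
their MSIX request by num_online_cpus() (~72 vectors apiece) can exhaust
them once all of their vectors have to be packed onto those 4 CPUs.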


Proposed Fix
============
In this patch-set, the following changes are proposed:
- A generic API, num_housekeeping_cpus(), which returns the number of
  available housekeeping CPUs in an environment with isolated CPUs, and the
  number of all online CPUs otherwise.
- i40e: For the i40e driver, the num_online_cpus() used in i40e_init_msix()
  to calculate the number of MSIX vectors is replaced with the above API.
  This restricts the number of MSIX vectors for i40e in RT environments.
- pci_alloc_irq_vectors(): With the help of num_housekeeping_cpus(), the
  max_vecs passed to pci_alloc_irq_vectors() is restricted to the available
  housekeeping CPUs, but only in an environment that has isolated CPUs.
  However, if min_vecs exceeds num_housekeeping_cpus(), max_vecs is left
  unchanged so that device initialization is not prevented due to a lack of
  housekeeping CPUs (a usage sketch follows this list).
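
For illustration, here is a minimal sketch (not part of the series; the
foo_* names and the max_vecs field are hypothetical) of how a driver would
consume the new API to size its MSIX request:

#include <linux/pci.h>
#include <linux/sched/isolation.h>

struct foo_dev {
	struct pci_dev *pdev;
	unsigned int max_vecs;	/* hardware limit of the device */
};

static int foo_init_msix(struct foo_dev *foo)
{
	/* Request at most one vector per housekeeping CPU. */
	unsigned int vecs = min_t(unsigned int, foo->max_vecs,
				  num_housekeeping_cpus());

	return pci_alloc_irq_vectors(foo->pdev, 1, vecs, PCI_IRQ_MSIX);
}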



Reproducing the Issue
=====================
I have triggered this issue on a setup that had a total of 72 cores, among
which 68 were isolated and only 4 were left for housekeeping tasks. I was
using tuned's realtime-virtual-host profile to configure the system. In this
scenario, tuned reported the error message "Failed to set SMP affinity of IRQ
xxx to '00000040,00000010,00000005': [Errno 28] No space left on the device"
for several IRQs in tuned.log, due to the per CPU vector limit.
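
(For reference, the isolation itself comes from the boot-time housekeeping
setup; a split like the above is typically expressed via kernel parameters
along the lines of the following, which are illustrative rather than copied
from that setup:

  isolcpus=managed_irq,domain,4-71 nohz_full=4-71 rcu_nocbs=4-71

i.e. CPUs 0-3 remain for housekeeping and CPUs 4-71 are isolated.)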


Testing
=======
Functionality:
- To verify that the issue is resolved by the i40e change, I added a
  tracepoint in i40e_init_msix() to record the number of CPUs used for vector
  creation, with and without tuned's realtime-virtual-host profile. As
  expected, with the profile applied I got only the number of housekeeping
  CPUs, and without it I got all available CPUs.

Performance:
- To analyze the performance impact, I targeted the change introduced in
  pci_alloc_irq_vectors() and compared the results against those of a vanilla
  kernel (5.9.0-rc3).

  Setup Information:
  + I had a couple of 24-core machines connected back to back via a couple
    of mlx5 NICs, and I analyzed the average bitrate for server-client TCP
    and UDP transmission via iperf.
  + To minimize the bitrate variation of the iperf TCP and UDP stream tests,
    I applied tuned's network-throughput profile and disabled HT.
  Test Information:
  + For the environment that had no isolated CPUs:
    I tested with a single stream and with 24 streams (same as the number of
    online CPUs).
  + For the environment that had 20 isolated CPUs:
    I tested with a single stream, with 4 streams (same as the number of
    housekeeping CPUs), and with 24 streams (same as the number of online
    CPUs).
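
  (Illustratively, such runs map to iperf invocations along the lines of
  "iperf -c <server> -P <streams>" for TCP and "iperf -u -c <server>
  -P <streams>" for UDP; the exact commands used are not part of this
  posting.)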

  Results:
  # UDP Stream Test:
    + No degradation was observed in the UDP stream tests in either
      environment (with and without isolated CPUs) after the introduction
      of the patches.
  # TCP Stream Test - No isolated CPUs:
    + No noticeable degradation was observed.
  # TCP Stream Test - With isolated CPUs:
    + Multiple Stream (4)  - Average degradation of around 5-6%
    + Multiple Stream (24) - Average degradation of around 2-3%
    + Single Stream        - Even on a vanilla kernel, the bitrate observed
                             for a TCP single stream test seems to vary
                             significantly across different runs (e.g. the %
                             variation between the best and the worst case on
                             a vanilla kernel was around 8-10%). A similar
                             variation was observed with the kernel that
                             included my patches. No additional degradation
                             was observed.

Since the change to pci_alloc_irq_vectors() is going to impact several
drivers, I have posted this patch-set as an RFC. I would be happy to perform
more testing based on any suggestions, or to incorporate any comments, to
ensure that the change does not break anything.

[1] https://lore.kernel.org/patchwork/patch/1256308/ 

Nitesh Narayan Lal (3):
  sched/isolation: API to get num of hosekeeping CPUs
  i40e: limit msix vectors based on housekeeping CPUs
  PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs

 drivers/net/ethernet/intel/i40e/i40e_main.c |  3 ++-
 include/linux/pci.h                         | 16 ++++++++++++++
 include/linux/sched/isolation.h             |  7 +++++++
 kernel/sched/isolation.c                    | 23 +++++++++++++++++++++
 4 files changed, 48 insertions(+), 1 deletion(-)

-- 
2.27.0




^ permalink raw reply	[flat|nested] 30+ messages in thread

* [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-09 15:08 [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-09 15:08 ` Nitesh Narayan Lal
  2020-09-17 18:18   ` Jesse Brandeburg
                     ` (2 more replies)
  2020-09-09 15:08 ` [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
  2020-09-09 15:08 ` [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
  2 siblings, 3 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-09 15:08 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
the number of housekeeping CPUs by reading an atomic variable
__num_housekeeping_cpus. This variable is set from housekeeping_setup().

This API is introduced for the purpose of drivers that were previously
relying only on num_online_cpus() to determine the number of MSIX vectors
to create. In an RT environment with many isolated but fewer housekeeping
CPUs, this was leading to a situation where an attempt to move all of the
vectors corresponding to isolated CPUs to housekeeping CPUs was failing
due to the per CPU vector limit.

If there are no isolated CPUs specified then the API returns the number
of all online CPUs.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 include/linux/sched/isolation.h |  7 +++++++
 kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
 2 files changed, 30 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cc9f393e2a70..94c25d956d8a 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
 extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
 extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
 extern void __init housekeeping_init(void);
+extern unsigned int num_housekeeping_cpus(void);
 
 #else
 
@@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
 static inline void housekeeping_affine(struct task_struct *t,
 				       enum hk_flags flags) { }
 static inline void housekeeping_init(void) { }
+
+static inline unsigned int num_housekeeping_cpus(void)
+{
+	return num_online_cpus();
+}
+
 #endif /* CONFIG_CPU_ISOLATION */
 
 static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..7024298390b7 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
 EXPORT_SYMBOL_GPL(housekeeping_overridden);
 static cpumask_var_t housekeeping_mask;
 static unsigned int housekeeping_flags;
+static atomic_t __num_housekeeping_cpus __read_mostly;
 
 bool housekeeping_enabled(enum hk_flags flags)
 {
@@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
 }
 EXPORT_SYMBOL_GPL(housekeeping_enabled);
 
+/*
+ * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
+ *
+ * This function returns the number of available housekeeping CPUs
+ * based on __num_housekeeping_cpus which is of type atomic_t
+ * and is initialized at the time of the housekeeping setup.
+ */
+unsigned int num_housekeeping_cpus(void)
+{
+	unsigned int cpus;
+
+	if (static_branch_unlikely(&housekeeping_overridden)) {
+		cpus = atomic_read(&__num_housekeeping_cpus);
+		/* We should always have at least one housekeeping CPU */
+		BUG_ON(!cpus);
+		return cpus;
+	}
+	return num_online_cpus();
+}
+EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
+
 int housekeeping_any_cpu(enum hk_flags flags)
 {
 	int cpu;
@@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
 
 	housekeeping_flags |= flags;
 
+	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
 	free_bootmem_cpumask_var(non_housekeeping_mask);
 
 	return 1;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-09 15:08 [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
@ 2020-09-09 15:08 ` Nitesh Narayan Lal
  2020-09-11 15:23   ` Marcelo Tosatti
  2020-09-17 18:23   ` Jesse Brandeburg
  2020-09-09 15:08 ` [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
  2 siblings, 2 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-09 15:08 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

In a realtime environment, it is essential to isolate unwanted IRQs from
isolated CPUs to prevent latency overheads. Creating MSIX vectors only
based on the online CPUs could lead to a potential issue on an RT setup
that has several isolated CPUs but very few housekeeping CPUs. This is
because in these kinds of setups an attempt to move the IRQs to the
limited housekeeping CPUs from isolated CPUs might fail due to the per
CPU vector limit. This could eventually result in latency spikes because
of the IRQ threads that we fail to move from isolated CPUs.

This patch makes i40e add vectors based only on the number of available
housekeeping CPUs, by using num_housekeeping_cpus().

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e433fdbf2c3..3b4cd4b3de85 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@
 #include <linux/of_net.h>
 #include <linux/pci.h>
 #include <linux/bpf.h>
+#include <linux/sched/isolation.h>
 #include <generated/utsrelease.h>
 
 /* Local includes */
@@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
 	 * will use any remaining vectors to reach as close as we can to the
 	 * number of online CPUs.
 	 */
-	cpus = num_online_cpus();
+	cpus = num_housekeeping_cpus();
 	pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
 	vectors_left -= pf->num_lan_msix;
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-09 15:08 [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
  2020-09-09 15:08 ` [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-09 15:08 ` Nitesh Narayan Lal
  2020-09-10 19:22   ` Marcelo Tosatti
  2 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-09 15:08 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

This patch limits the max_vecs that the caller passes to
pci_alloc_irq_vectors() based on the available housekeeping CPUs, by
using the minimum of the two.

A minimum of the max_vecs passed and available housekeeping CPUs is
derived to ensure that we don't create excess vectors which can be
problematic specifically in an RT environment. This is because for an RT
environment unwanted IRQs are moved to the housekeeping CPUs from
isolated CPUs to keep the latency overhead to a minimum. If the number of
housekeeping CPUs is significantly lower than that of the isolated CPUs
we can run into failures while moving these IRQs to housekeeping due to
the per CPU vector limit.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 include/linux/pci.h | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..750ba927d963 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -38,6 +38,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/resource_ext.h>
+#include <linux/sched/isolation.h>
 #include <uapi/linux/pci.h>
 
 #include <linux/pci_ids.h>
@@ -1797,6 +1798,21 @@ static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
+	unsigned int num_housekeeping = num_housekeeping_cpus();
+	unsigned int num_online = num_online_cpus();
+
+	/*
+	 * Try to be conservative and at max only ask for the same number of
+	 * vectors as there are housekeeping CPUs. However, skip any
+	 * modification to the max vectors in two conditions:
+	 * 1. If the min_vecs requested is higher than the number of
+	 *    housekeeping CPUs, as we don't want to prevent the initialization
+	 *    of a device.
+	 * 2. If there are no isolated CPUs as in this case the driver should
+	 *    already have taken online CPUs into consideration.
+	 */
+	if (min_vecs < num_housekeeping && num_housekeeping != num_online)
+		max_vecs = min_t(int, max_vecs, num_housekeeping);
 	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
 					      NULL);
 }
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
@ 2020-09-10 19:22   ` Marcelo Tosatti
  2020-09-10 19:31     ` Nitesh Narayan Lal
  0 siblings, 1 reply; 30+ messages in thread
From: Marcelo Tosatti @ 2020-09-10 19:22 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Wed, Sep 09, 2020 at 11:08:18AM -0400, Nitesh Narayan Lal wrote:
> This patch limits the max_vecs that the caller passes to
> pci_alloc_irq_vectors() based on the available housekeeping CPUs, by
> using the minimum of the two.
> 
> A minimum of the max_vecs passed and available housekeeping CPUs is
> derived to ensure that we don't create excess vectors which can be
> problematic specifically in an RT environment. This is because for an RT
> environment unwanted IRQs are moved to the housekeeping CPUs from
> isolated CPUs to keep the latency overhead to a minimum. If the number of
> housekeeping CPUs is significantly lower than that of the isolated CPUs
> we can run into failures while moving these IRQs to housekeeping due to
> the per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/pci.h | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 835530605c0d..750ba927d963 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/resource_ext.h>
> +#include <linux/sched/isolation.h>
>  #include <uapi/linux/pci.h>
>  
>  #include <linux/pci_ids.h>
> @@ -1797,6 +1798,21 @@ static inline int
>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>  		      unsigned int max_vecs, unsigned int flags)
>  {
> +	unsigned int num_housekeeping = num_housekeeping_cpus();
> +	unsigned int num_online = num_online_cpus();
> +
> +	/*
> +	 * Try to be conservative and at max only ask for the same number of
> +	 * vectors as there are housekeeping CPUs. However, skip any
> +	 * modification to the max vectors in two conditions:
> +	 * 1. If the min_vecs requested is higher than the number of
> +	 *    housekeeping CPUs, as we don't want to prevent the initialization
> +	 *    of a device.
> +	 * 2. If there are no isolated CPUs as in this case the driver should
> +	 *    already have taken online CPUs into consideration.
> +	 */
> +	if (min_vecs < num_housekeeping && num_housekeeping != num_online)
> +		max_vecs = min_t(int, max_vecs, num_housekeeping);
>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>  					      NULL);
>  }

If min_vecs > num_housekeeping, for example:

/* PCI MSI/MSIx support */
#define XGBE_MSI_BASE_COUNT     4
#define XGBE_MSI_MIN_COUNT      (XGBE_MSI_BASE_COUNT + 1)

Then the protection fails.

How about reducing max_vecs down to min_vecs, if min_vecs >
num_housekeeping ?
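
For instance (an untested sketch, reusing the variable names from the
patch above):

	if (min_vecs > num_housekeeping)
		max_vecs = min_vecs;
	else if (num_housekeeping != num_online)
		max_vecs = min_t(unsigned int, max_vecs, num_housekeeping);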



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-10 19:22   ` Marcelo Tosatti
@ 2020-09-10 19:31     ` Nitesh Narayan Lal
  2020-09-22 13:54       ` Nitesh Narayan Lal
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-10 19:31 UTC (permalink / raw)
  To: Marcelo Tosatti
  Cc: linux-kernel, netdev, linux-pci, frederic, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri




On 9/10/20 3:22 PM, Marcelo Tosatti wrote:
> On Wed, Sep 09, 2020 at 11:08:18AM -0400, Nitesh Narayan Lal wrote:
>> This patch limits the max_vecs that the caller passes to
>> pci_alloc_irq_vectors() based on the available housekeeping CPUs, by
>> using the minimum of the two.
>>
>> A minimum of the max_vecs passed and available housekeeping CPUs is
>> derived to ensure that we don't create excess vectors which can be
>> problematic specifically in an RT environment. This is because for an RT
>> environment unwanted IRQs are moved to the housekeeping CPUs from
>> isolated CPUs to keep the latency overhead to a minimum. If the number of
>> housekeeping CPUs is significantly lower than that of the isolated CPUs
>> we can run into failures while moving these IRQs to housekeeping due to
>> the per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/pci.h | 16 ++++++++++++++++
>>  1 file changed, 16 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 835530605c0d..750ba927d963 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -38,6 +38,7 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/io.h>
>>  #include <linux/resource_ext.h>
>> +#include <linux/sched/isolation.h>
>>  #include <uapi/linux/pci.h>
>>  
>>  #include <linux/pci_ids.h>
>> @@ -1797,6 +1798,21 @@ static inline int
>>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>  		      unsigned int max_vecs, unsigned int flags)
>>  {
>> +	unsigned int num_housekeeping = num_housekeeping_cpus();
>> +	unsigned int num_online = num_online_cpus();
>> +
>> +	/*
>> +	 * Try to be conservative and at max only ask for the same number of
>> +	 * vectors as there are housekeeping CPUs. However, skip any
>> +	 * modification to the max vectors in two conditions:
>> +	 * 1. If the min_vecs requested is higher than the number of
>> +	 *    housekeeping CPUs, as we don't want to prevent the initialization
>> +	 *    of a device.
>> +	 * 2. If there are no isolated CPUs as in this case the driver should
>> +	 *    already have taken online CPUs into consideration.
>> +	 */
>> +	if (min_vecs < num_housekeeping && num_housekeeping != num_online)
>> +		max_vecs = min_t(int, max_vecs, num_housekeeping);
>>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>  					      NULL);
>>  }
> If min_vecs > num_housekeeping, for example:
>
> /* PCI MSI/MSIx support */
> #define XGBE_MSI_BASE_COUNT     4
> #define XGBE_MSI_MIN_COUNT      (XGBE_MSI_BASE_COUNT + 1)
>
> Then the protection fails.

Right, I was ignoring that case.

>
> How about reducing max_vecs down to min_vecs, if min_vecs >
> num_housekeeping ?

Yes, I think this makes sense.
I will wait a bit to see if anyone else has any other comment and will post
the next version then.

>
-- 
Nitesh



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-11 15:23   ` Marcelo Tosatti
  2020-09-17 18:23   ` Jesse Brandeburg
  1 sibling, 0 replies; 30+ messages in thread
From: Marcelo Tosatti @ 2020-09-11 15:23 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Wed, Sep 09, 2020 at 11:08:17AM -0400, Nitesh Narayan Lal wrote:
> In a realtime environment, it is essential to isolate unwanted IRQs from
> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> based on the online CPUs could lead to a potential issue on an RT setup
> that has several isolated CPUs but very few housekeeping CPUs. This is
> because in these kinds of setups an attempt to move the IRQs to the
> limited housekeeping CPUs from isolated CPUs might fail due to the per
> CPU vector limit. This could eventually result in latency spikes because
> of the IRQ threads that we fail to move from isolated CPUs.
> 
> This patch makes i40e add vectors based only on the number of available
> housekeeping CPUs, by using num_housekeeping_cpus().
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
> index 2e433fdbf2c3..3b4cd4b3de85 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_main.c
> +++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
> @@ -5,6 +5,7 @@
>  #include <linux/of_net.h>
>  #include <linux/pci.h>
>  #include <linux/bpf.h>
> +#include <linux/sched/isolation.h>
>  #include <generated/utsrelease.h>
>  
>  /* Local includes */
> @@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
>  	 * will use any remaining vectors to reach as close as we can to the
>  	 * number of online CPUs.
>  	 */
> -	cpus = num_online_cpus();
> +	cpus = num_housekeeping_cpus();
>  	pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
>  	vectors_left -= pf->num_lan_msix;
>  
> -- 
> 2.27.0

For patches 1 and 2:

Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
@ 2020-09-17 18:18   ` Jesse Brandeburg
  2020-09-17 18:43     ` Nitesh Narayan Lal
  2020-09-17 20:11   ` Bjorn Helgaas
  2020-09-21 23:40   ` Frederic Weisbecker
  2 siblings, 1 reply; 30+ messages in thread
From: Jesse Brandeburg @ 2020-09-17 18:18 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

Nitesh Narayan Lal wrote:

> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
> the number of housekeeping CPUs by reading an atomic variable
> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
> 
> This API is introduced for the purpose of drivers that were previously
> relying only on num_online_cpus() to determine the number of MSIX vectors
> to create. In an RT environment with many isolated but fewer housekeeping
> CPUs, this was leading to a situation where an attempt to move all of the
> vectors corresponding to isolated CPUs to housekeeping CPUs was failing
> due to the per CPU vector limit.
> 
> If there are no isolated CPUs specified then the API returns the number
> of all online CPUs.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h |  7 +++++++
>  kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
>  2 files changed, 30 insertions(+)

I'm not a scheduler expert, but a couple comments follow.

> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..94c25d956d8a 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
>  extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
>  extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
>  extern void __init housekeeping_init(void);
> +extern unsigned int num_housekeeping_cpus(void);
>  
>  #else
>  
> @@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
>  static inline void housekeeping_affine(struct task_struct *t,
>  				       enum hk_flags flags) { }
>  static inline void housekeeping_init(void) { }
> +
> +static inline unsigned int num_housekeeping_cpus(void)
> +{
> +	return num_online_cpus();
> +}
> +
>  #endif /* CONFIG_CPU_ISOLATION */
>  
>  static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5a6ea03f9882..7024298390b7 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
>  EXPORT_SYMBOL_GPL(housekeeping_overridden);
>  static cpumask_var_t housekeeping_mask;
>  static unsigned int housekeeping_flags;
> +static atomic_t __num_housekeeping_cpus __read_mostly;
>  
>  bool housekeeping_enabled(enum hk_flags flags)
>  {
> @@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
>  }
>  EXPORT_SYMBOL_GPL(housekeeping_enabled);
>  
> +/*

use correct kdoc style, and you get free documentation from your source
(you're so close!)

should be (note the first line and the function title line change to
remove parens):
/**
 * num_housekeeping_cpus - Read the number of housekeeping CPUs.
 *
 * This function returns the number of available housekeeping CPUs
 * based on __num_housekeeping_cpus which is of type atomic_t
 * and is initialized at the time of the housekeeping setup.
 */

> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
> + *
> + * This function returns the number of available housekeeping CPUs
> + * based on __num_housekeeping_cpus which is of type atomic_t
> + * and is initialized at the time of the housekeeping setup.
> + */
> +unsigned int num_housekeeping_cpus(void)
> +{
> +	unsigned int cpus;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		cpus = atomic_read(&__num_housekeeping_cpus);
> +		/* We should always have at least one housekeeping CPU */
> +		BUG_ON(!cpus);

you need to crash the kernel because of this? maybe a WARN_ON? How did
the global even get set to the bad value? It's going to blame the poor
caller for this in the trace, but the caller likely had nothing to do
with setting the value incorrectly!
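
e.g. something like (an untested sketch, with an arbitrary fallback):

	cpus = atomic_read(&__num_housekeeping_cpus);
	if (WARN_ON_ONCE(!cpus))
		return num_online_cpus();	/* don't crash, fall back */
	return cpus;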

> +		return cpus;
> +	}
> +	return num_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
  2020-09-11 15:23   ` Marcelo Tosatti
@ 2020-09-17 18:23   ` Jesse Brandeburg
  2020-09-17 18:31     ` Nitesh Narayan Lal
  2020-09-21 22:58     ` Frederic Weisbecker
  1 sibling, 2 replies; 30+ messages in thread
From: Jesse Brandeburg @ 2020-09-17 18:23 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

Nitesh Narayan Lal wrote:

> In a realtime environment, it is essential to isolate unwanted IRQs from
> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> based on the online CPUs could lead to a potential issue on an RT setup
> that has several isolated CPUs but very few housekeeping CPUs. This is
> because in these kinds of setups an attempt to move the IRQs to the
> limited housekeeping CPUs from isolated CPUs might fail due to the per
> CPU vector limit. This could eventually result in latency spikes because
> of the IRQ threads that we fail to move from isolated CPUs.
> 
> This patch makes i40e add vectors based only on the number of available
> housekeeping CPUs, by using num_housekeeping_cpus().
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>

The driver changes are straightforward, but this isn't the only driver
with this issue, right?  I'm sure ixgbe and ice both have this problem
too, you should fix them as well, at a minimum, and probably other
vendors' drivers:

$ rg -c --stats num_online_cpus drivers/net/ethernet
...
50 files contained matches

for this patch i40e
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-17 18:23   ` Jesse Brandeburg
@ 2020-09-17 18:31     ` Nitesh Narayan Lal
  2020-09-21 22:58     ` Frederic Weisbecker
  1 sibling, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-17 18:31 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, Luis Claudio R. Goncalves




On 9/17/20 2:23 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>> In a realtime environment, it is essential to isolate unwanted IRQs from
>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>> based on the online CPUs could lead to a potential issue on an RT setup
>> that has several isolated CPUs but very few housekeeping CPUs. This is
>> because in these kinds of setups an attempt to move the IRQs to the
>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>> CPU vector limit. This could eventually result in latency spikes because
>> of the IRQ threads that we fail to move from isolated CPUs.
>>
>> This patch makes i40e add vectors based only on the number of available
>> housekeeping CPUs, by using num_housekeeping_cpus().
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> The driver changes are straightforward, but this isn't the only driver
> with this issue, right?

Indeed, I was hoping to modify them over time with some testing.

>   I'm sure ixgbe and ice both have this problem
> too, you should fix them as well, at a minimum, and probably other
> vendors' drivers:

Sure, I can at least include ixgbe and ice in the next posting if that makes
sense.
The reason I skipped them is that I was not very sure about the right way to
test these changes.

>
> $ rg -c --stats num_online_cpus drivers/net/ethernet
> ...
> 50 files contained matches
>
> for this patch i40e
> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
>
-- 
Thanks
Nitesh



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-17 18:18   ` Jesse Brandeburg
@ 2020-09-17 18:43     ` Nitesh Narayan Lal
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-17 18:43 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, Luis Claudio R. Goncalves




On 9/17/20 2:18 PM, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
>
>> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
>> the number of housekeeping CPUs by reading an atomic variable
>> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
>>
>> This API is introduced for the purpose of drivers that were previously
>> relying only on num_online_cpus() to determine the number of MSIX vectors
>> to create. In an RT environment with many isolated but fewer housekeeping
>> CPUs, this was leading to a situation where an attempt to move all of the
>> vectors corresponding to isolated CPUs to housekeeping CPUs was failing
>> due to the per CPU vector limit.
>>
>> If there are no isolated CPUs specified then the API returns the number
>> of all online CPUs.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h |  7 +++++++
>>  kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
> I'm not a scheduler expert, but a couple comments follow.
>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..94c25d956d8a 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
>>  extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
>>  extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
>>  extern void __init housekeeping_init(void);
>> +extern unsigned int num_housekeeping_cpus(void);
>>  
>>  #else
>>  
>> @@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
>>  static inline void housekeeping_affine(struct task_struct *t,
>>  				       enum hk_flags flags) { }
>>  static inline void housekeeping_init(void) { }
>> +
>> +static inline unsigned int num_housekeeping_cpus(void)
>> +{
>> +	return num_online_cpus();
>> +}
>> +
>>  #endif /* CONFIG_CPU_ISOLATION */
>>  
>>  static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index 5a6ea03f9882..7024298390b7 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
>>  EXPORT_SYMBOL_GPL(housekeeping_overridden);
>>  static cpumask_var_t housekeeping_mask;
>>  static unsigned int housekeeping_flags;
>> +static atomic_t __num_housekeeping_cpus __read_mostly;
>>  
>>  bool housekeeping_enabled(enum hk_flags flags)
>>  {
>> @@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
>>  }
>>  EXPORT_SYMBOL_GPL(housekeeping_enabled);
>>  
>> +/*
> use correct kdoc style, and you get free documentation from your source
> (you're so close!)
>
> should be (note the first line and the function title line change to
> remove parens:
> /**
>  * num_housekeeping_cpus - Read the number of housekeeping CPUs.
>  *
>  * This function returns the number of available housekeeping CPUs
>  * based on __num_housekeeping_cpus which is of type atomic_t
>  * and is initialized at the time of the housekeeping setup.
>  */

My bad, I missed that.
Thanks for pointing it out.

>
>> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
>> + *
>> + * This function returns the number of available housekeeping CPUs
>> + * based on __num_housekeeping_cpus which is of type atomic_t
>> + * and is initialized at the time of the housekeeping setup.
>> + */
>> +unsigned int num_housekeeping_cpus(void)
>> +{
>> +	unsigned int cpus;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		cpus = atomic_read(&__num_housekeeping_cpus);
>> +		/* We should always have at least one housekeeping CPU */
>> +		BUG_ON(!cpus);
> you need to crash the kernel because of this? maybe a WARN_ON? How did
> the global even get set to the bad value? It's going to blame the poor
> caller for this in the trace, but the caller likely had nothing to do
> with setting the value incorrectly!

Yes, ideally this should not be triggered, but if somehow it is then we have
a bug that needs to be fixed. That's probably the only reason why I chose
BUG_ON.
But I am not entirely against the usage of WARN_ON either, because we get a
stack trace anyway.
I will see if anyone else has any other concerns on this patch and then I can
post the next version.

>
>> +		return cpus;
>> +	}
>> +	return num_online_cpus();
>> +}
>> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
-- 
Thanks
Nitesh



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
  2020-09-17 18:18   ` Jesse Brandeburg
@ 2020-09-17 20:11   ` Bjorn Helgaas
  2020-09-17 21:48     ` Jacob Keller
  2020-09-17 22:09     ` Nitesh Narayan Lal
  2020-09-21 23:40   ` Frederic Weisbecker
  2 siblings, 2 replies; 30+ messages in thread
From: Bjorn Helgaas @ 2020-09-17 20:11 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot

[+cc Ingo, Peter, Juri, Vincent (scheduler maintainers)]

s/hosekeeping/housekeeping/ (in subject)

On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
> the number of housekeeping CPUs by reading an atomic variable
> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
> 
> This API is introduced for the purpose of drivers that were previously
> relying only on num_online_cpus() to determine the number of MSIX vectors
> to create. In an RT environment with many isolated but fewer housekeeping
> CPUs, this was leading to a situation where an attempt to move all of the
> vectors corresponding to isolated CPUs to housekeeping CPUs was failing
> due to the per CPU vector limit.

Totally kibitzing here, but AFAICT the concepts of "isolated CPU" and
"housekeeping CPU" are not currently exposed to drivers, and it's not
completely clear to me that they should be.

We have carefully constructed notions of possible, present, online,
active CPUs, and it seems like whatever we do here should be
somehow integrated with those.

> If there are no isolated CPUs specified then the API returns the number
> of all online CPUs.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h |  7 +++++++
>  kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
>  2 files changed, 30 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..94c25d956d8a 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
>  extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
>  extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
>  extern void __init housekeeping_init(void);
> +extern unsigned int num_housekeeping_cpus(void);
>  
>  #else
>  
> @@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
>  static inline void housekeeping_affine(struct task_struct *t,
>  				       enum hk_flags flags) { }
>  static inline void housekeeping_init(void) { }
> +
> +static inline unsigned int num_housekeeping_cpus(void)
> +{
> +	return num_online_cpus();
> +}
> +
>  #endif /* CONFIG_CPU_ISOLATION */
>  
>  static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5a6ea03f9882..7024298390b7 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
>  EXPORT_SYMBOL_GPL(housekeeping_overridden);
>  static cpumask_var_t housekeeping_mask;
>  static unsigned int housekeeping_flags;
> +static atomic_t __num_housekeeping_cpus __read_mostly;
>  
>  bool housekeeping_enabled(enum hk_flags flags)
>  {
> @@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
>  }
>  EXPORT_SYMBOL_GPL(housekeeping_enabled);
>  
> +/*
> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
> + *
> + * This function returns the number of available housekeeping CPUs
> + * based on __num_housekeeping_cpus which is of type atomic_t
> + * and is initialized at the time of the housekeeping setup.
> + */
> +unsigned int num_housekeeping_cpus(void)
> +{
> +	unsigned int cpus;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		cpus = atomic_read(&__num_housekeeping_cpus);
> +		/* We should always have at least one housekeeping CPU */
> +		BUG_ON(!cpus);
> +		return cpus;
> +	}
> +	return num_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
> +
>  int housekeeping_any_cpu(enum hk_flags flags)
>  {
>  	int cpu;
> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
>  
>  	housekeeping_flags |= flags;
>  
> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
>  	free_bootmem_cpumask_var(non_housekeeping_mask);
>  
>  	return 1;
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-17 20:11   ` Bjorn Helgaas
@ 2020-09-17 21:48     ` Jacob Keller
  2020-09-17 22:09     ` Nitesh Narayan Lal
  1 sibling, 0 replies; 30+ messages in thread
From: Jacob Keller @ 2020-09-17 21:48 UTC (permalink / raw)
  To: Bjorn Helgaas, Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jlelli, hch, bhelgaas, mike.marciniszyn,
	dennis.dalessandro, thomas.lendacky, jerinj, mathias.nyman, jiri,
	Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot



On 9/17/2020 1:11 PM, Bjorn Helgaas wrote:
> [+cc Ingo, Peter, Juri, Vincent (scheduler maintainers)]
> 
> s/hosekeeping/housekeeping/ (in subject)
> 
> On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
>> the number of housekeeping CPUs by reading an atomic variable
>> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
>>
>> This API is introduced for the purpose of drivers that were previously
>> relying only on num_online_cpus() to determine the number of MSIX vectors
>> to create. In an RT environment with many isolated but fewer housekeeping
>> CPUs, this was leading to a situation where an attempt to move all of the
>> vectors corresponding to isolated CPUs to housekeeping CPUs was failing
>> due to the per CPU vector limit.
> 
> Totally kibitzing here, but AFAICT the concepts of "isolated CPU" and
> "housekeeping CPU" are not currently exposed to drivers, and it's not
> completely clear to me that they should be.
> 
> We have carefully constructed notions of possible, present, online,
> active CPUs, and it seems like whatever we do here should be
> somehow integrated with those.
> 

Perhaps "active" CPUs could be separated to not include the isolated CPUs?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-17 20:11   ` Bjorn Helgaas
  2020-09-17 21:48     ` Jacob Keller
@ 2020-09-17 22:09     ` Nitesh Narayan Lal
  1 sibling, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-17 22:09 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, netdev, linux-pci, frederic, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, Ingo Molnar, Peter Zijlstra, Juri Lelli,
	Vincent Guittot




On 9/17/20 4:11 PM, Bjorn Helgaas wrote:
> [+cc Ingo, Peter, Juri, Vincent (scheduler maintainers)]
>
> s/hosekeeping/housekeeping/ (in subject)
>
> On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API num_housekeeping_cpus(), that can be used to retrieve
>> the number of housekeeping CPUs by reading an atomic variable
>> __num_housekeeping_cpus. This variable is set from housekeeping_setup().
>>
>> This API is introduced for the purpose of drivers that were previously
>> relying only on num_online_cpus() to determine the number of MSIX vectors
>> to create. In an RT environment with many isolated but fewer housekeeping
>> CPUs, this was leading to a situation where an attempt to move all of the
>> vectors corresponding to isolated CPUs to housekeeping CPUs was failing
>> due to the per CPU vector limit.
> Totally kibitzing here, but AFAICT the concepts of "isolated CPU" and
> "housekeeping CPU" are not currently exposed to drivers, and it's not
> completely clear to me that they should be.
>
> We have carefully constructed notions of possible, present, online,
> active CPUs, and it seems like whatever we do here should be
> somehow integrated with those.

At one point I thought about tweaking num_online_cpus(), but then I quickly
moved away from that just because it is extensively used in the kernel, and
this way we don't have to modify the behavior in all those places.

Thank you for including Peter and Vincent as well.
I would be happy to discuss/explore other options.

>
>> If there are no isolated CPUs specified then the API returns the number
>> of all online CPUs.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h |  7 +++++++
>>  kernel/sched/isolation.c        | 23 +++++++++++++++++++++++
>>  2 files changed, 30 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..94c25d956d8a 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -25,6 +25,7 @@ extern bool housekeeping_enabled(enum hk_flags flags);
>>  extern void housekeeping_affine(struct task_struct *t, enum hk_flags flags);
>>  extern bool housekeeping_test_cpu(int cpu, enum hk_flags flags);
>>  extern void __init housekeeping_init(void);
>> +extern unsigned int num_housekeeping_cpus(void);
>>  
>>  #else
>>  
>> @@ -46,6 +47,12 @@ static inline bool housekeeping_enabled(enum hk_flags flags)
>>  static inline void housekeeping_affine(struct task_struct *t,
>>  				       enum hk_flags flags) { }
>>  static inline void housekeeping_init(void) { }
>> +
>> +static inline unsigned int num_housekeeping_cpus(void)
>> +{
>> +	return num_online_cpus();
>> +}
>> +
>>  #endif /* CONFIG_CPU_ISOLATION */
>>  
>>  static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
>> index 5a6ea03f9882..7024298390b7 100644
>> --- a/kernel/sched/isolation.c
>> +++ b/kernel/sched/isolation.c
>> @@ -13,6 +13,7 @@ DEFINE_STATIC_KEY_FALSE(housekeeping_overridden);
>>  EXPORT_SYMBOL_GPL(housekeeping_overridden);
>>  static cpumask_var_t housekeeping_mask;
>>  static unsigned int housekeeping_flags;
>> +static atomic_t __num_housekeeping_cpus __read_mostly;
>>  
>>  bool housekeeping_enabled(enum hk_flags flags)
>>  {
>> @@ -20,6 +21,27 @@ bool housekeeping_enabled(enum hk_flags flags)
>>  }
>>  EXPORT_SYMBOL_GPL(housekeeping_enabled);
>>  
>> +/*
>> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
>> + *
>> + * This function returns the number of available housekeeping CPUs
>> + * based on __num_housekeeping_cpus which is of type atomic_t
>> + * and is initialized at the time of the housekeeping setup.
>> + */
>> +unsigned int num_housekeeping_cpus(void)
>> +{
>> +	unsigned int cpus;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		cpus = atomic_read(&__num_housekeeping_cpus);
>> +		/* We should always have at least one housekeeping CPU */
>> +		BUG_ON(!cpus);
>> +		return cpus;
>> +	}
>> +	return num_online_cpus();
>> +}
>> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
>> +
>>  int housekeeping_any_cpu(enum hk_flags flags)
>>  {
>>  	int cpu;
>> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
>>  
>>  	housekeeping_flags |= flags;
>>  
>> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
>>  	free_bootmem_cpumask_var(non_housekeeping_mask);
>>  
>>  	return 1;
>> -- 
>> 2.27.0
>>
-- 
Nitesh



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-17 18:23   ` Jesse Brandeburg
  2020-09-17 18:31     ` Nitesh Narayan Lal
@ 2020-09-21 22:58     ` Frederic Weisbecker
  2020-09-22  3:08       ` Nitesh Narayan Lal
  1 sibling, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-21 22:58 UTC (permalink / raw)
  To: Jesse Brandeburg
  Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci, mtosatti,
	sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch,
	bhelgaas, mike.marciniszyn, dennis.dalessandro, thomas.lendacky,
	jerinj, mathias.nyman, jiri

On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
> Nitesh Narayan Lal wrote:
> 
> > In a realtime environment, it is essential to isolate unwanted IRQs from
> > isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> > based on the online CPUs could lead to a potential issue on an RT setup
> > that has several isolated CPUs but very few housekeeping CPUs. This is
> > because in these kinds of setups an attempt to move the IRQs to the
> > limited housekeeping CPUs from isolated CPUs might fail due to the per
> > CPU vector limit. This could eventually result in latency spikes because
> > of the IRQ threads that we fail to move from isolated CPUs.
> > 
> > This patch makes i40e add vectors based only on the number of available
> > housekeeping CPUs, by using num_housekeeping_cpus().
> > 
> > Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> 
> The driver changes are straightforward, but this isn't the only driver
> with this issue, right?  I'm sure ixgbe and ice both have this problem
> too, you should fix them as well, at a minimum, and probably other
> vendors' drivers:
> 
> $ rg -c --stats num_online_cpus drivers/net/ethernet
> ...
> 50 files contained matches

Ouch, I was indeed surprised that these MSI vector allocations were done
at the driver level and not at some $SUBSYSTEM level.

The logic is already there in the driver so I wouldn't oppose this very patch,
but would a shared infrastructure make sense for this? Something that would
also handle hotplug operations?

Does it possibly go even beyond networking drivers?
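
(Just thinking out loud, an untested sketch of what such a shared helper
could look like, name made up:

	unsigned int irq_calc_max_vecs(unsigned int max_vecs)
	{
		return min(max_vecs, num_housekeeping_cpus());
	}

so that individual drivers stop open-coding the num_online_cpus() logic.)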

Thanks.

> 
> for this patch i40e
> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
  2020-09-17 18:18   ` Jesse Brandeburg
  2020-09-17 20:11   ` Bjorn Helgaas
@ 2020-09-21 23:40   ` Frederic Weisbecker
  2020-09-22  3:16     ` Nitesh Narayan Lal
  2 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-21 23:40 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
> +/*
> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
> + *
> + * This function returns the number of available housekeeping CPUs
> + * based on __num_housekeeping_cpus which is of type atomic_t
> + * and is initialized at the time of the housekeeping setup.
> + */
> +unsigned int num_housekeeping_cpus(void)
> +{
> +	unsigned int cpus;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		cpus = atomic_read(&__num_housekeeping_cpus);
> +		/* We should always have at least one housekeeping CPU */
> +		BUG_ON(!cpus);
> +		return cpus;
> +	}
> +	return num_online_cpus();
> +}
> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
> +
>  int housekeeping_any_cpu(enum hk_flags flags)
>  {
>  	int cpu;
> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
>  
>  	housekeeping_flags |= flags;
>  
> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));

So the problem here is that it takes the whole cpumask weight but you're only
interested in the housekeepers who take the managed irq duties I guess
(HK_FLAG_MANAGED_IRQ ?).
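
I mean something like this (untested):

	if (flags & HK_FLAG_MANAGED_IRQ)
		atomic_set(&__num_housekeeping_cpus,
			   cpumask_weight(housekeeping_mask));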

>  	free_bootmem_cpumask_var(non_housekeeping_mask);
>  
>  	return 1;
> -- 
> 2.27.0
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-21 22:58     ` Frederic Weisbecker
@ 2020-09-22  3:08       ` Nitesh Narayan Lal
  2020-09-22  9:54         ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22  3:08 UTC (permalink / raw)
  To: Frederic Weisbecker, Jesse Brandeburg
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 2408 bytes --]


On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>> Nitesh Narayan Lal wrote:
>>
>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>> based on the online CPUs could lead to a potential issue on an RT setup
>>> that has several isolated CPUs but very few housekeeping CPUs. This is
>>> because in these kinds of setups an attempt to move the IRQs to the
>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>> CPU vector limit. This could eventually result in latency spikes because
>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>
>>> This patch limits i40e to adding vectors based only on the available
>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> The driver changes are straightforward, but this isn't the only driver
>> with this issue, right?  I'm sure ixgbe and ice both have this problem
>> too; you should fix them as well, at a minimum, and probably other
>> vendors' drivers:
>>
>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>> ...
>> 50 files contained matches
> Ouch, I was indeed surprised that these MSI vector allocations were done
> at the driver level and not at some $SUBSYSTEM level.
>
> The logic is already there in the driver, so I wouldn't oppose this very patch,
> but would a shared infrastructure make sense for this? Something that would
> also handle hotplug operations?
>
> Does it possibly go even beyond networking drivers?

From a generic solution perspective, I think it makes sense to come up with a
shared infrastructure: something that can be consumed by all the drivers and
maybe by hotplug operations as well (I will have to explore the hotplug part
further).

However, there are RT workloads that are getting affected because of this
issue, so does it make sense to go ahead with this per-driver approach
for now?

A generic solution will require a fair amount of testing and an
understanding of different drivers. Having said that, I can definitely start
looking in that direction.

Thanks for reviewing.

> Thanks.
>
>> for this patch i40e
>> Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
-- 
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-21 23:40   ` Frederic Weisbecker
@ 2020-09-22  3:16     ` Nitesh Narayan Lal
  2020-09-22 10:08       ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22  3:16 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 1582 bytes --]


On 9/21/20 7:40 PM, Frederic Weisbecker wrote:
> On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
>> +/*
>> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
>> + *
>> + * This function returns the number of available housekeeping CPUs
>> + * based on __num_housekeeping_cpus which is of type atomic_t
>> + * and is initialized at the time of the housekeeping setup.
>> + */
>> +unsigned int num_housekeeping_cpus(void)
>> +{
>> +	unsigned int cpus;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		cpus = atomic_read(&__num_housekeeping_cpus);
>> +		/* We should always have at least one housekeeping CPU */
>> +		BUG_ON(!cpus);
>> +		return cpus;
>> +	}
>> +	return num_online_cpus();
>> +}
>> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
>> +
>>  int housekeeping_any_cpu(enum hk_flags flags)
>>  {
>>  	int cpu;
>> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
>>  
>>  	housekeeping_flags |= flags;
>>  
>> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
> So the problem here is that it takes the whole cpumask weight, but you're only
> interested in the housekeepers that handle the managed IRQ duties, I guess
> (HK_FLAG_MANAGED_IRQ?).

IMHO we should also consider the cases where we only have nohz_full.
Otherwise, we may run into the same situation on those setups. Do you agree?

>
>>  	free_bootmem_cpumask_var(non_housekeeping_mask);
>>  
>>  	return 1;
>> -- 
>> 2.27.0
>>
-- 
Thanks
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-22  3:08       ` Nitesh Narayan Lal
@ 2020-09-22  9:54         ` Frederic Weisbecker
  2020-09-22 13:34           ` Nitesh Narayan Lal
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-22  9:54 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: Jesse Brandeburg, linux-kernel, netdev, linux-pci, mtosatti,
	sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch,
	bhelgaas, mike.marciniszyn, dennis.dalessandro, thomas.lendacky,
	jerinj, mathias.nyman, jiri

On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
> 
> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
> > On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
> >> Nitesh Narayan Lal wrote:
> >>
> >>> In a realtime environment, it is essential to isolate unwanted IRQs from
> >>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
> >>> based on the online CPUs could lead to a potential issue on an RT setup
> >>> that has several isolated CPUs but very few housekeeping CPUs. This is
> >>> because in these kinds of setups an attempt to move the IRQs to the
> >>> limited housekeeping CPUs from isolated CPUs might fail due to the per
> >>> CPU vector limit. This could eventually result in latency spikes because
> >>> of the IRQ threads that we fail to move from isolated CPUs.
> >>>
> >>> This patch limits i40e to adding vectors based only on the available
> >>> housekeeping CPUs by using num_housekeeping_cpus().
> >>>
> >>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> >> The driver changes are straightforward, but this isn't the only driver
> >> with this issue, right?  I'm sure ixgbe and ice both have this problem
> >> too; you should fix them as well, at a minimum, and probably other
> >> vendors' drivers:
> >>
> >> $ rg -c --stats num_online_cpus drivers/net/ethernet
> >> ...
> >> 50 files contained matches
> > Ouch, I was indeed surprised that these MSI vector allocations were done
> > at the driver level and not at some $SUBSYSTEM level.
> >
> > The logic is already there in the driver, so I wouldn't oppose this very patch,
> > but would a shared infrastructure make sense for this? Something that would
> > also handle hotplug operations?
> >
> > Does it possibly go even beyond networking drivers?
> 
> From a generic solution perspective, I think it makes sense to come up with a
> shared infrastructure: something that can be consumed by all the drivers and
> maybe by hotplug operations as well (I will have to explore the hotplug part
> further).

That would be great. I'm completely clueless about those MSI things and the
actual needs of those drivers. Now it seems to me that if several CPUs go
offline, or if, as is planned for the future, CPU isolation gets
enabled/disabled through cpuset, then the vectors may need some reorganization.

But I also don't want to push toward a complicated solution to handle CPU hotplug
if there is no actual problem to solve there. So I'll let you guys judge.

> However, there are RT workloads that are getting affected because of this
> issue, so does it make sense to go ahead with this per-driver approach
> for now?

Yep, that sounds good.

> 
> A generic solution will require a fair amount of testing and an
> understanding of different drivers. Having said that, I can definitely start
> looking in that direction.

Thanks a lot!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22  3:16     ` Nitesh Narayan Lal
@ 2020-09-22 10:08       ` Frederic Weisbecker
  2020-09-22 13:50         ` Nitesh Narayan Lal
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-22 10:08 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Mon, Sep 21, 2020 at 11:16:51PM -0400, Nitesh Narayan Lal wrote:
> 
> On 9/21/20 7:40 PM, Frederic Weisbecker wrote:
> > On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
> >> +/*
> >> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
> >> + *
> >> + * This function returns the number of available housekeeping CPUs
> >> + * based on __num_housekeeping_cpus which is of type atomic_t
> >> + * and is initialized at the time of the housekeeping setup.
> >> + */
> >> +unsigned int num_housekeeping_cpus(void)
> >> +{
> >> +	unsigned int cpus;
> >> +
> >> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> >> +		cpus = atomic_read(&__num_housekeeping_cpus);
> >> +		/* We should always have at least one housekeeping CPU */
> >> +		BUG_ON(!cpus);
> >> +		return cpus;
> >> +	}
> >> +	return num_online_cpus();
> >> +}
> >> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
> >> +
> >>  int housekeeping_any_cpu(enum hk_flags flags)
> >>  {
> >>  	int cpu;
> >> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
> >>  
> >>  	housekeeping_flags |= flags;
> >>  
> >> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
> > So the problem here is that it takes the whole cpumask weight, but you're only
> > interested in the housekeepers that handle the managed IRQ duties, I guess
> > (HK_FLAG_MANAGED_IRQ?).
> 
> IMHO we should also consider the cases where we only have nohz_full.
> Otherwise, we may run into the same situation on those setups. Do you agree?

I guess it's up to the user to gather the tick and managed irq housekeeping
together?

Of course that makes the implementation more complicated. But if this is
called only at driver initialization for now, this could be just a function
that does:

cpumask_weight(cpu_online_mask | housekeeping_cpumask(HK_FLAG_MANAGED_IRQ))

And then can we rename it to housekeeping_num_online()?
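
The two masks above cannot literally be combined with `|`; reading the
one-liner as shorthand for the intersection of the online mask and the
managed-IRQ housekeeping mask, a compilable sketch could look like the
following (the temporary-mask allocation and the fallback on allocation
failure are assumptions of the sketch):

static unsigned int housekeeping_num_online(void)
{
	cpumask_var_t mask;
	unsigned int weight;

	if (!static_branch_unlikely(&housekeeping_overridden))
		return num_online_cpus();

	if (!alloc_cpumask_var(&mask, GFP_KERNEL))
		return num_online_cpus();	/* conservative fallback */

	/* CPUs that are online *and* marked as managed-IRQ housekeepers */
	cpumask_and(mask, cpu_online_mask,
		    housekeeping_cpumask(HK_FLAG_MANAGED_IRQ));
	weight = cpumask_weight(mask);
	free_cpumask_var(mask);

	return weight;
}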

Thanks.

> >
> >>  	free_bootmem_cpumask_var(non_housekeeping_mask);
> >>  
> >>  	return 1;
> >> -- 
> >> 2.27.0
> >>
> -- 
> Thanks
> Nitesh
> 




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-22  9:54         ` Frederic Weisbecker
@ 2020-09-22 13:34           ` Nitesh Narayan Lal
  2020-09-22 20:44             ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 13:34 UTC (permalink / raw)
  To: Frederic Weisbecker, bhelgaas
  Cc: Jesse Brandeburg, linux-kernel, netdev, linux-pci, mtosatti,
	sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 3127 bytes --]


On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:08:20PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 6:58 PM, Frederic Weisbecker wrote:
>>> On Thu, Sep 17, 2020 at 11:23:59AM -0700, Jesse Brandeburg wrote:
>>>> Nitesh Narayan Lal wrote:
>>>>
>>>>> In a realtime environment, it is essential to isolate unwanted IRQs from
>>>>> isolated CPUs to prevent latency overheads. Creating MSIX vectors only
>>>>> based on the online CPUs could lead to a potential issue on an RT setup
>>>>> that has several isolated CPUs but very few housekeeping CPUs. This is
>>>>> because in these kinds of setups an attempt to move the IRQs to the
>>>>> limited housekeeping CPUs from isolated CPUs might fail due to the per
>>>>> CPU vector limit. This could eventually result in latency spikes because
>>>>> of the IRQ threads that we fail to move from isolated CPUs.
>>>>>
>>>>> This patch limits i40e to adding vectors based only on the available
>>>>> housekeeping CPUs by using num_housekeeping_cpus().
>>>>>
>>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>> The driver changes are straightforward, but this isn't the only driver
>>>> with this issue, right?  I'm sure ixgbe and ice both have this problem
>>>> too; you should fix them as well, at a minimum, and probably other
>>>> vendors' drivers:
>>>>
>>>> $ rg -c --stats num_online_cpus drivers/net/ethernet
>>>> ...
>>>> 50 files contained matches
>>> Ouch, I was indeed surprised that these MSI vector allocations were done
>>> at the driver level and not at some $SUBSYSTEM level.
>>>
>>> The logic is already there in the driver, so I wouldn't oppose this very patch,
>>> but would a shared infrastructure make sense for this? Something that would
>>> also handle hotplug operations?
>>>
>>> Does it possibly go even beyond networking drivers?
>> From a generic solution perspective, I think it makes sense to come up with a
>> shared infrastructure: something that can be consumed by all the drivers and
>> maybe by hotplug operations as well (I will have to explore the hotplug part
>> further).
> That would be great. I'm completely clueless about those MSI things and the
> actual needs of those drivers. Now it seems to me that if several CPUs go
> offline, or if, as is planned for the future, CPU isolation gets
> enabled/disabled through cpuset, then the vectors may need some reorganization.

+1

>
> But I also don't want to push toward a complicated solution to handle CPU hotplug
> if there is no actual problem to solve there.

Sure, I am not particularly sure about the hotplug scenarios either.

>  So I'll let you guys judge.
>
>> However, there are RT workloads that are getting affected because of this
>> issue, so does it make sense to go ahead with this per-driver approach
>> for now?
> Yep, that sounds good.

Thank you for confirming.

>
>> A generic solution will require a fair amount of testing and an
>> understanding of different drivers. Having said that, I can definitely start
>> looking in that direction.
> Thanks a lot!
>
-- 
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22 10:08       ` Frederic Weisbecker
@ 2020-09-22 13:50         ` Nitesh Narayan Lal
  2020-09-22 20:58           ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 13:50 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 2754 bytes --]


On 9/22/20 6:08 AM, Frederic Weisbecker wrote:
> On Mon, Sep 21, 2020 at 11:16:51PM -0400, Nitesh Narayan Lal wrote:
>> On 9/21/20 7:40 PM, Frederic Weisbecker wrote:
>>> On Wed, Sep 09, 2020 at 11:08:16AM -0400, Nitesh Narayan Lal wrote:
>>>> +/*
>>>> + * num_housekeeping_cpus() - Read the number of housekeeping CPUs.
>>>> + *
>>>> + * This function returns the number of available housekeeping CPUs
>>>> + * based on __num_housekeeping_cpus which is of type atomic_t
>>>> + * and is initialized at the time of the housekeeping setup.
>>>> + */
>>>> +unsigned int num_housekeeping_cpus(void)
>>>> +{
>>>> +	unsigned int cpus;
>>>> +
>>>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>>>> +		cpus = atomic_read(&__num_housekeeping_cpus);
>>>> +		/* We should always have at least one housekeeping CPU */
>>>> +		BUG_ON(!cpus);
>>>> +		return cpus;
>>>> +	}
>>>> +	return num_online_cpus();
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(num_housekeeping_cpus);
>>>> +
>>>>  int housekeeping_any_cpu(enum hk_flags flags)
>>>>  {
>>>>  	int cpu;
>>>> @@ -131,6 +153,7 @@ static int __init housekeeping_setup(char *str, enum hk_flags flags)
>>>>  
>>>>  	housekeeping_flags |= flags;
>>>>  
>>>> +	atomic_set(&__num_housekeeping_cpus, cpumask_weight(housekeeping_mask));
>>> So the problem here is that it takes the whole cpumask weight, but you're only
>>> interested in the housekeepers that handle the managed IRQ duties, I guess
>>> (HK_FLAG_MANAGED_IRQ?).
>> IMHO we should also consider the cases where we only have nohz_full.
>> Otherwise, we may run into the same situation on those setups. Do you agree?
> I guess it's up to the user to gather the tick and managed irq housekeeping
> together?

TBH I don't have a very strong case here at the moment.
But still, IMHO, this will force the user to have both managed irqs and
nohz_full in their environments to avoid these kinds of issues. Is that how
we would like to proceed?

The reason I want to get this clarity is that, going forward, I can base my
thinking for any RT-related work on this discussion.

>
> Of course that makes the implementation more complicated. But if this is
> called only at driver initialization for now, this could be just a function
> that does:
>
> cpumask_weight(cpu_online_mask | housekeeping_cpumask(HK_FLAG_MANAGED_IRQ))

Ack, this makes more sense.

>
> And then can we rename it to housekeeping_num_online()?

It could be just me, but does something like hk_num_online_cpus() make more
sense here?

>
> Thanks.
>
>>>>  	free_bootmem_cpumask_var(non_housekeeping_mask);
>>>>  
>>>>  	return 1;
>>>> -- 
>>>> 2.27.0
>>>>
>> -- 
>> Thanks
>> Nitesh
>>
>
>
-- 
Thanks
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-10 19:31     ` Nitesh Narayan Lal
@ 2020-09-22 13:54       ` Nitesh Narayan Lal
  2020-09-22 21:08         ` Frederic Weisbecker
  0 siblings, 1 reply; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 13:54 UTC (permalink / raw)
  To: Marcelo Tosatti, frederic, bhelgaas
  Cc: linux-kernel, netdev, linux-pci, sassmann, jeffrey.t.kirsher,
	jacob.e.keller, jlelli, hch, mike.marciniszyn,
	dennis.dalessandro, thomas.lendacky, jerinj, mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 3173 bytes --]


On 9/10/20 3:31 PM, Nitesh Narayan Lal wrote:
> On 9/10/20 3:22 PM, Marcelo Tosatti wrote:
>> On Wed, Sep 09, 2020 at 11:08:18AM -0400, Nitesh Narayan Lal wrote:
>>> This patch limits the max_vecs value that the caller passes to
>>> pci_alloc_irq_vectors() based on the available housekeeping CPUs, by
>>> using the minimum of the two.
>>>
>>> The minimum of the max_vecs passed and the number of available
>>> housekeeping CPUs is derived to ensure that we don't create excess
>>> vectors, which can be problematic, specifically in an RT environment.
>>> This is because in an RT environment unwanted IRQs are moved from
>>> isolated CPUs to the housekeeping CPUs to keep the latency overhead to a
>>> minimum. If the number of housekeeping CPUs is significantly lower than
>>> that of the isolated CPUs, we can run into failures while moving these
>>> IRQs to the housekeeping CPUs due to the per CPU vector limit.
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>> ---
>>>  include/linux/pci.h | 16 ++++++++++++++++
>>>  1 file changed, 16 insertions(+)
>>>
>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>> index 835530605c0d..750ba927d963 100644
>>> --- a/include/linux/pci.h
>>> +++ b/include/linux/pci.h
>>> @@ -38,6 +38,7 @@
>>>  #include <linux/interrupt.h>
>>>  #include <linux/io.h>
>>>  #include <linux/resource_ext.h>
>>> +#include <linux/sched/isolation.h>
>>>  #include <uapi/linux/pci.h>
>>>  
>>>  #include <linux/pci_ids.h>
>>> @@ -1797,6 +1798,21 @@ static inline int
>>>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>>  		      unsigned int max_vecs, unsigned int flags)
>>>  {
>>> +	unsigned int num_housekeeping = num_housekeeping_cpus();
>>> +	unsigned int num_online = num_online_cpus();
>>> +
>>> +	/*
>>> +	 * Try to be conservative and at max only ask for the same number of
>>> +	 * vectors as there are housekeeping CPUs. However, skip any
>>> +	 * modification to the max vectors in two conditions:
>>> +	 * 1. If the min_vecs requested is higher than the number of
>>> +	 *    housekeeping CPUs, as we don't want to prevent the initialization
>>> +	 *    of a device.
>>> +	 * 2. If there are no isolated CPUs, as in this case the driver should
>>> +	 *    already have taken online CPUs into consideration.
>>> +	 */
>>> +	if (min_vecs < num_housekeeping && num_housekeeping != num_online)
>>> +		max_vecs = min_t(int, max_vecs, num_housekeeping);
>>>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>>  					      NULL);
>>>  }
>> If min_vecs > num_housekeeping, for example:
>>
>> /* PCI MSI/MSIx support */
>> #define XGBE_MSI_BASE_COUNT     4
>> #define XGBE_MSI_MIN_COUNT      (XGBE_MSI_BASE_COUNT + 1)
>>
>> Then the protection fails.
> Right, I was ignoring that case.
>
>> How about reducing max_vecs down to min_vecs, if min_vecs >
>> num_housekeeping?
> Yes, I think this makes sense.
> I will wait a bit to see if anyone else has any other comment and will post
> the next version then.
>

Are there any other comments/concerns on this patch that I need to address in
the next posting?
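
For reference, the clamping Marcelo suggests maps naturally onto the
kernel's clamp() macro. Below is a sketch against the quoted patch,
assuming the num_housekeeping_cpus() helper from patch 1/3:

static inline int
pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
		      unsigned int max_vecs, unsigned int flags)
{
	unsigned int hk_cpus = num_housekeeping_cpus();

	/*
	 * Sketch: never ask for more vectors than there are housekeeping
	 * CPUs, but never drop below min_vecs either, so that device
	 * initialization is not prevented.
	 */
	if (hk_cpus < num_online_cpus())
		max_vecs = clamp(hk_cpus, min_vecs, max_vecs);

	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs,
					      flags, NULL);
}

With this, a driver whose min_vecs exceeds num_housekeeping (the XGBE case
above) still gets min_vecs, so its initialization is not prevented.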

-- 
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-22 13:34           ` Nitesh Narayan Lal
@ 2020-09-22 20:44             ` Frederic Weisbecker
  2020-09-22 21:05               ` Nitesh Narayan Lal
  0 siblings, 1 reply; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-22 20:44 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: bhelgaas, Jesse Brandeburg, linux-kernel, netdev, linux-pci,
	mtosatti, sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli,
	hch, mike.marciniszyn, dennis.dalessandro, thomas.lendacky,
	jerinj, mathias.nyman, jiri

On Tue, Sep 22, 2020 at 09:34:02AM -0400, Nitesh Narayan Lal wrote:
> On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
> > But I also don't want to push toward a complicated solution to handle CPU hotplug
> > if there is no actual problem to solve there.
> 
> Sure, I am not particularly sure about the hotplug scenarios either.

Surely when isolation is something that will be configurable through
cpuset, it will become interesting. For now it's probably not worth
adding hundreds of lines of code if nobody has run into such an issue yet.

What is probably more worthwhile for now is some piece of code to consolidate
the allocation of those MSI vectors on top of the number of housekeeping
CPUs.

Thanks.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22 13:50         ` Nitesh Narayan Lal
@ 2020-09-22 20:58           ` Frederic Weisbecker
  2020-09-22 21:15             ` Nitesh Narayan Lal
  2020-09-22 21:26             ` Andrew Lunn
  0 siblings, 2 replies; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-22 20:58 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Tue, Sep 22, 2020 at 09:50:55AM -0400, Nitesh Narayan Lal wrote:
> On 9/22/20 6:08 AM, Frederic Weisbecker wrote:
> TBH I don't have a very strong case here at the moment.
> But still, IMHO, this will force the user to have both managed irqs and
> nohz_full in their environments to avoid these kinds of issues. Is that how
> we would like to proceed?

Yep, that sounds good to me. I never know how much we want to split each and
every one of the isolation features, but I'd rather stay cautious and keep
HK_FLAG_TICK separate from the rest, just in case running in nohz_full mode
ever becomes interesting on its own for performance and not just for
latency/isolation.

But look what you can do as well:

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9df9598a9e39 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
 	unsigned int flags;
 
 	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
-		HK_FLAG_MISC | HK_FLAG_KTHREAD;
+		HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_MANAGED_IRQ;
 
 	return housekeeping_setup(str, flags);
 }


"nohz_full=" has historically gathered the most wanted isolation features. It
can isolate managed IRQs as well.
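
With the diff above applied, a single boot parameter would cover managed
IRQs too; the CPU ranges below are illustrative:

# today, managed-IRQ housekeeping has to be requested separately:
nohz_full=2-71 isolcpus=managed_irq,2-71
# with the change above, nohz_full= alone would also cover managed IRQs:
nohz_full=2-71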


> > And then can we rename it to housekeeping_num_online()?
> 
> It could be just me, but does something like hk_num_online_cpus() make more
> sense here?

Sure, that works as well.

Thanks.

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-22 20:44             ` Frederic Weisbecker
@ 2020-09-22 21:05               ` Nitesh Narayan Lal
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 21:05 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: bhelgaas, Jesse Brandeburg, linux-kernel, netdev, linux-pci,
	mtosatti, sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli,
	hch, mike.marciniszyn, dennis.dalessandro, thomas.lendacky,
	jerinj, mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 807 bytes --]


On 9/22/20 4:44 PM, Frederic Weisbecker wrote:
> On Tue, Sep 22, 2020 at 09:34:02AM -0400, Nitesh Narayan Lal wrote:
>> On 9/22/20 5:54 AM, Frederic Weisbecker wrote:
>>> But I also don't want to push toward a complicated solution to handle CPU hotplug
>>> if there is no actual problem to solve there.
>> Sure, I am not particularly sure about the hotplug scenarios either.
> Surely when isolation is something that will be configurable through
> cpuset, it will become interesting. For now it's probably not worth
> adding hundreds of lines of code if nobody has run into such an issue yet.
>
> What is probably more worthwhile for now is some piece of code to consolidate
> the allocation of those MSI vectors on top of the number of housekeeping
> CPUs.

+1

>
> Thanks.
>
-- 
Thanks
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-22 13:54       ` Nitesh Narayan Lal
@ 2020-09-22 21:08         ` Frederic Weisbecker
  0 siblings, 0 replies; 30+ messages in thread
From: Frederic Weisbecker @ 2020-09-22 21:08 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: Marcelo Tosatti, bhelgaas, linux-kernel, netdev, linux-pci,
	sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri

On Tue, Sep 22, 2020 at 09:54:58AM -0400, Nitesh Narayan Lal wrote:
> >> If min_vecs > num_housekeeping, for example:
> >>
> >> /* PCI MSI/MSIx support */
> >> #define XGBE_MSI_BASE_COUNT     4
> >> #define XGBE_MSI_MIN_COUNT      (XGBE_MSI_BASE_COUNT + 1)
> >>
> >> Then the protection fails.
> > Right, I was ignoring that case.
> >
> >> How about reducing max_vecs down to min_vecs, if min_vecs >
> >> num_housekeeping?
> > Yes, I think this makes sense.
> > I will wait a bit to see if anyone else has any other comment and will post
> > the next version then.
> >
> 
> Are there any other comments/concerns on this patch that I need to address in
> the next posting?

No objection from me; I don't know much about this area anyway.

> -- 
> Nitesh
> 




^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22 20:58           ` Frederic Weisbecker
@ 2020-09-22 21:15             ` Nitesh Narayan Lal
  2020-09-22 21:26             ` Andrew Lunn
  1 sibling, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 21:15 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 1710 bytes --]


On 9/22/20 4:58 PM, Frederic Weisbecker wrote:
> On Tue, Sep 22, 2020 at 09:50:55AM -0400, Nitesh Narayan Lal wrote:
>> On 9/22/20 6:08 AM, Frederic Weisbecker wrote:
>> TBH I don't have a very strong case here at the moment.
>> But still, IMHO, this will force the user to have both managed irqs and
>> nohz_full in their environments to avoid these kinds of issues. Is that how
>> we would like to proceed?
> Yep, that sounds good to me. I never know how much we want to split each and
> every one of the isolation features, but I'd rather stay cautious and keep
> HK_FLAG_TICK separate from the rest, just in case running in nohz_full mode
> ever becomes interesting on its own for performance and not just for
> latency/isolation.

Fair point.

>
> But look what you can do as well:
>
> diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
> index 5a6ea03f9882..9df9598a9e39 100644
> --- a/kernel/sched/isolation.c
> +++ b/kernel/sched/isolation.c
> @@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
>  	unsigned int flags;
>  
>  	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
> -		HK_FLAG_MISC | HK_FLAG_KTHREAD;
> +		HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_MANAGED_IRQ;
>  
>  	return housekeeping_setup(str, flags);
> }
>
>
> "nohz_full=" has historically gathered the most wanted isolation features. It
> can isolate managed IRQs as well.

Nice, yep, this will work.

>
>
>>> And then can we rename it to housekeeping_num_online()?
>> It could be just me, but does something like hk_num_online_cpus() make more
>> sense here?
> Sure, that works as well.

Thanks a lot for all the help.

>
> Thanks.
>
-- 
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22 20:58           ` Frederic Weisbecker
  2020-09-22 21:15             ` Nitesh Narayan Lal
@ 2020-09-22 21:26             ` Andrew Lunn
  2020-09-22 22:20               ` Nitesh Narayan Lal
  1 sibling, 1 reply; 30+ messages in thread
From: Andrew Lunn @ 2020-09-22 21:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci, mtosatti,
	sassmann, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch,
	bhelgaas, mike.marciniszyn, dennis.dalessandro, thomas.lendacky,
	jerinj, mathias.nyman, jiri

> Subject: Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs

Hosekeeping? Are these CPUs out gardening in the weeds?

	     Andrew

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
  2020-09-22 21:26             ` Andrew Lunn
@ 2020-09-22 22:20               ` Nitesh Narayan Lal
  0 siblings, 0 replies; 30+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-22 22:20 UTC (permalink / raw)
  To: Andrew Lunn, Frederic Weisbecker
  Cc: linux-kernel, netdev, linux-pci, mtosatti, sassmann,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri


[-- Attachment #1.1: Type: text/plain, Size: 381 bytes --]


On 9/22/20 5:26 PM, Andrew Lunn wrote:
>> Subject: Re: [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs
> Hosekeeping? Are these CPUs out gardening in the weeds?

Bjorn has already highlighted the typo, so I will be fixing it in the next
version.
Do you find the commit message and body of this patch unclear?

>
> 	     Andrew
>
-- 
Nitesh


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2020-09-22 22:20 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-09 15:08 [RFC] [PATCH v1 0/3] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-09 15:08 ` [RFC][Patch v1 1/3] sched/isolation: API to get num of hosekeeping CPUs Nitesh Narayan Lal
2020-09-17 18:18   ` Jesse Brandeburg
2020-09-17 18:43     ` Nitesh Narayan Lal
2020-09-17 20:11   ` Bjorn Helgaas
2020-09-17 21:48     ` Jacob Keller
2020-09-17 22:09     ` Nitesh Narayan Lal
2020-09-21 23:40   ` Frederic Weisbecker
2020-09-22  3:16     ` Nitesh Narayan Lal
2020-09-22 10:08       ` Frederic Weisbecker
2020-09-22 13:50         ` Nitesh Narayan Lal
2020-09-22 20:58           ` Frederic Weisbecker
2020-09-22 21:15             ` Nitesh Narayan Lal
2020-09-22 21:26             ` Andrew Lunn
2020-09-22 22:20               ` Nitesh Narayan Lal
2020-09-09 15:08 ` [RFC][Patch v1 2/3] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-11 15:23   ` Marcelo Tosatti
2020-09-17 18:23   ` Jesse Brandeburg
2020-09-17 18:31     ` Nitesh Narayan Lal
2020-09-21 22:58     ` Frederic Weisbecker
2020-09-22  3:08       ` Nitesh Narayan Lal
2020-09-22  9:54         ` Frederic Weisbecker
2020-09-22 13:34           ` Nitesh Narayan Lal
2020-09-22 20:44             ` Frederic Weisbecker
2020-09-22 21:05               ` Nitesh Narayan Lal
2020-09-09 15:08 ` [RFC][Patch v1 3/3] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
2020-09-10 19:22   ` Marcelo Tosatti
2020-09-10 19:31     ` Nitesh Narayan Lal
2020-09-22 13:54       ` Nitesh Narayan Lal
2020-09-22 21:08         ` Frederic Weisbecker

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).