* [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs
@ 2020-09-23 18:11 ` Nitesh Narayan Lal
  0 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

This is a follow-up posting for "[RFC v1 0/3] isolation: limit msix vectors
based on housekeeping CPUs".


Issue
=====
With the current implementation, device drivers take only
num_online_cpus() into consideration when creating their MSI-X vectors.
This works quite well in a non-RT environment, but in an RT environment
that has a large number of isolated CPUs and very few housekeeping CPUs
it can lead to a problem. The problem is triggered when something like
tuned tries to move all the IRQs from the isolated CPUs to the limited
number of housekeeping CPUs, to prevent interruptions of a
latency-sensitive workload running on the isolated CPUs: the move fails
because of the per-CPU vector limitation.
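
As a concrete illustration (the numbers mirror the test setup described
below): on a 24-CPU machine with 20 isolated and 4 housekeeping CPUs, a
multi-queue device that allocates one MSI-X vector per online CPU
creates 24 vectors. Moving the ~20 vectors that were spread across the
isolated CPUs onto just 4 housekeeping CPUs concentrates the interrupt
load roughly six-fold, and with several such devices the per-CPU vector
limit on the housekeeping CPUs is quickly exhausted.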


Proposed Fix
============
In this patch-set, the following changes are proposed:
- A generic API, hk_num_online_cpus(), that returns the number of online
  housekeeping CPUs that are meant to handle managed IRQ jobs.
- i40e: specifically for the i40e driver, the num_online_cpus() used in
  i40e_init_msix() to calculate the number of MSI-X vectors is replaced
  with the above API. This restricts the number of MSI-X vectors created
  by i40e in RT environments.
- pci_alloc_irq_vectors(): with the help of hk_num_online_cpus(), the
  max_vecs passed to pci_alloc_irq_vectors() is restricted to the number
  of online housekeeping CPUs in an RT environment. However, if min_vecs
  exceeds the number of online housekeeping CPUs, max_vecs is limited
  based on min_vecs instead. (A minimal sketch of this clamping follows
  below.)

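For illustration, a minimal sketch of the clamping described above; it
mirrors the inline helper that patch 4 adds to include/linux/pci.h (see
the patch itself for the authoritative version):

	unsigned int hk_cpus = hk_num_online_cpus();

	/* Only clamp when some CPUs are actually isolated. */
	if (hk_cpus != num_online_cpus()) {
		if (min_vecs > hk_cpus)
			/* Honor the driver's stated minimum. */
			max_vecs = min_vecs;
		else
			/* Cap the request at the housekeeping CPU count. */
			max_vecs = min_t(int, max_vecs, hk_cpus);
	}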

Future Work
===========

- In the previous upstream discussion [1], it was decided that it would
  be better to have a generic framework that all drivers can consume to
  fix this kind of issue. However, that is long-term work, and since RT
  workloads are being impacted by the reported issue right now, we
  agreed on the proposed per-device approach for the time being.


Testing
=======
Functionality:
- To verify that the i40e change resolves the issue, I added a
  tracepoint in i40e_init_msix() to report the number of CPUs used for
  vector creation, with and without tuned's realtime-virtual-host
  profile. As expected, with the profile applied I got only the number
  of housekeeping CPUs, and all available CPUs without it.
- Similarly, I ran a few more tests with different isolation modes,
  e.g. with only nohz_full, with isolcpus, etc. (example boot
  parameters below).
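
  For reference, isolation modes like these are configured via kernel
  boot parameters; a typical combination (the CPU ranges here are
  illustrative, not necessarily the exact ones used in these tests)
  would be:

	nohz_full=4-23 isolcpus=managed_irq,domain,4-23

  With isolcpus=managed_irq (or, once patch 2 below is applied, with
  nohz_full alone), the housekeeping_overridden static branch is
  enabled and the HK_FLAG_MANAGED_IRQ cpumask excludes the listed CPUs.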

Performance:
- To analyze the performance impact, I targeted the change introduced
  in pci_alloc_irq_vectors() and compared the results against a vanilla
  kernel (5.9.0-rc3).

  Setup Information:
  + A couple of 24-core machines connected back to back via a couple of
    mlx5 NICs; I analyzed the average bitrate for server-client TCP and
    UDP transmission via iperf.
  + To minimize the bitrate variation of the iperf TCP and UDP stream
    tests, I applied tuned's network-throughput profile and disabled HT.
  Test Information:
  + For the environment that had no isolated CPUs:
    I tested with a single stream and with 24 streams (same as the
    number of online CPUs).
  + For the environment that had 20 isolated CPUs:
    I tested with a single stream, 4 streams (same as the number of
    housekeeping CPUs), and 24 streams (same as the number of online
    CPUs).

  Results:
  # UDP Stream Test:
    + No degradation was observed in the UDP stream tests in either
      environment (with and without isolated CPUs) after the
      introduction of the patches.
  # TCP Stream Test - No isolated CPUs:
    + No noticeable degradation was observed.
  # TCP Stream Test - With isolated CPUs:
    + Multiple Stream (4)  - Average degradation of around 5-6%
    + Multiple Stream (24) - Average degradation of around 2-3%
    + Single Stream        - Even on a vanilla kernel, the bitrate
                             observed for a TCP single-stream test
                             varies significantly across runs (e.g. the
                             variation between the best and the worst
                             case on a vanilla kernel was around
                             8-10%). A similar variation was observed
                             with the kernel that included my patches;
                             no additional degradation was observed.

If there are any suggestions for more performance evaluation, I would
be happy to discuss/perform them.

Changes from v1 [2]:
====================
Patch 1:
- Replaced num_housekeeping_cpus() with hk_num_online_cpus() and started
  using the cpumask corresponding to HK_FLAG_MANAGED_IRQ to derive the
  number of online housekeeping CPUs. This is based on Frederic
  Weisbecker's suggestion.
- Since hk_num_online_cpus() is self-explanatory, got rid of the comment
  that was added previously.
Patch 2:
- Added a new patch that enables managed IRQ isolation for nohz_full
  CPUs. This is based on Frederic Weisbecker's suggestion.
Patch 4 (PCI):
- For cases where min_vecs exceeds the number of online housekeeping
  CPUs, instead of skipping the modification to max_vecs, it is now
  restricted based on min_vecs. This is based on a suggestion from
  Marcelo Tosatti.

[1] https://lore.kernel.org/lkml/20200922095440.GA5217@lenoir/
[2] https://lore.kernel.org/lkml/20200909150818.313699-1-nitesh@redhat.com/

Nitesh Narayan Lal (4):
  sched/isolation: API to get housekeeping online CPUs
  sched/isolation: Extend nohz_full to isolate managed IRQs
  i40e: limit msix vectors based on housekeeping CPUs
  PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs

 drivers/net/ethernet/intel/i40e/i40e_main.c |  3 ++-
 include/linux/pci.h                         | 15 +++++++++++++++
 include/linux/sched/isolation.h             | 13 +++++++++++++
 kernel/sched/isolation.c                    |  2 +-
 4 files changed, 31 insertions(+), 2 deletions(-)

-- 



^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-23 18:11 ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-23 18:11   ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

Introduce a new API, hk_num_online_cpus(), that can be used to retrieve
the number of online housekeeping CPUs that are meant to handle managed
IRQ jobs.

This API is introduced for drivers that were previously relying only on
num_online_cpus() to determine the number of MSI-X vectors to create. In
an RT environment with many isolated but few housekeeping CPUs, this led
to a situation where an attempt to move all of the vectors corresponding
to isolated CPUs to housekeeping CPUs was failing due to the per-CPU
vector limit.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 include/linux/sched/isolation.h | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cc9f393e2a70..2e96b626e02e 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
 	return true;
 }
 
+static inline unsigned int hk_num_online_cpus(void)
+{
+#ifdef CONFIG_CPU_ISOLATION
+	const struct cpumask *hk_mask;
+
+	if (static_branch_unlikely(&housekeeping_overridden)) {
+		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
+		return cpumask_weight(hk_mask);
+	}
+#endif
+	return cpumask_weight(cpu_online_mask);
+}
+
 #endif /* _LINUX_SCHED_ISOLATION_H */
-- 
2.18.2

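For illustration, a hypothetical driver-side use of the new helper when
sizing an interrupt request (patch 3 applies exactly this idea to i40e;
the variable names here are made up):

	/* Ask for at most one queue vector per housekeeping CPU. */
	unsigned int nr_qvecs = min_t(unsigned int, vectors_budget,
				      hk_num_online_cpus());

On a system booted without any isolation options, the static branch is
off and the helper simply returns the weight of cpu_online_mask, so
non-isolated setups are unaffected.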

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs
  2020-09-23 18:11 ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-23 18:11   ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

Extend the nohz_full feature set to include isolation from managed IRQs.
This is required specifically for setups that use only nohz_full and
still require isolation to maintain lower latency for the listed CPUs.

This change ensures that the kernel functions that use
HK_FLAG_MANAGED_IRQ to derive the cpumask for pinning various jobs/IRQs
do not enqueue anything on the CPUs listed under nohz_full.

Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 kernel/sched/isolation.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9df9598a9e39 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
 	unsigned int flags;
 
 	flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
-		HK_FLAG_MISC | HK_FLAG_KTHREAD;
+		HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_MANAGED_IRQ;
 
 	return housekeeping_setup(str, flags);
 }
-- 
2.18.2

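For illustration: after this change, booting with, e.g., nohz_full=4-23
alone (no isolcpus= option) also populates the HK_FLAG_MANAGED_IRQ
housekeeping mask, so managed IRQs are steered away from CPUs 4-23 just
as they would be with isolcpus=managed_irq,4-23 (CPU range
illustrative).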

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs
  2020-09-23 18:11 ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-23 18:11   ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

In a realtime environment, it is essential to isolate unwanted IRQs from
isolated CPUs to prevent latency overheads. Creating MSI-X vectors based
only on the online CPUs can cause problems on an RT setup that has
several isolated CPUs but very few housekeeping CPUs, because on such
setups an attempt to move the IRQs from the isolated CPUs to the limited
housekeeping CPUs may fail due to the per-CPU vector limit. This can
eventually result in latency spikes caused by the IRQs that we fail to
move off the isolated CPUs.

Prevent i40e from creating vectors based only on the online CPUs by
using hk_num_online_cpus() instead.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
 drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e433fdbf2c3..fcb6fa3707e0 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@
 #include <linux/of_net.h>
 #include <linux/pci.h>
 #include <linux/bpf.h>
+#include <linux/sched/isolation.h>
 #include <generated/utsrelease.h>
 
 /* Local includes */
@@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
 	 * will use any remaining vectors to reach as close as we can to the
 	 * number of online CPUs.
 	 */
-	cpus = num_online_cpus();
+	cpus = hk_num_online_cpus();
 	pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
 	vectors_left -= pf->num_lan_msix;
 
-- 
2.18.2

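As a worked example (numbers illustrative): on a 24-CPU system with 4
housekeeping CPUs and, say, vectors_left = 100 at this point, the old
code computed num_lan_msix = min(24, 100 / 2) = 24, while with this
change it becomes min(4, 100 / 2) = 4, so the driver no longer claims
vectors that would later have to be evacuated from isolated CPUs.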

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-23 18:11 ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-23 18:11   ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-23 18:11 UTC (permalink / raw)
  To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

Limit the max_vecs argument passed to pci_alloc_irq_vectors() by the
caller based on the number of housekeeping online CPUs (those meant to
perform managed IRQ jobs).

The minimum of the max_vecs passed and the number of housekeeping online
CPUs is taken to ensure that we don't create excess vectors, which can
be problematic specifically in an RT environment. In cases where
min_vecs exceeds the number of housekeeping online CPUs, max_vecs is
restricted based on min_vecs instead. The proposed change is required
because, in an RT environment, unwanted IRQs are moved from isolated
CPUs to housekeeping CPUs to keep the latency overhead to a minimum. If
the number of housekeeping CPUs is significantly lower than that of the
isolated CPUs, we can run into failures while moving these IRQs to
housekeeping CPUs due to the per-CPU vector limit.

Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
 include/linux/pci.h | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..cf9ca9410213 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -38,6 +38,7 @@
 #include <linux/interrupt.h>
 #include <linux/io.h>
 #include <linux/resource_ext.h>
+#include <linux/sched/isolation.h>
 #include <uapi/linux/pci.h>
 
 #include <linux/pci_ids.h>
@@ -1797,6 +1798,20 @@ static inline int
 pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
 		      unsigned int max_vecs, unsigned int flags)
 {
+	unsigned int hk_cpus = hk_num_online_cpus();
+
+	/*
+	 * For a real-time environment, try to be conservative and at max only
+	 * ask for the same number of vectors as there are housekeeping online
+	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
+	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
+	 */
+	if (hk_cpus != num_online_cpus()) {
+		if (min_vecs > hk_cpus)
+			max_vecs = min_vecs;
+		else
+			max_vecs = min_t(int, max_vecs, hk_cpus);
+	}
 	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
 					      NULL);
 }
-- 
2.18.2

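As a worked example (numbers illustrative): on a system with 4
housekeeping CPUs out of 24 online, a driver calling
pci_alloc_irq_vectors(dev, 1, 32, flags) has max_vecs clamped from 32
down to 4, while a driver calling it with min_vecs = 8 gets
max_vecs = 8, honoring its stated minimum.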

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24  8:40     ` peterz
  -1 siblings, 0 replies; 38+ messages in thread
From: peterz @ 2020-09-24  8:40 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot

On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)

This breaks with the established naming of that header.

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24  8:40     ` [Intel-wired-lan] " peterz
@ 2020-09-24 12:09       ` Frederic Weisbecker
  -1 siblings, 0 replies; 38+ messages in thread
From: Frederic Weisbecker @ 2020-09-24 12:09 UTC (permalink / raw)
  To: peterz
  Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci,
	intel-wired-lan, mtosatti, sassmann, jesse.brandeburg,
	lihong.yang, helgaas, jeffrey.t.kirsher, jacob.e.keller, jlelli,
	hch, bhelgaas, mike.marciniszyn, dennis.dalessandro,
	thomas.lendacky, jerinj, mathias.nyman, jiri, mingo, juri.lelli,
	vincent.guittot

On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> > Introduce a new API hk_num_online_cpus(), that can be used to
> > retrieve the number of online housekeeping CPUs that are meant to handle
> > managed IRQ jobs.
> > 
> > This API is introduced for the drivers that were previously relying only
> > on num_online_cpus() to determine the number of MSIX vectors to create.
> > In an RT environment with large isolated but fewer housekeeping CPUs this
> > was leading to a situation where an attempt to move all of the vectors
> > corresponding to isolated CPUs to housekeeping CPUs were failing due to
> > per CPU vector limit.
> > 
> > Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> > ---
> >  include/linux/sched/isolation.h | 13 +++++++++++++
> >  1 file changed, 13 insertions(+)
> > 
> > diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> > index cc9f393e2a70..2e96b626e02e 100644
> > --- a/include/linux/sched/isolation.h
> > +++ b/include/linux/sched/isolation.h
> > @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
> >  	return true;
> >  }
> >  
> > +static inline unsigned int hk_num_online_cpus(void)
> 
> This breaks with the established naming of that header.

I guess we can make it housekeeping_num_online_cpus() ?

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24 12:11     ` Frederic Weisbecker
  -1 siblings, 0 replies; 38+ messages in thread
From: Frederic Weisbecker @ 2020-09-24 12:11 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
	sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> +	const struct cpumask *hk_mask;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

HK_FLAG_MANAGED_IRQ should be passed as an argument to the function:

housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
completely arbitrary otherwise.

> +		return cpumask_weight(hk_mask);
> +	}
> +#endif
> +	return cpumask_weight(cpu_online_mask);
> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.18.2
> 

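For illustration, a minimal sketch of the helper reworked along the
lines suggested above (assumed signature; not part of the posted
series):

	static inline unsigned int
	housekeeping_num_online_cpus(enum hk_flags flags)
	{
	#ifdef CONFIG_CPU_ISOLATION
		/* The caller now chooses which housekeeping mask applies. */
		if (static_branch_unlikely(&housekeeping_overridden))
			return cpumask_weight(housekeeping_cpumask(flags));
	#endif
		return cpumask_weight(cpu_online_mask);
	}
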
^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24 12:09       ` [Intel-wired-lan] " Frederic Weisbecker
@ 2020-09-24 12:23         ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 12:23 UTC (permalink / raw)
  To: Frederic Weisbecker, peterz
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
	sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot


On 9/24/20 8:09 AM, Frederic Weisbecker wrote:
> On Thu, Sep 24, 2020 at 10:40:29AM +0200, peterz@infradead.org wrote:
>> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>>> Introduce a new API hk_num_online_cpus(), that can be used to
>>> retrieve the number of online housekeeping CPUs that are meant to handle
>>> managed IRQ jobs.
>>>
>>> This API is introduced for the drivers that were previously relying only
>>> on num_online_cpus() to determine the number of MSIX vectors to create.
>>> In an RT environment with large isolated but fewer housekeeping CPUs this
>>> was leading to a situation where an attempt to move all of the vectors
>>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>>> per CPU vector limit.
>>>
>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>> ---
>>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>>  1 file changed, 13 insertions(+)
>>>
>>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>>> index cc9f393e2a70..2e96b626e02e 100644
>>> --- a/include/linux/sched/isolation.h
>>> +++ b/include/linux/sched/isolation.h
>>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>>  	return true;
>>>  }
>>>  
>>> +static inline unsigned int hk_num_online_cpus(void)
>> This breaks with the established naming of that header.
> I guess we can make it housekeeping_num_online_cpus() ?

Right, I can do that.

Thanks

>
-- 
Nitesh


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24 12:09       ` [Intel-wired-lan] " Frederic Weisbecker
@ 2020-09-24 12:24         ` Peter Zijlstra
  -1 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-09-24 12:24 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Nitesh Narayan Lal, linux-kernel, netdev, linux-pci,
	intel-wired-lan, mtosatti, sassmann, jesse.brandeburg,
	lihong.yang, helgaas, jeffrey.t.kirsher, jacob.e.keller, jlelli,
	hch, bhelgaas, mike.marciniszyn, dennis.dalessandro,
	thomas.lendacky, jerinj, mathias.nyman, jiri, mingo, juri.lelli,
	vincent.guittot

On Thu, Sep 24, 2020 at 02:09:57PM +0200, Frederic Weisbecker wrote:

> > > +static inline unsigned int hk_num_online_cpus(void)
> > 
> > This breaks with the established naming of that header.
> 
> I guess we can make it housekeeping_num_online_cpus() ?

That would be consistent :-)

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24 12:11     ` [Intel-wired-lan] " Frederic Weisbecker
@ 2020-09-24 12:26       ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 12:26 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, mtosatti,
	sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot


On 9/24/20 8:11 AM, Frederic Weisbecker wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>  	return true;
>>  }
>>  
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> +	const struct cpumask *hk_mask;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> HK_FLAG_MANAGED_IRQ should be pass as an argument to the function:
>
> housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ) because it's
> completely arbitrary otherwise.


Yeap that is more sensible, I will do that.
Do you have any other concerns/suggestions on any other patch?

>
>> +		return cpumask_weight(hk_mask);
>> +	}
>> +#endif
>> +	return cpumask_weight(cpu_online_mask);
>> +}
>> +
>>  #endif /* _LINUX_SCHED_ISOLATION_H */
>> -- 
>> 2.18.2
>>
-- 
Thanks
Nitesh


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24 12:46     ` Peter Zijlstra
  -1 siblings, 0 replies; 38+ messages in thread
From: Peter Zijlstra @ 2020-09-24 12:46 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot



FWIW, cross-posting to moderated lists is annoying. I don't know why we
allow them in MAINTAINERS :-(

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24 12:46     ` [Intel-wired-lan] " Peter Zijlstra
@ 2020-09-24 13:45       ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 13:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, juri.lelli, vincent.guittot


On 9/24/20 8:46 AM, Peter Zijlstra wrote:
>
> FWIW, cross-posting to moderated lists is annoying. I don't know why we
> allow them in MAINTAINERS :-(

Yeah, it sends out an acknowledgment for every email.
I had to include it because sending the patches to it apparently allows them
to get tested by Intel folks.

>
-- 
Nitesh


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24 20:45     ` Bjorn Helgaas
  -1 siblings, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 20:45 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

Possible subject:

  PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs

On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
> passed on by the caller based on the housekeeping online CPUs (that are
> meant to perform managed IRQ jobs).
>
> A minimum of the max_vecs passed and housekeeping online CPUs is derived
> to ensure that we don't create excess vectors as that can be problematic
> specifically in an RT environment. In cases where the min_vecs exceeds the
> housekeeping online CPUs, max vecs is restricted based on the min_vecs
> instead. The proposed change is required because for an RT environment
> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
> keep the latency overhead to a minimum. If the number of housekeeping CPUs
> is significantly lower than that of the isolated CPUs we can run into
> failures while moving these IRQs to housekeeping CPUs due to per CPU
> vector limit.

Does this capture enough of the log?

  If we have isolated CPUs dedicated for use by real-time tasks, we
  try to move IRQs to housekeeping CPUs to reduce overhead on the
  isolated CPUs.

  If we allocate too many IRQ vectors, moving them all to housekeeping
  CPUs may exceed per-CPU vector limits.

  When we have isolated CPUs, limit the number of vectors allocated by
  pci_alloc_irq_vectors() to the minimum number required by the
  driver, or to one per housekeeping CPU if that is larger.

> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/pci.h | 15 +++++++++++++++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 835530605c0d..cf9ca9410213 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
>  #include <linux/interrupt.h>
>  #include <linux/io.h>
>  #include <linux/resource_ext.h>
> +#include <linux/sched/isolation.h>
>  #include <uapi/linux/pci.h>
>  
>  #include <linux/pci_ids.h>
> @@ -1797,6 +1798,20 @@ static inline int
>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>  		      unsigned int max_vecs, unsigned int flags)
>  {
> +	unsigned int hk_cpus = hk_num_online_cpus();
> +
> +	/*
> +	 * For a real-time environment, try to be conservative and at max only
> +	 * ask for the same number of vectors as there are housekeeping online
> +	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
> +	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
> +	 */
> +	if (hk_cpus != num_online_cpus()) {
> +		if (min_vecs > hk_cpus)
> +			max_vecs = min_vecs;
> +		else
> +			max_vecs = min_t(int, max_vecs, hk_cpus);
> +	}

Is the below basically the same?

	/*
	 * If we have isolated CPUs for use by real-time tasks,
	 * minimize overhead on those CPUs by moving IRQs to the
	 * remaining "housekeeping" CPUs.  Limit vector usage to keep
	 * housekeeping CPUs from running out of IRQ vectors.
	 */
	if (housekeeping_cpus < num_online_cpus()) {
		if (housekeeping_cpus < min_vecs)
			max_vecs = min_vecs;
		else if (housekeeping_cpus < max_vecs)
			max_vecs = housekeeping_cpus;
	}

My comment isn't quite right because this patch only limits the number
of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
I don't know where the move happens (or maybe you just avoid assigning
IRQs to isolated CPUs, and I don't know how that happens either).
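
As a quick sanity check that the two snippets behave the same, the clamp can
be modeled as a stand-alone function and exercised with made-up numbers (a
sketch only; clamp_max_vecs and all the values below are invented for
illustration, not kernel code):

	#include <assert.h>

	/* Stand-alone model of the clamping logic; hk_cpus and online
	 * are plain parameters here, not the kernel helpers.
	 */
	static unsigned int clamp_max_vecs(unsigned int min_vecs,
					   unsigned int max_vecs,
					   unsigned int hk_cpus,
					   unsigned int online)
	{
		if (hk_cpus < online) {
			if (hk_cpus < min_vecs)
				max_vecs = min_vecs;
			else if (hk_cpus < max_vecs)
				max_vecs = hk_cpus;
		}
		return max_vecs;
	}

	int main(void)
	{
		assert(clamp_max_vecs(1, 32, 2, 8) == 2);  /* limited to housekeeping CPUs */
		assert(clamp_max_vecs(4, 32, 2, 8) == 4);  /* driver minimum wins */
		assert(clamp_max_vecs(1, 32, 8, 8) == 32); /* no isolation: left untouched */
		return 0;
	}

Both versions produce the same results on these cases.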

>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>  					      NULL);
>  }
> -- 
> 2.18.2
> 

* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24 20:47     ` Bjorn Helgaas
  -1 siblings, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 20:47 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
> Introduce a new API hk_num_online_cpus(), that can be used to
> retrieve the number of online housekeeping CPUs that are meant to handle
> managed IRQ jobs.
> 
> This API is introduced for the drivers that were previously relying only
> on num_online_cpus() to determine the number of MSIX vectors to create.
> In an RT environment with large isolated but fewer housekeeping CPUs this
> was leading to a situation where an attempt to move all of the vectors
> corresponding to isolated CPUs to housekeeping CPUs were failing due to
> per CPU vector limit.
> 
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
>  include/linux/sched/isolation.h | 13 +++++++++++++
>  1 file changed, 13 insertions(+)
> 
> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
> index cc9f393e2a70..2e96b626e02e 100644
> --- a/include/linux/sched/isolation.h
> +++ b/include/linux/sched/isolation.h
> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>  	return true;
>  }
>  
> +static inline unsigned int hk_num_online_cpus(void)
> +{
> +#ifdef CONFIG_CPU_ISOLATION
> +	const struct cpumask *hk_mask;
> +
> +	if (static_branch_unlikely(&housekeeping_overridden)) {
> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
> +		return cpumask_weight(hk_mask);
> +	}
> +#endif
> +	return cpumask_weight(cpu_online_mask);

Just curious: why is this not

  #ifdef CONFIG_CPU_ISOLATION
  ...
  #endif
    return num_online_cpus();

> +}
> +
>  #endif /* _LINUX_SCHED_ISOLATION_H */
> -- 
> 2.18.2
> 

* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-24 20:45     ` [Intel-wired-lan] " Bjorn Helgaas
@ 2020-09-24 21:39       ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 21:39 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot


On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
> Possible subject:
>
>   PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs

Will switch to this.

>
> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
>> passed on by the caller based on the housekeeping online CPUs (that are
>> meant to perform managed IRQ jobs).
>>
>> A minimum of the max_vecs passed and housekeeping online CPUs is derived
>> to ensure that we don't create excess vectors as that can be problematic
>> specifically in an RT environment. In cases where the min_vecs exceeds the
>> housekeeping online CPUs, max vecs is restricted based on the min_vecs
>> instead. The proposed change is required because for an RT environment
>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
>> keep the latency overhead to a minimum. If the number of housekeeping CPUs
>> is significantly lower than that of the isolated CPUs we can run into
>> failures while moving these IRQs to housekeeping CPUs due to per CPU
>> vector limit.
> Does this capture enough of the log?
>
>   If we have isolated CPUs dedicated for use by real-time tasks, we
>   try to move IRQs to housekeeping CPUs to reduce overhead on the
>   isolated CPUs.

How about:
"
If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
overhead on these real-time CPUs.
"

What do you think?

>
>   If we allocate too many IRQ vectors, moving them all to housekeeping
>   CPUs may exceed per-CPU vector limits.
>
>   When we have isolated CPUs, limit the number of vectors allocated by
>   pci_alloc_irq_vectors() to the minimum number required by the
>   driver, or to one per housekeeping CPU if that is larger

I think this is good, I can adopt this.

> .
>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/pci.h | 15 +++++++++++++++
>>  1 file changed, 15 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 835530605c0d..cf9ca9410213 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -38,6 +38,7 @@
>>  #include <linux/interrupt.h>
>>  #include <linux/io.h>
>>  #include <linux/resource_ext.h>
>> +#include <linux/sched/isolation.h>
>>  #include <uapi/linux/pci.h>
>>  
>>  #include <linux/pci_ids.h>
>> @@ -1797,6 +1798,20 @@ static inline int
>>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>  		      unsigned int max_vecs, unsigned int flags)
>>  {
>> +	unsigned int hk_cpus = hk_num_online_cpus();
>> +
>> +	/*
>> +	 * For a real-time environment, try to be conservative and at max only
>> +	 * ask for the same number of vectors as there are housekeeping online
>> +	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
>> +	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
>> +	 */
>> +	if (hk_cpus != num_online_cpus()) {
>> +		if (min_vecs > hk_cpus)
>> +			max_vecs = min_vecs;
>> +		else
>> +			max_vecs = min_t(int, max_vecs, hk_cpus);
>> +	}
> Is the below basically the same?
>
> 	/*
> 	 * If we have isolated CPUs for use by real-time tasks,
> 	 * minimize overhead on those CPUs by moving IRQs to the
> 	 * remaining "housekeeping" CPUs.  Limit vector usage to keep
> 	 * housekeeping CPUs from running out of IRQ vectors.
> 	 */

How about the following as a comment:

"
If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
latency overhead is minimized on those CPUs by moving the IRQ vectors to
the housekeeping CPUs. Limit the number of vectors to keep housekeeping
CPUs from running out of IRQ vectors.

"

> 	if (housekeeping_cpus < num_online_cpus()) {
> 		if (housekeeping_cpus < min_vecs)
> 			max_vecs = min_vecs;
> 		else if (housekeeping_cpus < max_vecs)
> 			max_vecs = housekeeping_cpus;
> 	}

The only reason I went with hk_cpus instead of housekeeping_cpus is that
at other places in the kernel I found a similar naming convention (e.g.,
hk_mask, hk_flags, etc.).
But if housekeeping_cpus makes things clearer, I can switch to that
instead.

Although after Frederic and Peter's suggestion the previous call will change
to something like:

"
housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
"

Which should still fall within the 80-chars-per-line limit.
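
For illustration, the call site would then look roughly like this (a sketch
assuming the renamed helper above lands as suggested; not the final patch):

	unsigned int hk_cpus;

	hk_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
	if (hk_cpus < num_online_cpus()) {
		if (hk_cpus < min_vecs)
			max_vecs = min_vecs;
		else if (hk_cpus < max_vecs)
			max_vecs = hk_cpus;
	}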

>
> My comment isn't quite right because this patch only limits the number
> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.

Yeap it doesn't move IRQs to the housekeeping CPUs.

> I don't know where the move happens (or maybe you just avoid assigning
> IRQs to isolated CPUs, and I don't know how that happens either).

This can happen in userspace, either manually or by some application
such as tuned.
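
For example, userspace can steer an IRQ onto housekeeping CPUs by rewriting
its affinity mask under /proc; a minimal sketch (the IRQ number and CPU mask
below are made up for illustration):

	#include <stdio.h>

	int main(void)
	{
		/* Move IRQ 42 to CPUs 0-1 (hex mask 0x3); both values
		 * are invented examples, error handling trimmed.
		 */
		FILE *f = fopen("/proc/irq/42/smp_affinity", "w");

		if (!f)
			return 1;
		fprintf(f, "3\n");
		return fclose(f) ? 1 : 0;
	}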

>
>>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>  					      NULL);
>>  }
>> -- 
>> 2.18.2
>>
-- 
Thanks
Nitesh


* Re: [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs
  2020-09-24 20:47     ` [Intel-wired-lan] " Bjorn Helgaas
@ 2020-09-24 21:52       ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 21:52 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot


On 9/24/20 4:47 PM, Bjorn Helgaas wrote:
> On Wed, Sep 23, 2020 at 02:11:23PM -0400, Nitesh Narayan Lal wrote:
>> Introduce a new API hk_num_online_cpus(), that can be used to
>> retrieve the number of online housekeeping CPUs that are meant to handle
>> managed IRQ jobs.
>>
>> This API is introduced for the drivers that were previously relying only
>> on num_online_cpus() to determine the number of MSIX vectors to create.
>> In an RT environment with large isolated but fewer housekeeping CPUs this
>> was leading to a situation where an attempt to move all of the vectors
>> corresponding to isolated CPUs to housekeeping CPUs were failing due to
>> per CPU vector limit.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>>  include/linux/sched/isolation.h | 13 +++++++++++++
>>  1 file changed, 13 insertions(+)
>>
>> diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
>> index cc9f393e2a70..2e96b626e02e 100644
>> --- a/include/linux/sched/isolation.h
>> +++ b/include/linux/sched/isolation.h
>> @@ -57,4 +57,17 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
>>  	return true;
>>  }
>>  
>> +static inline unsigned int hk_num_online_cpus(void)
>> +{
>> +#ifdef CONFIG_CPU_ISOLATION
>> +	const struct cpumask *hk_mask;
>> +
>> +	if (static_branch_unlikely(&housekeeping_overridden)) {
>> +		hk_mask = housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);
>> +		return cpumask_weight(hk_mask);
>> +	}
>> +#endif
>> +	return cpumask_weight(cpu_online_mask);
> Just curious: why is this not
>
>   #ifdef CONFIG_CPU_ISOLATION
>   ...
>   #endif
>     return num_online_cpus();

I think doing an atomic read is better than a bitmap operation.
Thanks for pointing this out.
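
So the helper would end up looking roughly like this (a sketch based on the
hunk quoted above with the num_online_cpus() fallback folded in; not the
final version):

static inline unsigned int hk_num_online_cpus(void)
{
#ifdef CONFIG_CPU_ISOLATION
	if (static_branch_unlikely(&housekeeping_overridden))
		return cpumask_weight(housekeeping_cpumask(HK_FLAG_MANAGED_IRQ));
#endif
	return num_online_cpus();
}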

>
>> +}
>> +
>>  #endif /* _LINUX_SCHED_ISOLATION_H */
>> -- 
>> 2.18.2
>>
-- 
Nitesh


* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-24 21:39       ` [Intel-wired-lan] " Nitesh Narayan Lal
@ 2020-09-24 22:59         ` Bjorn Helgaas
  -1 siblings, 0 replies; 38+ messages in thread
From: Bjorn Helgaas @ 2020-09-24 22:59 UTC (permalink / raw)
  To: Nitesh Narayan Lal
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot

On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
> 
> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
> > Possible subject:
> >
> >   PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
> 
> Will switch to this.
> 
> > On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
> >> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
> >> passed on by the caller based on the housekeeping online CPUs (that are
> >> meant to perform managed IRQ jobs).
> >>
> >> A minimum of the max_vecs passed and housekeeping online CPUs is derived
> >> to ensure that we don't create excess vectors as that can be problematic
> >> specifically in an RT environment. In cases where the min_vecs exceeds the
> >> housekeeping online CPUs, max vecs is restricted based on the min_vecs
> >> instead. The proposed change is required because for an RT environment
> >> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
> >> keep the latency overhead to a minimum. If the number of housekeeping CPUs
> >> is significantly lower than that of the isolated CPUs we can run into
> >> failures while moving these IRQs to housekeeping CPUs due to per CPU
> >> vector limit.
> > Does this capture enough of the log?
> >
> >   If we have isolated CPUs dedicated for use by real-time tasks, we
> >   try to move IRQs to housekeeping CPUs to reduce overhead on the
> >   isolated CPUs.
> 
> How about:
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
> overhead on these real-time CPUs.
> "
> 
> What do you think?

It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
really mean anything.  I think it's a detail that should be inside the
"housekeeping CPU" abstraction.

> >   If we allocate too many IRQ vectors, moving them all to housekeeping
> >   CPUs may exceed per-CPU vector limits.
> >
> >   When we have isolated CPUs, limit the number of vectors allocated by
> >   pci_alloc_irq_vectors() to the minimum number required by the
> >   driver, or to one per housekeeping CPU if that is larger
> 
> I think this is good, I can adopt this.
> 
> > .
> >
> >> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> >> ---
> >>  include/linux/pci.h | 15 +++++++++++++++
> >>  1 file changed, 15 insertions(+)
> >>
> >> diff --git a/include/linux/pci.h b/include/linux/pci.h
> >> index 835530605c0d..cf9ca9410213 100644
> >> --- a/include/linux/pci.h
> >> +++ b/include/linux/pci.h
> >> @@ -38,6 +38,7 @@
> >>  #include <linux/interrupt.h>
> >>  #include <linux/io.h>
> >>  #include <linux/resource_ext.h>
> >> +#include <linux/sched/isolation.h>
> >>  #include <uapi/linux/pci.h>
> >>  
> >>  #include <linux/pci_ids.h>
> >> @@ -1797,6 +1798,20 @@ static inline int
> >>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> >>  		      unsigned int max_vecs, unsigned int flags)
> >>  {
> >> +	unsigned int hk_cpus = hk_num_online_cpus();
> >> +
> >> +	/*
> >> +	 * For a real-time environment, try to be conservative and at max only
> >> +	 * ask for the same number of vectors as there are housekeeping online
> >> +	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
> >> +	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
> >> +	 */
> >> +	if (hk_cpus != num_online_cpus()) {
> >> +		if (min_vecs > hk_cpus)
> >> +			max_vecs = min_vecs;
> >> +		else
> >> +			max_vecs = min_t(int, max_vecs, hk_cpus);
> >> +	}
> > Is the below basically the same?
> >
> > 	/*
> > 	 * If we have isolated CPUs for use by real-time tasks,
> > 	 * minimize overhead on those CPUs by moving IRQs to the
> > 	 * remaining "housekeeping" CPUs.  Limit vector usage to keep
> > 	 * housekeeping CPUs from running out of IRQ vectors.
> > 	 */
> 
> How about the following as a comment:
> 
> "
> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
> latency overhead is minimized on those CPUs by moving the IRQ vectors to
> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
> CPUs from running out of IRQ vectors.
> 
> "
> 
> > 	if (housekeeping_cpus < num_online_cpus()) {
> > 		if (housekeeping_cpus < min_vecs)
> > 			max_vecs = min_vecs;
> > 		else if (housekeeping_cpus < max_vecs)
> > 			max_vecs = housekeeping_cpus;
> > 	}
> 
> The only reason I went with hk_cpus instead of housekeeping_cpus is that
> at other places in the kernel I found a similar naming convention (e.g.,
> hk_mask, hk_flags, etc.).
> But if housekeeping_cpus makes things clearer, I can switch to that
> instead.
> 
> Although after Frederic and Peter's suggestion the previous call will change
> to something like:
> 
> "
> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
> "
> 
> Which should still fall within the 80-chars-per-line limit.

I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
long as "housekeeping" appears in the code somewhere.  If we call
"housekeeping_num_online_cpus()" that should be enough.

> > My comment isn't quite right because this patch only limits the number
> > of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
> 
> Yeap it doesn't move IRQs to the housekeeping CPUs.
> 
> > I don't know where the move happens (or maybe you just avoid assigning
> > IRQs to isolated CPUs, and I don't know how that happens either).
> 
> This can happen in userspace, either manually or by some application
> such as tuned.

Some brief hint about this might be helpful.

> >>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> >>  					      NULL);
> >>  }
> >> -- 
> >> 2.18.2
> >>
> -- 
> Thanks
> Nitesh
> 




* Re: [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per housekeeping CPUs
  2020-09-24 22:59         ` [Intel-wired-lan] " Bjorn Helgaas
@ 2020-09-24 23:40           ` Nitesh Narayan Lal
  -1 siblings, 0 replies; 38+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-24 23:40 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
	mtosatti, sassmann, jesse.brandeburg, lihong.yang,
	jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
	mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jerinj,
	mathias.nyman, jiri, mingo, peterz, juri.lelli, vincent.guittot


[-- Attachment #1.1: Type: text/plain, Size: 6343 bytes --]


On 9/24/20 6:59 PM, Bjorn Helgaas wrote:
> On Thu, Sep 24, 2020 at 05:39:07PM -0400, Nitesh Narayan Lal wrote:
>> On 9/24/20 4:45 PM, Bjorn Helgaas wrote:
>>> Possible subject:
>>>
>>>   PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
>> Will switch to this.
>>
>>> On Wed, Sep 23, 2020 at 02:11:26PM -0400, Nitesh Narayan Lal wrote:
>>>> This patch limits the pci_alloc_irq_vectors, max_vecs argument that is
>>>> passed on by the caller based on the housekeeping online CPUs (that are
>>>> meant to perform managed IRQ jobs).
>>>>
>>>> A minimum of the max_vecs passed and housekeeping online CPUs is derived
>>>> to ensure that we don't create excess vectors as that can be problematic
>>>> specifically in an RT environment. In cases where the min_vecs exceeds the
>>>> housekeeping online CPUs, max vecs is restricted based on the min_vecs
>>>> instead. The proposed change is required because for an RT environment
>>>> unwanted IRQs are moved to the housekeeping CPUs from isolated CPUs to
>>>> keep the latency overhead to a minimum. If the number of housekeeping CPUs
>>>> is significantly lower than that of the isolated CPUs we can run into
>>>> failures while moving these IRQs to housekeeping CPUs due to per CPU
>>>> vector limit.
>>> Does this capture enough of the log?
>>>
>>>   If we have isolated CPUs dedicated for use by real-time tasks, we
>>>   try to move IRQs to housekeeping CPUs to reduce overhead on the
>>>   isolated CPUs.
>> How about:
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for the purpose
>> of real-time, we try to move IRQs to housekeeping CPUs to reduce latency
>> overhead on these real-time CPUs.
>> "
>>
>> What do you think?
> It's OK, but from the PCI core perspective, "nohz_full mode" doesn't
> really mean anything.  I think it's a detail that should be inside the
> "housekeeping CPU" abstraction.

I get your point; in that case I will probably stick to your original
suggestion, just replacing the term "overhead" with "latency overhead".

>
>>>   If we allocate too many IRQ vectors, moving them all to housekeeping
>>>   CPUs may exceed per-CPU vector limits.
>>>
>>>   When we have isolated CPUs, limit the number of vectors allocated by
>>>   pci_alloc_irq_vectors() to the minimum number required by the
>>>   driver, or to one per housekeeping CPU if that is larger
>> I think this is good, I can adopt this.
>>
>>> .
>>>
>>>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>>>> ---
>>>>  include/linux/pci.h | 15 +++++++++++++++
>>>>  1 file changed, 15 insertions(+)
>>>>
>>>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>>>> index 835530605c0d..cf9ca9410213 100644
>>>> --- a/include/linux/pci.h
>>>> +++ b/include/linux/pci.h
>>>> @@ -38,6 +38,7 @@
>>>>  #include <linux/interrupt.h>
>>>>  #include <linux/io.h>
>>>>  #include <linux/resource_ext.h>
>>>> +#include <linux/sched/isolation.h>
>>>>  #include <uapi/linux/pci.h>
>>>>  
>>>>  #include <linux/pci_ids.h>
>>>> @@ -1797,6 +1798,20 @@ static inline int
>>>>  pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>>>>  		      unsigned int max_vecs, unsigned int flags)
>>>>  {
>>>> +	unsigned int hk_cpus = hk_num_online_cpus();
>>>> +
>>>> +	/*
>>>> +	 * For a real-time environment, try to be conservative and at max only
>>>> +	 * ask for the same number of vectors as there are housekeeping online
>>>> +	 * CPUs. In case, the min_vecs requested exceeds the housekeeping
>>>> +	 * online CPUs, restrict the max_vecs based on the min_vecs instead.
>>>> +	 */
>>>> +	if (hk_cpus != num_online_cpus()) {
>>>> +		if (min_vecs > hk_cpus)
>>>> +			max_vecs = min_vecs;
>>>> +		else
>>>> +			max_vecs = min_t(int, max_vecs, hk_cpus);
>>>> +	}
>>> Is the below basically the same?
>>>
>>> 	/*
>>> 	 * If we have isolated CPUs for use by real-time tasks,
>>> 	 * minimize overhead on those CPUs by moving IRQs to the
>>> 	 * remaining "housekeeping" CPUs.  Limit vector usage to keep
>>> 	 * housekeeping CPUs from running out of IRQ vectors.
>>> 	 */
>> How about the following as a comment:
>>
>> "
>> If we have isolated CPUs or CPUs running in nohz_full mode for real-time,
>> latency overhead is minimized on those CPUs by moving the IRQ vectors to
>> the housekeeping CPUs. Limit the number of vectors to keep housekeeping
>> CPUs from running out of IRQ vectors.
>>
>> "
>>
>>> 	if (housekeeping_cpus < num_online_cpus()) {
>>> 		if (housekeeping_cpus < min_vecs)
>>> 			max_vecs = min_vecs;
>>> 		else if (housekeeping_cpus < max_vecs)
>>> 			max_vecs = housekeeping_cpus;
>>> 	}
>> The only reason I went with hk_cpus instead of housekeeping_cpus is that
>> at other places in the kernel I found a similar naming convention (e.g.,
>> hk_mask, hk_flags, etc.).
>> But if housekeeping_cpus makes things clearer, I can switch to that
>> instead.
>>
>> Although after Frederic and Peter's suggestion the previous call will change
>> to something like:
>>
>> "
>> housekeeping_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
>> "
>>
>> Which should still fall within the 80-chars-per-line limit.
> I don't really care whether it's "hk_cpus" or "housekeeping_cpus" as
> long as "housekeeping" appears in the code somewhere.  If we call
> "housekeeping_num_online_cpus()" that should be enough.

Got it, in that case we are good here.

>
>>> My comment isn't quite right because this patch only limits the number
>>> of vectors; it doesn't actually *move* IRQs to the housekeeping CPUs.
>> Yeap it doesn't move IRQs to the housekeeping CPUs.
>>
>>> I don't know where the move happens (or maybe you just avoid assigning
>>> IRQs to isolated CPUs, and I don't know how that happens either).
>> This can happen in userspace, either manually or by some application
>> such as tuned.
> Some brief hint about this might be helpful.

Sure, I will try to come up with something along the lines of what you
suggested, i.e., something that also includes the information that the IRQs
are moved from userspace.


>
>>>>  	return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>>>>  					      NULL);
>>>>  }
>>>> -- 
>>>> 2.18.2
>>>>
>> -- 
>> Thanks
>> Nitesh
>>
>
>
-- 
Thanks
Nitesh


end of thread, other threads:[~2020-09-24 23:41 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-23 18:11 [PATCH v2 0/4] isolation: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11 ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 1/4] sched/isolation: API to get housekeeping online CPUs Nitesh Narayan Lal
2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24  8:40   ` peterz
2020-09-24  8:40     ` [Intel-wired-lan] " peterz
2020-09-24 12:09     ` Frederic Weisbecker
2020-09-24 12:09       ` [Intel-wired-lan] " Frederic Weisbecker
2020-09-24 12:23       ` Nitesh Narayan Lal
2020-09-24 12:23         ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24 12:24       ` Peter Zijlstra
2020-09-24 12:24         ` [Intel-wired-lan] " Peter Zijlstra
2020-09-24 12:11   ` Frederic Weisbecker
2020-09-24 12:11     ` [Intel-wired-lan] " Frederic Weisbecker
2020-09-24 12:26     ` Nitesh Narayan Lal
2020-09-24 12:26       ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24 12:46   ` Peter Zijlstra
2020-09-24 12:46     ` [Intel-wired-lan] " Peter Zijlstra
2020-09-24 13:45     ` Nitesh Narayan Lal
2020-09-24 13:45       ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24 20:47   ` Bjorn Helgaas
2020-09-24 20:47     ` [Intel-wired-lan] " Bjorn Helgaas
2020-09-24 21:52     ` Nitesh Narayan Lal
2020-09-24 21:52       ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 3/4] i40e: limit msix vectors based on housekeeping CPUs Nitesh Narayan Lal
2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-23 18:11 ` [PATCH v2 4/4] PCI: Limit pci_alloc_irq_vectors as per " Nitesh Narayan Lal
2020-09-23 18:11   ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24 20:45   ` Bjorn Helgaas
2020-09-24 20:45     ` [Intel-wired-lan] " Bjorn Helgaas
2020-09-24 21:39     ` Nitesh Narayan Lal
2020-09-24 21:39       ` [Intel-wired-lan] " Nitesh Narayan Lal
2020-09-24 22:59       ` Bjorn Helgaas
2020-09-24 22:59         ` [Intel-wired-lan] " Bjorn Helgaas
2020-09-24 23:40         ` Nitesh Narayan Lal
2020-09-24 23:40           ` [Intel-wired-lan] " Nitesh Narayan Lal
