* [PATCH v3 1/4] sched/isolation: API to get number of housekeeping CPUs
2020-09-25 18:26 [PATCH v3 0/4] isolation: limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-25 18:26 ` Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
` (2 subsequent siblings)
3 siblings, 0 replies; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 18:26 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
Introduce a new API, housekeeping_num_online_cpus(), which can be used to
retrieve the number of online housekeeping CPUs based on the housekeeping
flags passed by the caller.
The initial consumers of this API are device drivers that were previously
relying only on num_online_cpus() to determine the number of MSI-X vectors
to create. In real-time environments, all device-specific IRQ vectors are
often moved to the housekeeping CPUs to minimize interruptions to the
isolated CPUs; creating excess vectors could then cause a housekeeping CPU
to run out of IRQ vectors.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
include/linux/sched/isolation.h | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/include/linux/sched/isolation.h b/include/linux/sched/isolation.h
index cc9f393e2a70..e021b1846c1d 100644
--- a/include/linux/sched/isolation.h
+++ b/include/linux/sched/isolation.h
@@ -57,4 +57,13 @@ static inline bool housekeeping_cpu(int cpu, enum hk_flags flags)
return true;
}
+static inline unsigned int housekeeping_num_online_cpus(enum hk_flags flags)
+{
+#ifdef CONFIG_CPU_ISOLATION
+ if (static_branch_unlikely(&housekeeping_overridden))
+ return cpumask_weight(housekeeping_cpumask(flags));
+#endif
+ return num_online_cpus();
+}
+
#endif /* _LINUX_SCHED_ISOLATION_H */
--
2.18.2
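A userspace model may help illustrate the helper's fallback semantics: when the housekeeping cpumask has been overridden on the kernel command line, the helper returns the weight of the housekeeping cpumask for the given flags; otherwise it falls back to num_online_cpus(). The sketch below models the cpumask as a plain bitmask and is purely illustrative — the names and parameters are assumptions standing in for the kernel primitives, not kernel API:

```c
#include <assert.h>

/*
 * Illustrative userspace model of housekeeping_num_online_cpus():
 * - "overridden" stands in for the housekeeping_overridden static branch,
 * - "hk_mask" models the housekeeping cpumask as a bitmask,
 * - "online" stands in for num_online_cpus().
 */
static unsigned int hk_num_online_cpus_model(int overridden,
					     unsigned long hk_mask,
					     unsigned int online)
{
	/*
	 * With an overridden housekeeping mask, count its set bits
	 * (the analogue of cpumask_weight()).
	 */
	if (overridden)
		return (unsigned int)__builtin_popcountl(hk_mask);

	/*
	 * Without isolation configured, every online CPU is a
	 * housekeeping CPU.
	 */
	return online;
}
```

For example, with 8 online CPUs and housekeeping restricted to CPUs 0-1 (mask 0x3), the model returns 2; without an override it returns 8.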
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH v3 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs
2020-09-25 18:26 [PATCH v3 0/4] isolation: limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 1/4] sched/isolation: API to get number of " Nitesh Narayan Lal
@ 2020-09-25 18:26 ` Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 3/4] i40e: Limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() " Nitesh Narayan Lal
3 siblings, 0 replies; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 18:26 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
Extend the nohz_full feature set to include isolation from managed IRQs.
This is required specifically for setups that use only nohz_full and still
require isolation to maintain lower latency for the listed CPUs.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
kernel/sched/isolation.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/isolation.c b/kernel/sched/isolation.c
index 5a6ea03f9882..9df9598a9e39 100644
--- a/kernel/sched/isolation.c
+++ b/kernel/sched/isolation.c
@@ -141,7 +141,7 @@ static int __init housekeeping_nohz_full_setup(char *str)
unsigned int flags;
flags = HK_FLAG_TICK | HK_FLAG_WQ | HK_FLAG_TIMER | HK_FLAG_RCU |
- HK_FLAG_MISC | HK_FLAG_KTHREAD;
+ HK_FLAG_MISC | HK_FLAG_KTHREAD | HK_FLAG_MANAGED_IRQ;
return housekeeping_setup(str, flags);
}
--
2.18.2
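For context, a minimal sketch of how this change surfaces to users (the CPU range here is only an example): with HK_FLAG_MANAGED_IRQ added to the flag set above, a boot command line that lists CPUs 2-7 as nohz_full now also keeps managed IRQs off those CPUs, without needing a separate isolcpus=managed_irq setting:

```
nohz_full=2-7
```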
* [PATCH v3 3/4] i40e: Limit msix vectors to housekeeping CPUs
2020-09-25 18:26 [PATCH v3 0/4] isolation: limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 1/4] sched/isolation: API to get number of " Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 2/4] sched/isolation: Extend nohz_full to isolate managed IRQs Nitesh Narayan Lal
@ 2020-09-25 18:26 ` Nitesh Narayan Lal
2020-09-25 18:26 ` [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() " Nitesh Narayan Lal
3 siblings, 0 replies; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 18:26 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
If we have isolated CPUs designated to perform real-time tasks, IRQ
vectors are moved from userspace to the housekeeping CPUs to keep the
latency overhead on the real-time CPUs to a minimum. Creating MSI-X
vectors based only on the number of online CPUs could lead to exhaustion
of housekeeping CPU IRQ vectors in such environments.
This patch prevents i40e from creating vectors based only on the online
CPUs by instead retrieving the number of online housekeeping CPUs that
are designated to handle managed IRQs.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
---
drivers/net/ethernet/intel/i40e/i40e_main.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/intel/i40e/i40e_main.c b/drivers/net/ethernet/intel/i40e/i40e_main.c
index 2e433fdbf2c3..370b581cd48c 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_main.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_main.c
@@ -5,6 +5,7 @@
#include <linux/of_net.h>
#include <linux/pci.h>
#include <linux/bpf.h>
+#include <linux/sched/isolation.h>
#include <generated/utsrelease.h>
/* Local includes */
@@ -11002,7 +11003,7 @@ static int i40e_init_msix(struct i40e_pf *pf)
* will use any remaining vectors to reach as close as we can to the
* number of online CPUs.
*/
- cpus = num_online_cpus();
+ cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
pf->num_lan_msix = min_t(int, cpus, vectors_left / 2);
vectors_left -= pf->num_lan_msix;
--
2.18.2
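The effect of the one-line change can be modeled in userspace: i40e clamps its LAN queue vectors to half of the remaining vectors, and with the change the CPU count it starts from is the housekeeping count rather than the full online count. The sketch below is illustrative only — the function name is an assumption, and the real i40e_init_msix() accounting involves several more vector consumers:

```c
#include <assert.h>

/*
 * Illustrative model of the i40e_init_msix() snippet above:
 * pf->num_lan_msix = min(cpus, vectors_left / 2), where "cpus" is now
 * the number of online housekeeping CPUs instead of all online CPUs.
 */
static unsigned int i40e_lan_msix_model(unsigned int hk_cpus,
					unsigned int vectors_left)
{
	unsigned int cpus = hk_cpus;

	/* Analogue of min_t(int, cpus, vectors_left / 2). */
	return cpus < vectors_left / 2 ? cpus : vectors_left / 2;
}
```

With 2 housekeeping CPUs and 64 vectors left, only 2 LAN vectors are requested instead of, say, 26 on a 26-CPU machine.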
* [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
2020-09-25 18:26 [PATCH v3 0/4] isolation: limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
` (2 preceding siblings ...)
2020-09-25 18:26 ` [PATCH v3 3/4] i40e: Limit msix vectors to housekeeping CPUs Nitesh Narayan Lal
@ 2020-09-25 18:26 ` Nitesh Narayan Lal
2020-09-25 20:23 ` Bjorn Helgaas
3 siblings, 1 reply; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 18:26 UTC (permalink / raw)
To: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang, helgaas,
nitesh, jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
If we have isolated CPUs dedicated for use by real-time tasks, we try to
move IRQs to housekeeping CPUs from the userspace to reduce latency
overhead on the isolated CPUs.
If we allocate too many IRQ vectors, moving them all to housekeeping CPUs
may exceed per-CPU vector limits.
When we have isolated CPUs, limit the number of vectors allocated by
pci_alloc_irq_vectors() to the minimum number required by the driver, or
to one per housekeeping CPU if that is larger.
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
---
include/linux/pci.h | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 835530605c0d..a7b10240b778 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -38,6 +38,7 @@
#include <linux/interrupt.h>
#include <linux/io.h>
#include <linux/resource_ext.h>
+#include <linux/sched/isolation.h>
#include <uapi/linux/pci.h>
#include <linux/pci_ids.h>
@@ -1797,6 +1798,22 @@ static inline int
pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
unsigned int max_vecs, unsigned int flags)
{
+ unsigned int hk_cpus;
+
+ hk_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
+ /*
+ * If we have isolated CPUs for use by real-time tasks, to keep the
+ * latency overhead to a minimum, device-specific IRQ vectors are moved
+ * to the housekeeping CPUs from the userspace by changing their
+ * affinity mask. Limit the vector usage to keep housekeeping CPUs from
+ * running out of IRQ vectors.
+ */
+ if (hk_cpus < num_online_cpus()) {
+ if (hk_cpus < min_vecs)
+ max_vecs = min_vecs;
+ else if (hk_cpus < max_vecs)
+ max_vecs = hk_cpus;
+ }
return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
NULL);
}
--
2.18.2
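The clamping rule above can be captured as a small pure function: max_vecs is reduced toward the housekeeping CPU count, but never below min_vecs, and nothing changes when no CPUs are isolated. A userspace model, illustrative only (the name is an assumption):

```c
#include <assert.h>

/*
 * Illustrative model of the max_vecs clamping added to
 * pci_alloc_irq_vectors() above.
 */
static unsigned int clamp_max_vecs_model(unsigned int hk_cpus,
					 unsigned int online_cpus,
					 unsigned int min_vecs,
					 unsigned int max_vecs)
{
	/* Only clamp when some CPUs are actually isolated. */
	if (hk_cpus < online_cpus) {
		if (hk_cpus < min_vecs)
			max_vecs = min_vecs; /* never below the driver's minimum */
		else if (hk_cpus < max_vecs)
			max_vecs = hk_cpus; /* one vector per housekeeping CPU */
	}
	return max_vecs;
}
```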
* Re: [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
2020-09-25 18:26 ` [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() " Nitesh Narayan Lal
@ 2020-09-25 20:23 ` Bjorn Helgaas
2020-09-25 21:38 ` Nitesh Narayan Lal
0 siblings, 1 reply; 8+ messages in thread
From: Bjorn Helgaas @ 2020-09-25 20:23 UTC (permalink / raw)
To: Nitesh Narayan Lal
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
On Fri, Sep 25, 2020 at 02:26:54PM -0400, Nitesh Narayan Lal wrote:
> If we have isolated CPUs dedicated for use by real-time tasks, we try to
> move IRQs to housekeeping CPUs from the userspace to reduce latency
> overhead on the isolated CPUs.
>
> If we allocate too many IRQ vectors, moving them all to housekeeping CPUs
> may exceed per-CPU vector limits.
>
> When we have isolated CPUs, limit the number of vectors allocated by
> pci_alloc_irq_vectors() to the minimum number required by the driver, or
> to one per housekeeping CPU if that is larger.
>
> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
> ---
> include/linux/pci.h | 17 +++++++++++++++++
> 1 file changed, 17 insertions(+)
>
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index 835530605c0d..a7b10240b778 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -38,6 +38,7 @@
> #include <linux/interrupt.h>
> #include <linux/io.h>
> #include <linux/resource_ext.h>
> +#include <linux/sched/isolation.h>
> #include <uapi/linux/pci.h>
>
> #include <linux/pci_ids.h>
> @@ -1797,6 +1798,22 @@ static inline int
> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
> unsigned int max_vecs, unsigned int flags)
> {
> + unsigned int hk_cpus;
> +
> + hk_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
Add blank line here before the block comment.
> + /*
> + * If we have isolated CPUs for use by real-time tasks, to keep the
> + * latency overhead to a minimum, device-specific IRQ vectors are moved
> + * to the housekeeping CPUs from the userspace by changing their
> + * affinity mask. Limit the vector usage to keep housekeeping CPUs from
> + * running out of IRQ vectors.
> + */
> + if (hk_cpus < num_online_cpus()) {
> + if (hk_cpus < min_vecs)
> + max_vecs = min_vecs;
> + else if (hk_cpus < max_vecs)
> + max_vecs = hk_cpus;
> + }
It seems like you'd want to do this inside
pci_alloc_irq_vectors_affinity() since that's an exported interface,
and drivers that use it will bypass the limiting you're doing here.
> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
> NULL);
> }
> --
> 2.18.2
>
* Re: [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
2020-09-25 20:23 ` Bjorn Helgaas
@ 2020-09-25 21:38 ` Nitesh Narayan Lal
2020-09-25 23:18 ` Nitesh Narayan Lal
0 siblings, 1 reply; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 21:38 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
On 9/25/20 4:23 PM, Bjorn Helgaas wrote:
> On Fri, Sep 25, 2020 at 02:26:54PM -0400, Nitesh Narayan Lal wrote:
>> If we have isolated CPUs dedicated for use by real-time tasks, we try to
>> move IRQs to housekeeping CPUs from the userspace to reduce latency
>> overhead on the isolated CPUs.
>>
>> If we allocate too many IRQ vectors, moving them all to housekeeping CPUs
>> may exceed per-CPU vector limits.
>>
>> When we have isolated CPUs, limit the number of vectors allocated by
>> pci_alloc_irq_vectors() to the minimum number required by the driver, or
>> to one per housekeeping CPU if that is larger.
>>
>> Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
>> ---
>> include/linux/pci.h | 17 +++++++++++++++++
>> 1 file changed, 17 insertions(+)
>>
>> diff --git a/include/linux/pci.h b/include/linux/pci.h
>> index 835530605c0d..a7b10240b778 100644
>> --- a/include/linux/pci.h
>> +++ b/include/linux/pci.h
>> @@ -38,6 +38,7 @@
>> #include <linux/interrupt.h>
>> #include <linux/io.h>
>> #include <linux/resource_ext.h>
>> +#include <linux/sched/isolation.h>
>> #include <uapi/linux/pci.h>
>>
>> #include <linux/pci_ids.h>
>> @@ -1797,6 +1798,22 @@ static inline int
>> pci_alloc_irq_vectors(struct pci_dev *dev, unsigned int min_vecs,
>> unsigned int max_vecs, unsigned int flags)
>> {
>> + unsigned int hk_cpus;
>> +
>> + hk_cpus = housekeeping_num_online_cpus(HK_FLAG_MANAGED_IRQ);
> Add blank line here before the block comment.
>
>> + /*
>> + * If we have isolated CPUs for use by real-time tasks, to keep the
>> + * latency overhead to a minimum, device-specific IRQ vectors are moved
>> + * to the housekeeping CPUs from the userspace by changing their
>> + * affinity mask. Limit the vector usage to keep housekeeping CPUs from
>> + * running out of IRQ vectors.
>> + */
>> + if (hk_cpus < num_online_cpus()) {
>> + if (hk_cpus < min_vecs)
>> + max_vecs = min_vecs;
>> + else if (hk_cpus < max_vecs)
>> + max_vecs = hk_cpus;
>> + }
> It seems like you'd want to do this inside
> pci_alloc_irq_vectors_affinity() since that's an exported interface,
> and drivers that use it will bypass the limiting you're doing here.
Good point, a few drivers do use this directly.
I took a quick look, and it seems I may also have to take the pre and
post vectors into consideration.
>> return pci_alloc_irq_vectors_affinity(dev, min_vecs, max_vecs, flags,
>> NULL);
>> }
>> --
>> 2.18.2
>>
--
Nitesh
* Re: [PATCH v3 4/4] PCI: Limit pci_alloc_irq_vectors() to housekeeping CPUs
2020-09-25 21:38 ` Nitesh Narayan Lal
@ 2020-09-25 23:18 ` Nitesh Narayan Lal
0 siblings, 0 replies; 8+ messages in thread
From: Nitesh Narayan Lal @ 2020-09-25 23:18 UTC (permalink / raw)
To: Bjorn Helgaas
Cc: linux-kernel, netdev, linux-pci, intel-wired-lan, frederic,
mtosatti, sassmann, jesse.brandeburg, lihong.yang,
jeffrey.t.kirsher, jacob.e.keller, jlelli, hch, bhelgaas,
mike.marciniszyn, dennis.dalessandro, thomas.lendacky, jiri,
mingo, peterz, juri.lelli, vincent.guittot, lgoncalv
On 9/25/20 5:38 PM, Nitesh Narayan Lal wrote:
> On 9/25/20 4:23 PM, Bjorn Helgaas wrote:
[...]
>>> + /*
>>> + * If we have isolated CPUs for use by real-time tasks, to keep the
>>> + * latency overhead to a minimum, device-specific IRQ vectors are moved
>>> + * to the housekeeping CPUs from the userspace by changing their
>>> + * affinity mask. Limit the vector usage to keep housekeeping CPUs from
>>> + * running out of IRQ vectors.
>>> + */
>>> + if (hk_cpus < num_online_cpus()) {
>>> + if (hk_cpus < min_vecs)
>>> + max_vecs = min_vecs;
>>> + else if (hk_cpus < max_vecs)
>>> + max_vecs = hk_cpus;
>>> + }
>> It seems like you'd want to do this inside
>> pci_alloc_irq_vectors_affinity() since that's an exported interface,
>> and drivers that use it will bypass the limiting you're doing here.
> Good point, few drivers directly use this.
> I took a quick look and it seems I may also have to take the pre and the
> post vectors into consideration.
>
It seems my initial interpretation was incorrect: the reserved vectors
(pre + post) should always be less than min_vecs.
So, from what I understand, we should be fine limiting only max_vecs.
I can make this change and do a repost.
Do you have any other concerns with this patch or with any of the other
patches?
[...]
--
Nitesh