* [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
@ 2018-09-13  3:10 Dou Liyang

From: Dou Liyang <douly.fnst@cn.fujitsu.com>
To: linux-kernel
Cc: tglx, kashyap.desai, shivasharan.srikanteshwara, sumit.saxena,
    ming.lei, hch, douly.fnst

As Kashyap and Sumit reported, in the MSI/MSI-X subsystem the pre/post
vectors may be used for some extra reply queues to improve performance.
The best way to map the pre/post vectors is to map them to the local
NUMA node.

But current Linux can't do that, because the pre and post vectors are
marked managed and their affinity mask is set to the irq default
affinity mask. The default affinity mask contains all CPUs by default,
but it can be tweaked both on the kernel command line and via proc. If
that mask is only a subset of CPUs and all of them go offline, then
these vectors are shut down in managed mode.

So, clear these affinity masks and check for an empty mask in
alloc_descs() to leave such vectors as regular interrupts, which can be
affinity controlled and can also move freely on hotplug.
Note: will break the validation of affinity mask(s).

Reported-by: Kashyap Desai <kashyap.desai@broadcom.com>
Reported-by: Sumit Saxena <sumit.saxena@broadcom.com>
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Dou Liyang <douly.fnst@cn.fujitsu.com>
---
 kernel/irq/affinity.c |  9 ++++++---
 kernel/irq/irqdesc.c  | 24 ++++++++++--------------
 2 files changed, 16 insertions(+), 17 deletions(-)

diff --git a/kernel/irq/affinity.c b/kernel/irq/affinity.c
index f4f29b9d90ee..ba35a5050dd2 100644
--- a/kernel/irq/affinity.c
+++ b/kernel/irq/affinity.c
@@ -204,7 +204,7 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 
 	/* Fill out vectors at the beginning that don't need affinity */
 	for (curvec = 0; curvec < affd->pre_vectors; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 	/* Stabilize the cpumasks */
 	get_online_cpus();
@@ -234,10 +234,13 @@ irq_create_affinity_masks(int nvecs, const struct irq_affinity *affd)
 	/* Fill out vectors at the end that don't need affinity */
 	if (usedvecs >= affvecs)
 		curvec = affd->pre_vectors + affvecs;
-	else
+	else {
 		curvec = affd->pre_vectors + usedvecs;
+		for (; curvec < affd->pre_vectors + affvecs; curvec++)
+			cpumask_copy(masks + curvec, irq_default_affinity);
+	}
 	for (; curvec < nvecs; curvec++)
-		cpumask_copy(masks + curvec, irq_default_affinity);
+		cpumask_clear(masks + curvec);
 
 outnodemsk:
 	free_node_to_cpumask(node_to_cpumask);
diff --git a/kernel/irq/irqdesc.c b/kernel/irq/irqdesc.c
index 578d0e5f1b5b..5cffa791a20b 100644
--- a/kernel/irq/irqdesc.c
+++ b/kernel/irq/irqdesc.c
@@ -453,24 +453,20 @@ static int alloc_descs(unsigned int start, unsigned int cnt, int node,
 {
 	const struct cpumask *mask = NULL;
 	struct irq_desc *desc;
-	unsigned int flags;
+	unsigned int flags = 0;
 	int i;
 
-	/* Validate affinity mask(s) */
-	if (affinity) {
-		for (i = 0, mask = affinity; i < cnt; i++, mask++) {
-			if (cpumask_empty(mask))
-				return -EINVAL;
-		}
-	}
-
-	flags = affinity ? IRQD_AFFINITY_MANAGED | IRQD_MANAGED_SHUTDOWN : 0;
-	mask = NULL;
-
 	for (i = 0; i < cnt; i++) {
 		if (affinity) {
-			node = cpu_to_node(cpumask_first(affinity));
-			mask = affinity;
+			if (cpumask_empty(affinity)) {
+				flags = 0;
+				mask = NULL;
+			} else {
+				flags = IRQD_AFFINITY_MANAGED |
+					IRQD_MANAGED_SHUTDOWN;
+				mask = affinity;
+				node = cpu_to_node(cpumask_first(affinity));
+			}
 			affinity++;
 		}
 		desc = alloc_desc(start + i, node, flags, mask, owner);
-- 
2.14.3
* Re: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-13 11:00 ` Thomas Gleixner

From: Thomas Gleixner <tglx@linutronix.de>
To: Dou Liyang
Cc: LKML, kashyap.desai, shivasharan.srikanteshwara, sumit.saxena,
    ming.lei, Christoph Hellwig, douly.fnst, Bjorn Helgaas

On Thu, 13 Sep 2018, Dou Liyang wrote:
> So, clear these affinity mask and check it in alloc_desc() to leave them
> as regular interrupts which can be affinity controlled and also can move
> freely on hotplug.

This is the wrong direction, as it does not allow initial affinity
assignment for the non-managed interrupts at allocation time. And that's
what Kashyap and Sumit are looking for.

The trivial fix for the possible breakage when irq_default_affinity !=
cpu_possible_mask is to set the affinity for the pre/post vectors to
cpu_possible_mask and be done with it.

One other thing I noticed while staring at this is that the PCI code
does not care about the return value of irq_create_affinity_masks() at
all. It just proceeds if masks is NULL, as if the vectors had been
requested with a NULL affinity descriptor pointer. I don't think that's
a brilliant idea, because the drivers might rely on the interrupts being
managed, but it might be a non-issue and just result in bad locality.
Christoph?

So back to the problem at hand. We need to change the affinity
management scheme in a way which lets us differentiate between managed
and unmanaged interrupts, so it allows us to automatically assign
(initial) affinity to all of them.

Right now everything is bound to the cpumask array, which does not allow
conveying more information. So we need to come up with something
different.

Something like the below (does not compile and is just for reference)
should do the trick.
I'm not sure whether we want to convey the information (masks and
bitmap) in a single data structure or not, but that's an implementation
detail.

Thanks,

	tglx

8<---------
--- a/drivers/pci/msi.c
+++ b/drivers/pci/msi.c
@@ -535,15 +535,16 @@ static struct msi_desc *
 msi_setup_entry(struct pci_dev *dev, int nvec, const struct irq_affinity *affd)
 {
 	struct cpumask *masks = NULL;
+	unsigned long managed = 0;
 	struct msi_desc *entry;
 	u16 control;
 
 	if (affd)
-		masks = irq_create_affinity_masks(nvec, affd);
+		masks = irq_create_affinity_masks(nvec, affd, &managed);
 
 	/* MSI Entry Initialization */
-	entry = alloc_msi_entry(&dev->dev, nvec, masks);
+	entry = alloc_msi_entry(&dev->dev, nvec, masks, managed);
 	if (!entry)
 		goto out;
 
@@ -672,15 +673,22 @@ static int msix_setup_entries(struct pci
 		struct msix_entry *entries, int nvec,
 		const struct irq_affinity *affd)
 {
+	/*
+	 * MSIX_MAX_VECTORS = 2048, i.e. 256 bytes. Might need runtime
+	 * allocation. OTOH, are 2048 vectors realistic?
+	 */
+	DECLARE_BITMAP(managed, MSIX_MAX_VECTORS);
 	struct cpumask *curmsk, *masks = NULL;
 	struct msi_desc *entry;
 	int ret, i;
 
 	if (affd)
-		masks = irq_create_affinity_masks(nvec, affd);
+		masks = irq_create_affinity_masks(nvec, affd, managed);
 
 	for (i = 0, curmsk = masks; i < nvec; i++) {
-		entry = alloc_msi_entry(&dev->dev, 1, curmsk);
+		unsigned long m = test_bit(i, managed) ? 1 : 0;
+
+		entry = alloc_msi_entry(&dev->dev, 1, curmsk, m);
 		if (!entry) {
 			if (!i)
 				iounmap(base);
--- a/kernel/irq/msi.c
+++ b/kernel/irq/msi.c
@@ -27,7 +27,8 @@
  * and the affinity masks from @affinity are copied.
  */
 struct msi_desc *
-alloc_msi_entry(struct device *dev, int nvec, const struct cpumask *affinity)
+alloc_msi_entry(struct device *dev, int nvec, const struct cpumask *affinity,
+		unsigned long managed)
 {
 	struct msi_desc *desc;
 
@@ -38,6 +39,7 @@ alloc_msi_entry(struct device *dev, int
 	INIT_LIST_HEAD(&desc->list);
 	desc->dev = dev;
 	desc->nvec_used = nvec;
+	desc->managed = managed;
 	if (affinity) {
 		desc->affinity = kmemdup(affinity,
 				nvec * sizeof(*desc->affinity), GFP_KERNEL);
@@ -416,7 +418,7 @@ int msi_domain_alloc_irqs(struct irq_dom
 
 	virq = __irq_domain_alloc_irqs(domain, -1, desc->nvec_used,
 				       dev_to_node(dev), &arg, false,
-				       desc->affinity,
+				       desc->affinity, desc->managed);
 	if (virq < 0) {
 		ret = -ENOSPC;
 		if (ops->handle_error)
* RE: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-20  8:39 ` Kashyap Desai

From: Kashyap Desai <kashyap.desai@broadcom.com>
To: Thomas Gleixner, Dou Liyang
Cc: LKML, Shivasharan Srikanteshwara, Sumit Saxena, ming.lei,
    Christoph Hellwig, douly.fnst, Bjorn Helgaas

> This is the wrong direction as it does not allow to do initial affinity
> assignement for the non-managed interrupts on allocation time. And that's
> what Kashyap and Sumit are looking for.
>
> [...]
>
> Something like the below (does not compile and is just for reference)
> should do the trick. I'm not sure whether we want to convey the
> information (masks and bitmap) in a single data structure or not, but
> that's an implementation detail.

Dou - Do you want me to test your patch, or shall I wait for the next
version?
* Re: [RFC PATCH] irq/affinity: Mark the pre/post vectors as regular interrupts
  2018-09-20 12:30 ` Dou Liyang

From: Dou Liyang
To: Kashyap Desai, Thomas Gleixner
Cc: LKML, Shivasharan Srikanteshwara, Sumit Saxena, ming.lei,
    Christoph Hellwig, douly.fnst, Bjorn Helgaas

Hi Kashyap,

On 2018/9/20 16:39, Kashyap Desai wrote:
> Dou - Do you want me to test your patch or shall I wait for next version?

I'm sorry, please wait for the next version.

Thanks,
	dou